Model Profiler API Examples
This notebook demonstrates how to use the MLTK profile_model API.
Refer to the Model Profiler guide for more details.
NOTES:
Refer to the Notebook Examples Guide for how to run this example locally in VSCode
Install MLTK Python Package
# Install the MLTK Python package (if necessary)
!pip install --upgrade silabs-mltk
Import Python Packages
# Import the standard Python packages used by the examples
import os
import sys
import urllib.request
import shutil
import tempfile
# Import the necessary MLTK APIs
from mltk.core import profile_model
from mltk.utils.commander import query_platform
Download .tflite model file
A .tflite model file is required to run these examples. The following code downloads one.
NOTE: Update TFLITE_MODEL_URL or tflite_path to point to your model if necessary.
# Use a .tflite model found here:
# https://github.com/siliconlabs/mltk/tree/master/mltk/utils/test_helper/data/
# NOTE: Update this URL to point to your model if necessary
TFLITE_MODEL_URL = 'https://github.com/siliconlabs/mltk/raw/master/mltk/utils/test_helper/data/image_example1.tflite'
# Download the .tflite file and save it to the temp dir
tflite_path = os.path.normpath(f'{tempfile.gettempdir()}/image_example1.tflite')
with open(tflite_path, 'wb') as dst:
    with urllib.request.urlopen(TFLITE_MODEL_URL) as src:
        shutil.copyfileobj(src, dst)
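If the URL is wrong or the download is redirected, the saved file may be an HTML page rather than a model, and the failure only shows up later inside the profiler. A minimal sketch of a quick sanity check, assuming the standard TensorFlow Lite FlatBuffer layout in which the 4-byte file identifier `TFL3` occupies bytes 4-7 of the file; `looks_like_tflite` is a hypothetical helper, not part of MLTK.

```python
import os
import tempfile


def looks_like_tflite(path: str) -> bool:
    """Return True if the file carries the TensorFlow Lite FlatBuffer identifier.

    Assumption: bytes 0-3 hold the FlatBuffer root table offset and
    bytes 4-7 hold the file identifier b'TFL3'.
    """
    with open(path, 'rb') as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b'TFL3'
```

For example, `assert looks_like_tflite(tflite_path), 'Download did not produce a .tflite file'` placed right after the download stops the notebook before any profiling is attempted on a bad file.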
Example 1: Profile .tflite file in basic simulator
This example profiles the .tflite model file in the “basic simulator” of the model profiler.
# Profile the tflite model using the "basic simulator"
# NOTE: Update tflite_path to point to your model if necessary
profiling_results = profile_model(tflite_path, return_estimates=True)
# Print the profiling results
print(profiling_results)
Profiling Summary
Name: image_example1
Accelerator: None
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 71.5k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
CPU Cycle Count: 13.1M
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 2.3m
J/Op: 884.5p
J/MAC: 2.0n
Model Layers
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode | # Ops | # MACs | CPU Cycles | Energy (J) | Input Shape | Output Shape | Options |
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0 | conv_2d | 1.2M | 497.7k | 10.0M | 1.9m | 1x96x96x1,24x3x3x1,24 | 1x48x48x24 | Padding:same stride:2x2 activation:relu |
| 1 | average_pool_2d | 69.1k | 0 | 985.7k | 148.0u | 1x48x48x24 | 1x24x24x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 2 | conv_2d | 842.2k | 418.2k | 1.3M | 187.5u | 1x24x24x24,16x3x3x24,16 | 1x11x11x16 | Padding:valid stride:2x2 activation:relu |
| 3 | conv_2d | 565.7k | 279.9k | 718.6k | 105.7u | 1x11x11x16,24x3x3x16,24 | 1x9x9x24 | Padding:valid stride:1x1 activation:relu |
| 4 | average_pool_2d | 1.9k | 0 | 30.8k | 9.3u | 1x9x9x24 | 1x4x4x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 5 | reshape | 0 | 0 | 250.4 | 0.0p | 1x4x4x24,2 | 1x384 | Type=none |
| 6 | fully_connected | 2.3k | 1.2k | 5.2k | 21.5n | 1x384,3x384,3 | 1x3 | Activation:none |
| 7 | softmax | 15.0 | 0 | 3.8k | 16.5n | 1x3 | 1x3 | Type=softmaxoptions |
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
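The summary values are printed with SI suffixes (k, M, m, u, n, p), and derived figures such as J/Op appear to be ratios of the totals listed above them. A minimal sketch for turning the printed text back into numbers; it works from the printed output rather than assuming any attribute names on the returned results object, and `parse_si` and `parse_summary` are hypothetical helpers.

```python
import re

# SI prefixes that appear in the printed profiling summary.
# Case matters: 'm' is milli (1e-3) while 'M' is mega (1e6).
SI_PREFIXES = {'p': 1e-12, 'n': 1e-9, 'u': 1e-6, 'm': 1e-3, '': 1.0, 'k': 1e3, 'M': 1e6}


def parse_si(text: str) -> float:
    """Convert a value such as '15.7k' or '884.5p' into a float."""
    match = re.fullmatch(r'\s*([-+]?\d+(?:\.\d+)?)([pnumkM]?)\s*', text)
    if match is None:
        raise ValueError(f'not an SI-formatted number: {text!r}')
    number, prefix = match.groups()
    return float(number) * SI_PREFIXES[prefix]


def parse_summary(summary_text: str) -> dict:
    """Map each 'Key: value' line to a float, or keep the value as text."""
    values = {}
    for line in summary_text.splitlines():
        key, sep, value = line.partition(':')
        if not sep:
            continue  # skip lines without a 'Key: value' pair
        try:
            values[key.strip()] = parse_si(value)
        except ValueError:
            values[key.strip()] = value.strip()  # e.g. names and shapes
    return values


# A few lines copied from the basic-simulator summary above
summary = parse_summary("""\
Name: image_example1
Operation Count: 2.6M
Energy (J): 2.3m
J/Op: 884.5p
""")

# J/Op appears to be the total energy divided by the operation count
j_per_op = summary['Energy (J)'] / summary['Operation Count']
print(f'{j_per_op * 1e12:.1f} pJ/op (printed: {summary["J/Op"] * 1e12:.1f})')
```

Because the printed totals are rounded, recomputed ratios only match the printed ones to within display rounding (884.6p here versus the printed 884.5p).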
Example 2: Profile .tflite file in MVP hardware simulator
This example profiles the .tflite model file in the MVP hardware accelerator simulator of the model profiler.
# Profile the tflite model using the MVP hardware accelerator simulator
# NOTE: Update tflite_path to point to your model if necessary
profiling_results = profile_model(tflite_path, accelerator='mvp', return_estimates=True)
# Print the profiling results
print(profiling_results)
Profiling Summary
Name: image_example1
Accelerator: MVP
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 85.3k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
Accelerator Cycle Count: 1.1M
CPU Cycle Count: 81.3k
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 153.0u
J/Op: 57.9p
J/MAC: 127.8p
Model Layers
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode | # Ops | # MACs | Acc Cycles | CPU Cycles | Energy (J) | Input Shape | Output Shape | Options |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0 | conv_2d | 1.2M | 497.7k | 719.0k | 11.2k | 52.4u | 1x96x96x1,24x3x3x1,24 | 1x48x48x24 | Padding:same stride:2x2 activation:relu |
| 1 | average_pool_2d | 69.1k | 0 | 48.4k | 22.7k | 5.4u | 1x48x48x24 | 1x24x24x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 2 | conv_2d | 842.2k | 418.2k | 223.8k | 5.9k | 45.7u | 1x24x24x24,16x3x3x24,16 | 1x11x11x16 | Padding:valid stride:2x2 activation:relu |
| 3 | conv_2d | 565.7k | 279.9k | 148.8k | 8.0k | 45.7u | 1x11x11x16,24x3x3x16,24 | 1x9x9x24 | Padding:valid stride:1x1 activation:relu |
| 4 | average_pool_2d | 1.9k | 0 | 1.3k | 27.8k | 3.7u | 1x9x9x24 | 1x4x4x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 5 | reshape | 0 | 0 | 0 | 250.4 | 0.0p | 1x4x4x24,2 | 1x384 | Type=none |
| 6 | fully_connected | 2.3k | 1.2k | 1.7k | 1.5k | 49.2n | 1x384,3x384,3 | 1x3 | Activation:none |
| 7 | softmax | 15.0 | 0 | 0 | 3.8k | 16.5n | 1x3 | 1x3 | Type=softmaxoptions |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
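The per-layer Energy (J) columns of Examples 1 and 2 show where the MVP estimate differs from the basic simulator. A minimal sketch that compares them, with values copied by hand from the two printed tables (so they carry the tables' display rounding); `energy_ratios` is a hypothetical helper and the reshape layer is left out because both runs print 0.0p for it.

```python
# Per-layer "Energy (J)" values copied from the Example 1 (basic simulator)
# and Example 2 (MVP simulator) tables. Reshape (0.0p in both) is omitted.
BASIC_ENERGY_J = {
    '0 conv_2d': 1.9e-3,
    '1 average_pool_2d': 148.0e-6,
    '2 conv_2d': 187.5e-6,
    '3 conv_2d': 105.7e-6,
    '4 average_pool_2d': 9.3e-6,
    '6 fully_connected': 21.5e-9,
    '7 softmax': 16.5e-9,
}
MVP_ENERGY_J = {
    '0 conv_2d': 52.4e-6,
    '1 average_pool_2d': 5.4e-6,
    '2 conv_2d': 45.7e-6,
    '3 conv_2d': 45.7e-6,
    '4 average_pool_2d': 3.7e-6,
    '6 fully_connected': 49.2e-9,
    '7 softmax': 16.5e-9,
}


def energy_ratios(baseline: dict, accelerated: dict) -> dict:
    """Return baseline / accelerated energy per layer (>1 means the MVP estimate is lower)."""
    return {layer: baseline[layer] / accelerated[layer] for layer in baseline}


ratios = energy_ratios(BASIC_ENERGY_J, MVP_ENERGY_J)
for layer, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f'{layer:<20} {ratio:6.2f}x')

# Whole-model totals from the two summaries: 2.3m J versus 153.0u J
print(f'total: {2.3e-3 / 153.0e-6:.1f}x')
```

With these numbers the first conv_2d layer shows the largest estimated reduction (about 36x), while the fully_connected layer is estimated to use more energy on the MVP (49.2n J versus 21.5n J).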
Example 3: Profile .tflite file on physical device
This example profiles the .tflite model file on a physically connected embedded device.
NOTE: A supported development board must be connected and properly enumerated for this example to work.
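The code that follows requests `accelerator='mvp'` only when `query_platform()` returns exactly `'brd2601a'` or `'brd4186b'`. In the run recorded here the reported name is `brd2601`, so the exact-match check fails and the summary shows `Accelerator: None`. If, as assumed in this sketch, a missing trailing letter only reflects a board revision, a prefix match avoids that fallback. `select_accelerator` and `MVP_BOARDS` are hypothetical names, not MLTK APIs, and the board list only repeats the two boards named in the example.

```python
# Hypothetical helper. Assumption: a trailing letter such as the 'a' in
# 'brd2601a' is a board-revision suffix, so matching on the base name is safe.
MVP_BOARDS = ('brd2601', 'brd4186')  # base names of the boards in the example's tuple


def select_accelerator(platform_name: str):
    """Return 'mvp' if the platform's base board name is in MVP_BOARDS, else None."""
    name = platform_name.strip().lower()
    # str.startswith accepts a tuple and matches any of its prefixes
    if name.startswith(MVP_BOARDS):
        return 'mvp'
    return None


print(select_accelerator('brd2601'))           # mvp
print(select_accelerator('brd4186b'))          # mvp
print(select_accelerator('some_other_board'))  # None
```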
# Determine the currently connected device
# Just print an error and exit if no device is connected
try:
    platform_name = query_platform()
except Exception as e:
    print(f'Failed to determine connected device, err:\n{e}')
    sys.exit(0)
print(f'Connected device platform: {platform_name}')
accelerator = None
if platform_name in ('brd2601a', 'brd4186b'):
    # Use the MVP hardware accelerator if the platform supports it
    accelerator = 'mvp'
# Profile the tflite model on the physical device
profiling_results = profile_model(
    tflite_path,
    accelerator=accelerator,
    use_device=True
)
# Print the profiling results
print(profiling_results)
Connected device platform: brd2601
Profiling Summary
Name: image_example1
Accelerator: None
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 71.4k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
CPU Cycle Count: 9.5M
CPU Utilization (%): 100.0
Clock Rate (hz): 78.0M
Time (s): 119.7m
Ops/s: 22.1M
MACs/s: 10.0M
Inference/s: 8.4
Model Layers
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode | # Ops | # MACs | CPU Cycles | Time (s) | Input Shape | Output Shape | Options |
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
| 0 | conv_2d | 1.2M | 497.7k | 6.3M | 80.0m | 1x96x96x1,24x3x3x1,24 | 1x48x48x24 | Padding:same stride:2x2 activation:relu |
| 1 | average_pool_2d | 69.1k | 0 | 759.4k | 9.6m | 1x48x48x24 | 1x24x24x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 2 | conv_2d | 842.2k | 418.2k | 1.4M | 17.6m | 1x24x24x24,16x3x3x24,16 | 1x11x11x16 | Padding:valid stride:2x2 activation:relu |
| 3 | conv_2d | 565.7k | 279.9k | 956.1k | 12.1m | 1x11x11x16,24x3x3x16,24 | 1x9x9x24 | Padding:valid stride:1x1 activation:relu |
| 4 | average_pool_2d | 1.9k | 0 | 21.9k | 270.0u | 1x9x9x24 | 1x4x4x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 5 | reshape | 0 | 0 | 2.3k | 30.0u | 1x4x4x24,2 | 1x384 | Type=none |
| 6 | fully_connected | 2.3k | 1.1k | 5.1k | 60.0u | 1x384,3x384,3 | 1x3 | Activation:none |
| 7 | softmax | 15.0 | 0 | 2.9k | 60.0u | 1x3 | 1x3 | Type=softmaxoptions |
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
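The throughput lines in the on-device summary appear to be derived from the measured inference time: Inference/s matches 1 / Time (s), and MACs/s matches the MAC count divided by Time (s). A short check with values copied from the output above, which also shows how the measured time splits across layers (the per-layer times sum to about 119.7 ms):

```python
# Values copied from the on-device profiling output above
TIME_S = 119.7e-3   # Time (s): 119.7m
MAC_COUNT = 1.2e6   # Multiply-Accumulate Count: 1.2M
LAYER_TIME_S = {    # "Time (s)" column of the Model Layers table
    '0 conv_2d': 80.0e-3,
    '1 average_pool_2d': 9.6e-3,
    '2 conv_2d': 17.6e-3,
    '3 conv_2d': 12.1e-3,
    '4 average_pool_2d': 270.0e-6,
    '5 reshape': 30.0e-6,
    '6 fully_connected': 60.0e-6,
    '7 softmax': 60.0e-6,
}

# Throughput figures recomputed from the measured inference time
inferences_per_s = 1.0 / TIME_S
macs_per_s = MAC_COUNT / TIME_S
print(f'Inference/s: {inferences_per_s:.1f}')  # summary prints 8.4
print(f'MACs/s: {macs_per_s / 1e6:.1f}M')      # summary prints 10.0M

# Share of the measured time spent in each layer
total_layer_time = sum(LAYER_TIME_S.values())
for layer, seconds in LAYER_TIME_S.items():
    print(f'{layer:<20} {100 * seconds / total_layer_time:5.1f}%')
```

Here the first conv_2d layer accounts for roughly two thirds of the measured time.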
Example 4: Profile model before training
Training a model can be very time-consuming, so it is useful to know how efficiently a model will execute on an embedded device before investing time and energy in training it.
For this reason, the MLTK profile_model API provides a build argument that builds a model and profiles it before the model is fully trained.
In this example, the image_example1 model is built when the command executes and is then profiled in the MVP hardware accelerator simulator.
Only the model specification script is required; the model does not need to be trained first.
# Build the image_example1 model then profile it using the MVP hardware accelerator simulator
# NOTE: Since build=True, the model does NOT need to be trained first
profiling_results = profile_model('image_example1', accelerator='mvp', build=True, return_estimates=True)
# Print the profiling results
print(profiling_results)
Epoch 1/3
Epoch 2/3
Epoch 3/3
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 3 of 3). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: E:\tmpc8yu6n46\assets
INFO:tensorflow:Assets written to: E:\tmpc8yu6n46\assets
c:\Users\reed\workspace\silabs\mltk\.venv\lib\site-packages\tensorflow\lite\python\convert.py:766: UserWarning: Statistics for quantized inputs were expected, but not specified; continuing anyway.
warnings.warn("Statistics for quantized inputs were expected, but not "
Profiling Summary
Name: my_model
Accelerator: MVP
Input Shape: 1x96x96x1
Input Data Type: float32
Output Shape: 1x3
Output Data Type: float32
Flash, Model File Size (bytes): 15.4k
RAM, Runtime Memory Size (bytes): 85.4k
Operation Count: 2.7M
Multiply-Accumulate Count: 1.2M
Layer Count: 10
Unsupported Layer Count: 0
Accelerator Cycle Count: 1.1M
CPU Cycle Count: 415.3k
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 219.6u
J/Op: 82.0p
J/MAC: 183.5p
Model Layers
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode | # Ops | # MACs | Acc Cycles | CPU Cycles | Energy (J) | Input Shape | Output Shape | Options |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0 | quantize | 36.9k | 0 | 0 | 332.6k | 66.4u | 1x96x96x1 | 1x96x96x1 | Type=none |
| 1 | conv_2d | 1.2M | 497.7k | 719.0k | 11.2k | 52.4u | 1x96x96x1,24x3x3x1,24 | 1x48x48x24 | Padding:same stride:2x2 activation:relu |
| 2 | average_pool_2d | 69.1k | 0 | 48.4k | 22.7k | 5.4u | 1x48x48x24 | 1x24x24x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 3 | conv_2d | 842.2k | 418.2k | 223.8k | 5.9k | 45.7u | 1x24x24x24,16x3x3x24,16 | 1x11x11x16 | Padding:valid stride:2x2 activation:relu |
| 4 | conv_2d | 565.7k | 279.9k | 148.8k | 8.0k | 45.7u | 1x11x11x16,24x3x3x16,24 | 1x9x9x24 | Padding:valid stride:1x1 activation:relu |
| 5 | average_pool_2d | 1.9k | 0 | 1.3k | 27.8k | 3.7u | 1x9x9x24 | 1x4x4x24 | Padding:valid stride:2x2 filter:2x2 activation:none |
| 6 | reshape | 0 | 0 | 0 | 250.4 | 0.0p | 1x4x4x24,2 | 1x384 | Type=none |
| 7 | fully_connected | 2.3k | 1.2k | 1.7k | 1.5k | 49.2n | 1x384,3x384,3 | 1x3 | Activation:none |
| 8 | softmax | 15.0 | 0 | 0 | 3.8k | 16.5n | 1x3 | 1x3 | Type=softmaxoptions |
| 9 | dequantize | 6.0 | 0 | 0 | 1.4k | 159.1n | 1x3 | 1x3 | Type=none |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
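Compared with Example 2, this run adds a quantize layer at the input and a dequantize layer at the output, which appear to come from the freshly built model using float32 input and output tensors (the summary lists float32 for both). The other eight layers print the same estimates as in Example 2, so adding the two extra layers' costs to the Example 2 totals should reproduce this run's totals. A quick check with values copied from the two printouts:

```python
# Totals copied from the Example 2 summary (int8 model, MVP simulator)
EXAMPLE2_ENERGY_J = 153.0e-6
EXAMPLE2_CPU_CYCLES = 81.3e3
EXAMPLE2_LAYERS = 8

# Extra layers in this run, copied from the table above: (energy in J, CPU cycles)
EXTRA_LAYERS = {
    'quantize': (66.4e-6, 332.6e3),
    'dequantize': (159.1e-9, 1.4e3),
}

energy = EXAMPLE2_ENERGY_J + sum(e for e, _ in EXTRA_LAYERS.values())
cycles = EXAMPLE2_CPU_CYCLES + sum(c for _, c in EXTRA_LAYERS.values())
layers = EXAMPLE2_LAYERS + len(EXTRA_LAYERS)

print(f'Energy (J): {energy * 1e6:.1f}u')       # this run prints 219.6u
print(f'CPU Cycle Count: {cycles / 1e3:.1f}k')  # this run prints 415.3k
print(f'Layer Count: {layers}')                 # this run prints 10
print(f'quantize share of energy: {100 * EXTRA_LAYERS["quantize"][0] / energy:.0f}%')
```

In these estimates the input quantize layer alone accounts for about 30% of the total energy, which is where most of the difference from Example 2 comes from.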