Model Profiler API Examples

This example demonstrates how to use the profile_model API to profile a .tflite model in a simulator, on a physical device, or before training.

Refer to the Model Profiler guide for more details.

NOTES:

  • Click here: Open In Colab to run this example interactively in your browser

  • Refer to the Notebook Examples Guide for how to run this example locally in VSCode

Install MLTK Python Package

# Install the MLTK Python package (if necessary)
!pip install --upgrade silabs-mltk

Import Python Packages

# Import the standard Python packages used by the examples
import os
import urllib.request
import shutil
import tempfile

# Import the necessary MLTK APIs
from mltk.core import profile_model
from mltk.utils.commander import query_platform

Download .tflite model file

A .tflite model file is required to run these examples.
The following code downloads a model.

NOTE: Update TFLITE_MODEL_URL or tflite_path to point to your model if necessary

# Use the .tflite model found here:
# https://github.com/siliconlabs/mltk/tree/master/mltk/utils/test_helper/data/
# NOTE: Update this URL to point to your model if necessary
TFLITE_MODEL_URL = 'https://github.com/siliconlabs/mltk/raw/master/mltk/utils/test_helper/data/image_example1.tflite'

# Download the .tflite file and save to the temp dir
tflite_path = os.path.normpath(f'{tempfile.gettempdir()}/image_example1.tflite')
with open(tflite_path, 'wb') as dst:
    with urllib.request.urlopen(TFLITE_MODEL_URL) as src:
        shutil.copyfileobj(src, dst)
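
Before profiling, it can be worth confirming that the download produced a real .tflite file rather than, for example, an HTML error page. The sketch below relies on the fact that .tflite files are FlatBuffers carrying the file identifier TFL3 at byte offset 4. The helper name looks_like_tflite is hypothetical and is not part of the MLTK; the two stand-in files exist only to exercise the check.

```python
import os
import tempfile


def looks_like_tflite(path: str) -> bool:
    """Return True if the file at `path` carries the TFLite FlatBuffer identifier."""
    if not os.path.isfile(path):
        return False
    with open(path, 'rb') as f:
        header = f.read(8)
    # Bytes 0-3 hold the root table offset; bytes 4-7 hold the file identifier
    return len(header) == 8 and header[4:8] == b'TFL3'


# Self-check using two small stand-in files (these are NOT real models)
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, 'good.tflite')
    bad = os.path.join(tmp_dir, 'bad.tflite')
    with open(good, 'wb') as f:
        f.write(b'\x1c\x00\x00\x00TFL3' + b'\x00' * 16)
    with open(bad, 'wb') as f:
        f.write(b'<!DOCTYPE html>')
    good_result = looks_like_tflite(good)
    bad_result = looks_like_tflite(bad)

print(good_result, bad_result)
```

With the download code above, looks_like_tflite(tflite_path) could be called before passing tflite_path to profile_model.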

Example 1: Profile .tflite file in basic simulator

This example profiles the .tflite model file in the “basic simulator” of the model profiler.

# Profile the tflite model using the "basic simulator"
# NOTE: Update tflite_path to point to your model if necessary
profiling_results = profile_model(tflite_path)

# Print the profiling results
print(profiling_results)
Profiling Summary
Name: image_example1
Accelerator: None
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 71.5k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
CPU Cycle Count: 13.1M
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 2.3m
J/Op: 884.5p
J/MAC: 2.0n

Model Layers
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | CPU Cycles | Energy (J) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0     | conv_2d         | 1.2M   | 497.7k | 10.0M      | 1.9m       | 1x96x96x1,24x3x3x1,24   | 1x48x48x24   | Padding:same stride:2x2 activation:relu             |
| 1     | average_pool_2d | 69.1k  | 0      | 985.7k     | 148.0u     | 1x48x48x24              | 1x24x24x24   | Padding:valid stride:2x2 filter:2x2 activation:none |
| 2     | conv_2d         | 842.2k | 418.2k | 1.3M       | 187.5u     | 1x24x24x24,16x3x3x24,16 | 1x11x11x16   | Padding:valid stride:2x2 activation:relu            |
| 3     | conv_2d         | 565.7k | 279.9k | 718.6k     | 105.7u     | 1x11x11x16,24x3x3x16,24 | 1x9x9x24     | Padding:valid stride:1x1 activation:relu            |
| 4     | average_pool_2d | 1.9k   | 0      | 30.8k      | 9.3u       | 1x9x9x24                | 1x4x4x24     | Padding:valid stride:2x2 filter:2x2 activation:none |
| 5     | reshape         | 0      | 0      | 250.4      | 0.0p       | 1x4x4x24,2              | 1x384        | Type=none                                           |
| 6     | fully_connected | 2.3k   | 1.2k   | 5.2k       | 21.5n      | 1x384,3x384,3           | 1x3          | Activation:none                                     |
| 7     | softmax         | 15.0   | 0      | 3.8k       | 16.5n      | 1x3                     | 1x3          | Type=softmaxoptions                                 |
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
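
The summary values are printed with SI-style suffixes (k, M, m, u, n, p). As a rough sketch of how the derived figures relate to each other, the snippet below converts a few of the printed strings to floats and recomputes J/Op, J/MAC, and a cycle-based latency estimate. The helper parse_value is hypothetical (it is not an MLTK function), and the latency estimate assumes that CPU cycles divided by clock rate approximates inference time. Because the printed inputs are rounded, the recomputed values only approximately match the report (roughly 885 pJ per operation versus the reported 884.5p).

```python
# Suffixes as they appear in the printed profiling report
_SUFFIXES = {'k': 1e3, 'M': 1e6, 'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12}


def parse_value(text: str) -> float:
    """Convert a printed value such as '13.1M' or '2.3m' to a float."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    return float(text)


# Values copied from the "Profiling Summary" above
energy_j = parse_value('2.3m')
op_count = parse_value('2.6M')
mac_count = parse_value('1.2M')
cpu_cycles = parse_value('13.1M')
clock_hz = parse_value('78.0M')

joules_per_op = energy_j / op_count
joules_per_mac = energy_j / mac_count
# Assumption: latency is approximately cycle count divided by clock rate
estimated_latency_s = cpu_cycles / clock_hz

print(f'J/Op:  {joules_per_op:.3e}')
print(f'J/MAC: {joules_per_mac:.3e}')
print(f'Estimated latency (s): {estimated_latency_s:.4f}')
```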

Example 2: Profile .tflite file in MVP hardware simulator

This example profiles the .tflite model file in the MVP hardware accelerator simulator of the model profiler.

# Profile the tflite model using the MVP hardware accelerator simulator
# NOTE: Update tflite_path to point to your model if necessary
profiling_results = profile_model(tflite_path, accelerator='MVP')

# Print the profiling results
print(profiling_results)
Profiling Summary
Name: image_example1
Accelerator: MVP
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 85.3k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
Accelerator Cycle Count: 1.1M
CPU Cycle Count: 81.3k
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 153.0u
J/Op: 57.9p
J/MAC: 127.8p

Model Layers
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | Acc Cycles | CPU Cycles | Energy (J) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0     | conv_2d         | 1.2M   | 497.7k | 719.0k     | 11.2k      | 52.4u      | 1x96x96x1,24x3x3x1,24   | 1x48x48x24   | Padding:same stride:2x2 activation:relu             |
| 1     | average_pool_2d | 69.1k  | 0      | 48.4k      | 22.7k      | 5.4u       | 1x48x48x24              | 1x24x24x24   | Padding:valid stride:2x2 filter:2x2 activation:none |
| 2     | conv_2d         | 842.2k | 418.2k | 223.8k     | 5.9k       | 45.7u      | 1x24x24x24,16x3x3x24,16 | 1x11x11x16   | Padding:valid stride:2x2 activation:relu            |
| 3     | conv_2d         | 565.7k | 279.9k | 148.8k     | 8.0k       | 45.7u      | 1x11x11x16,24x3x3x16,24 | 1x9x9x24     | Padding:valid stride:1x1 activation:relu            |
| 4     | average_pool_2d | 1.9k   | 0      | 1.3k       | 27.8k      | 3.7u       | 1x9x9x24                | 1x4x4x24     | Padding:valid stride:2x2 filter:2x2 activation:none |
| 5     | reshape         | 0      | 0      | 0          | 250.4      | 0.0p       | 1x4x4x24,2              | 1x384        | Type=none                                           |
| 6     | fully_connected | 2.3k   | 1.2k   | 1.7k       | 1.5k       | 49.2n      | 1x384,3x384,3           | 1x3          | Activation:none                                     |
| 7     | softmax         | 15.0   | 0      | 0          | 3.8k       | 16.5n      | 1x3                     | 1x3          | Type=softmaxoptions                                 |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
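
To put the two simulator reports side by side, the following sketch compares a few summary values from Example 1 (no accelerator) and Example 2 (MVP). The numbers are copied by hand from the printed reports above, so the ratios inherit their rounding; the dictionary keys are illustrative names, not fields of the object returned by profile_model.

```python
# Summary values copied from the Example 1 and Example 2 reports
basic = {'energy_j': 2.3e-3, 'cpu_cycles': 13.1e6, 'ram_bytes': 71.5e3}
mvp = {'energy_j': 153.0e-6, 'cpu_cycles': 81.3e3, 'acc_cycles': 1.1e6, 'ram_bytes': 85.3e3}

# How many times less energy the MVP simulation reports per inference
energy_ratio = basic['energy_j'] / mvp['energy_j']

# How many times fewer CPU cycles are spent once work moves to the accelerator
cpu_cycle_ratio = basic['cpu_cycles'] / mvp['cpu_cycles']

# Additional runtime memory reported when the accelerator is used
extra_ram_bytes = mvp['ram_bytes'] - basic['ram_bytes']

print(f'Energy reduction:    {energy_ratio:.1f}x')
print(f'CPU cycle reduction: {cpu_cycle_ratio:.0f}x')
print(f'Extra RAM (bytes):   {extra_ram_bytes:.0f}')
```

In these simulated results the accelerator lowers energy and CPU load at the cost of additional runtime memory, which may matter on RAM-constrained parts.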

Example 3: Profile .tflite file on physical device

This example profiles the .tflite model file on a physically connected embedded device.

NOTE: A supported development board must be connected and properly enumerated for this example to work.

# Determine the currently connected device
# Print an error and exit if no device is connected
try:
    platform_name = query_platform()
except Exception as e:
    print(f'Failed to determine connected device, err:\n{e}')
    # SystemExit is a builtin, so this works without importing sys
    raise SystemExit(0)

print(f'Connected device platform: {platform_name}')

accelerator = None
if platform_name in ('brd2601a', 'brd4186b'):
    # Use the MVP hardware accelerator if the platform supports it
    accelerator = 'MVP'

# Profile the tflite model on the physical device
profiling_results = profile_model(
    tflite_path,
    accelerator=accelerator,
    use_device=True
)

# Print the profiling results
print(profiling_results)
Connected device platform: brd2601
Profiling Summary
Name: image_example1
Accelerator: None
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 71.4k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
CPU Cycle Count: 9.5M
CPU Utilization (%): 100.0
Clock Rate (hz): 78.0M
Time (s): 119.7m
Ops/s: 22.1M
MACs/s: 10.0M
Inference/s: 8.4

Model Layers
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | CPU Cycles | Time (s) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
| 0     | conv_2d         | 1.2M   | 497.7k | 6.3M       | 80.0m    | 1x96x96x1,24x3x3x1,24   | 1x48x48x24   | Padding:same stride:2x2 activation:relu             |
| 1     | average_pool_2d | 69.1k  | 0      | 759.4k     | 9.6m     | 1x48x48x24              | 1x24x24x24   | Padding:valid stride:2x2 filter:2x2 activation:none |
| 2     | conv_2d         | 842.2k | 418.2k | 1.4M       | 17.6m    | 1x24x24x24,16x3x3x24,16 | 1x11x11x16   | Padding:valid stride:2x2 activation:relu            |
| 3     | conv_2d         | 565.7k | 279.9k | 956.1k     | 12.1m    | 1x11x11x16,24x3x3x16,24 | 1x9x9x24     | Padding:valid stride:1x1 activation:relu            |
| 4     | average_pool_2d | 1.9k   | 0      | 21.9k      | 270.0u   | 1x9x9x24                | 1x4x4x24     | Padding:valid stride:2x2 filter:2x2 activation:none |
| 5     | reshape         | 0      | 0      | 2.3k       | 30.0u    | 1x4x4x24,2              | 1x384        | Type=none                                           |
| 6     | fully_connected | 2.3k   | 1.1k   | 5.1k       | 60.0u    | 1x384,3x384,3           | 1x3          | Activation:none                                     |
| 7     | softmax         | 15.0   | 0      | 2.9k       | 60.0u    | 1x3                     | 1x3          | Type=softmaxoptions                                 |
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
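
The device report adds measured timing, and its throughput figures can be cross-checked with simple arithmetic. The sketch below uses values copied by hand from the report above; since those values are rounded, the recomputed Ops/s differs slightly from the printed 22.1M. The variable names are illustrative only.

```python
# Values copied from the "Profiling Summary" of the device run
time_s = 119.7e-3
op_count = 2.6e6
mac_count = 1.2e6
cpu_cycles = 9.5e6
clock_hz = 78.0e6

# Per-layer times in milliseconds, copied from the "Time (s)" column
layer_times_ms = [80.0, 9.6, 17.6, 12.1, 0.27, 0.03, 0.06, 0.06]

inferences_per_s = 1.0 / time_s
ops_per_s = op_count / time_s
macs_per_s = mac_count / time_s
# Time implied by the cycle count, for comparison with the measured time
cycle_time_s = cpu_cycles / clock_hz
# The per-layer times should add up to the total reported time
total_layer_time_ms = sum(layer_times_ms)

print(f'Inference/s: {inferences_per_s:.1f}')
print(f'Ops/s:       {ops_per_s / 1e6:.1f}M')
print(f'MACs/s:      {macs_per_s / 1e6:.1f}M')
print(f'Cycles/clock (s): {cycle_time_s:.4f}')
print(f'Sum of layer times (ms): {total_layer_time_ms:.2f}')
```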

Example 4: Profile model before training

Training a model can be very time-consuming, and it is useful to know how efficiently a model will execute on an embedded device before investing time and energy into training it.
For this reason, the MLTK profile_model API features a build argument to build a model and profile it before the model is fully trained.

In this example, the image_example1 model is built when the call executes and is then profiled in the MVP hardware simulator.
Note that only the model specification script is required; the model does not need to be trained first.

# Build the image_example1 model then profile it using the MVP hardware accelerator simulator
# NOTE: Since build=True, the model does NOT need to be trained first
profiling_results = profile_model('image_example1', accelerator='MVP', build=True)

# Print the profiling results
print(profiling_results)
Epoch 1/3
Epoch 2/3
Epoch 3/3
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 3 of 3). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: E:\tmpc8yu6n46\assets
c:\Users\reed\workspace\silabs\mltk\.venv\lib\site-packages\tensorflow\lite\python\convert.py:766: UserWarning: Statistics for quantized inputs were expected, but not specified; continuing anyway.
  warnings.warn("Statistics for quantized inputs were expected, but not "
Profiling Summary
Name: my_model
Accelerator: MVP
Input Shape: 1x96x96x1
Input Data Type: float32
Output Shape: 1x3
Output Data Type: float32
Flash, Model File Size (bytes): 15.4k
RAM, Runtime Memory Size (bytes): 85.4k
Operation Count: 2.7M
Multiply-Accumulate Count: 1.2M
Layer Count: 10
Unsupported Layer Count: 0
Accelerator Cycle Count: 1.1M
CPU Cycle Count: 415.3k
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 219.6u
J/Op: 82.0p
J/MAC: 183.5p

Model Layers
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | Acc Cycles | CPU Cycles | Energy (J) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0     | quantize        | 36.9k  | 0      | 0          | 332.6k     | 66.4u      | 1x96x96x1               | 1x96x96x1    | Type=none                                           |
| 1     | conv_2d         | 1.2M   | 497.7k | 719.0k     | 11.2k      | 52.4u      | 1x96x96x1,24x3x3x1,24   | 1x48x48x24   | Padding:same stride:2x2 activation:relu             |
| 2     | average_pool_2d | 69.1k  | 0      | 48.4k      | 22.7k      | 5.4u       | 1x48x48x24              | 1x24x24x24   | Padding:valid stride:2x2 filter:2x2 activation:none |
| 3     | conv_2d         | 842.2k | 418.2k | 223.8k     | 5.9k       | 45.7u      | 1x24x24x24,16x3x3x24,16 | 1x11x11x16   | Padding:valid stride:2x2 activation:relu            |
| 4     | conv_2d         | 565.7k | 279.9k | 148.8k     | 8.0k       | 45.7u      | 1x11x11x16,24x3x3x16,24 | 1x9x9x24     | Padding:valid stride:1x1 activation:relu            |
| 5     | average_pool_2d | 1.9k   | 0      | 1.3k       | 27.8k      | 3.7u       | 1x9x9x24                | 1x4x4x24     | Padding:valid stride:2x2 filter:2x2 activation:none |
| 6     | reshape         | 0      | 0      | 0          | 250.4      | 0.0p       | 1x4x4x24,2              | 1x384        | Type=none                                           |
| 7     | fully_connected | 2.3k   | 1.2k   | 1.7k       | 1.5k       | 49.2n      | 1x384,3x384,3           | 1x3          | Activation:none                                     |
| 8     | softmax         | 15.0   | 0      | 0          | 3.8k       | 16.5n      | 1x3                     | 1x3          | Type=softmaxoptions                                 |
| 9     | dequantize      | 6.0    | 0      | 0          | 1.4k       | 159.1n     | 1x3                     | 1x3          | Type=none                                           |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
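
Compared with Example 2, this report has float32 inputs and outputs and two extra layers, quantize and dequantize. The sketch below checks how much of the additional CPU cycles and energy those two layers account for, assuming the remaining eight layers cost the same in both runs (their table rows are identical). All numbers are copied by hand from the two reports, and the variable names are illustrative only.

```python
# Summary values from Example 2 (.tflite file) and Example 4 (build=True)
cpu_cycles_tflite = 81.3e3
cpu_cycles_built = 415.3e3
energy_tflite_j = 153.0e-6
energy_built_j = 219.6e-6

# Rows for the two extra layers in the Example 4 table
quantize = {'cpu_cycles': 332.6e3, 'energy_j': 66.4e-6}
dequantize = {'cpu_cycles': 1.4e3, 'energy_j': 159.1e-9}

# Difference between the two summaries
extra_cycles = cpu_cycles_built - cpu_cycles_tflite
extra_energy_j = energy_built_j - energy_tflite_j

# Cost attributed to the quantize/dequantize layers
layer_cycles = quantize['cpu_cycles'] + dequantize['cpu_cycles']
layer_energy_j = quantize['energy_j'] + dequantize['energy_j']

print(f'Extra CPU cycles: {extra_cycles:.0f} (extra layers: {layer_cycles:.0f})')
print(f'Extra energy (J): {extra_energy_j:.3e} (extra layers: {layer_energy_j:.3e})')
```

Under that assumption, nearly all of the additional CPU cycles and energy in this report come from converting the float32 input to int8 and the int8 output back to float32.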