Model Profiler¶
The MLTK model profiler provides information about how efficiently a model may run on an embedded target.
The model profiler allows for executing a .tflite model file in a simulator or on a physical embedded target.
This guide describes how to run the model profiler from the command-line or Python API.
Alternatively, refer to the Model Profiler Utility which allows
for running the model profiler as a standalone executable with a webpage interface.
Note
Any .tflite model file supported by Tensorflow-Lite Micro will work with the model profiler, i.e. the .tflite does not need to be generated by the MLTK to use the profiler.
Note
All model profiling is done locally. No data is uploaded to a remote server.
Quick Reference¶
Command-line: mltk profile --help
Python API: profile_model
Python API examples: profile_model.ipynb
Overview¶
Profiling Metrics¶
The model profiler returns results for the entire model as well as individual layers of the model.
Entire Model Metrics¶
| Name | Description |
|---|---|
| Name | Name of the profiled model |
| Accelerator | Name of the hardware accelerator |
| Input Shape | Shape of the model's input tensor |
| Input Data Type | Data type of the model's input |
| Output Shape | Shape of the model's output tensor |
| Output Data Type | Data type of the model's output |
| Model File Size | Size of the .tflite model file (this is effectively the flash required by the model) |
| Runtime Memory Size | Size of the RAM required for Tensorflow-Lite Micro's working memory |
| # Operations | Number of mathematical operations required to execute the model |
| # Multiply-Accumulates | Number of multiply-accumulate operations required to execute the model |
| # Layers | Number of layers in the model |
| # Unsupported Layers | Number of layers that could not be accelerated due to hardware accelerator constraints |
| # Accelerator Cycles | Number of clock cycles required by the hardware accelerator |
| # CPU Cycles | Number of CPU clock cycles required to execute the model |
| CPU Utilization | Percentage of the CPU used to execute the model |
| Clock Rate | CPU clock rate |
| Time | Time required to execute the model (i.e. latency) |
| Energy | Energy required to execute the model (relative to the CPU idling) |
| J/Op | Energy per operation |
| J/MAC | Energy per multiply-accumulate |
| Ops/s | Operations per second |
| MACs/s | Multiply-accumulates per second |
| Inference/s | Number of times the model can execute per second |
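Several of the summary metrics above are simple derivations of one another. The snippet below is a back-of-the-envelope illustration of how they relate (using hypothetical numbers, not the MLTK's internal implementation):

```python
# Hypothetical profiling figures for one inference
cpu_cycles = 7_800_000      # "# CPU Cycles"
clock_hz = 78_000_000       # "Clock Rate" (e.g. 78 MHz)
num_ops = 10_000_000        # "# Operations"
num_macs = 5_000_000        # "# Multiply-Accumulates"
energy_j = 0.002            # "Energy" per inference, in joules

time_s = cpu_cycles / clock_hz        # "Time" (latency)
inference_per_s = 1.0 / time_s        # "Inference/s"
ops_per_s = num_ops / time_s          # "Ops/s"
macs_per_s = num_macs / time_s        # "MACs/s"
joules_per_op = energy_j / num_ops    # "J/Op"
joules_per_mac = energy_j / num_macs  # "J/MAC"

print(f"Time: {time_s * 1e3:.1f} ms, Inference/s: {inference_per_s:.0f}")
```

With these example numbers, 7.8M CPU cycles at 78 MHz give a 100 ms latency, i.e. 10 inferences per second.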
Per Layer Metrics¶
| Name | Description |
|---|---|
| Index | Model layer index |
| OpCode | Kernel layer name |
| # Ops | Number of mathematical operations required by the layer |
| # MACs | Number of multiply-accumulate operations required by the layer |
| Acc Cycles | Number of accelerator cycles required by the layer |
| CPU Cycles | Number of CPU cycles required by the layer |
| Energy | Energy required by the layer (relative to the CPU idling) |
| Time | Time required to execute the layer (i.e. latency) |
| Input Shape | Shape(s) of the layer's input tensor(s) |
| Output Shape | Shape(s) of the layer's output tensor(s) |
| Options | Kernel configuration options used by the layer |
| Supported? | True if the layer could be accelerated, False otherwise |
| Error Msg | Error message if the layer could not be accelerated |
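For intuition about the per-layer # Ops and # MACs columns: a convolution layer's MAC count can be estimated from its output shape and kernel size. The helper below is a generic textbook-style estimate, not the MLTK's own counting code:

```python
def conv2d_macs(out_h, out_w, out_ch, kernel_h, kernel_w, in_ch):
    """Estimate multiply-accumulates for a standard Conv2D layer.

    Each output element is a dot product over the kernel window
    across all input channels (kernel_h * kernel_w * in_ch MACs).
    """
    return out_h * out_w * out_ch * kernel_h * kernel_w * in_ch

# e.g. a 3x3 convolution producing a 32x32x32 output from 16 input channels
macs = conv2d_macs(32, 32, 32, 3, 3, 16)
ops = 2 * macs  # roughly one multiply + one add per MAC
print(macs, ops)
```

Note that profilers may count ops slightly differently (e.g. including bias additions or activation functions), so treat this as an order-of-magnitude check against the reported columns.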
Modes of Operation¶
The model profiler has three modes of operation:
Basic Simulator Mode¶
The model executes using the Tensorflow-Lite Micro ARM CMSIS and reference kernels in a basic simulator.
All returned profiling information is estimated.
No physical device required
Estimates CPU cycles and latency
Estimates required energy per inference
NOTE: Estimates are based on the ARM Cortex-M33.
Hardware Simulator Mode¶
The model executes in a hardware accelerator simulator.
All returned profiling information is calculated or estimated.
No physical device required
Accelerator cycles calculated in hardware simulator
Estimates CPU cycles and latency
Estimates required energy per inference
Note
Estimated numbers are based on the EFR32xG24 running at 78 MHz
Physical Device Mode¶
The model executes and is profiled on a physical device.
This allows for determining actual profiling numbers (i.e. not calculated or estimated).
Physical device must be locally connected
Accelerator cycles, CPU cycles, and latency measured on physical device
No energy measurements provided
Command¶
Model profiling from the command-line is done using the profile operation.
For more details on the available command-line options, issue the command:
mltk profile --help
The following are examples of how the profiler can be invoked from the command-line:
Example 1: Profile in basic simulator¶
Profile the given .tflite model file in the basic simulator.
With this command, no physical device is required.
This command will also provide profiling results for:
Estimated latency (i.e. seconds per inference)
Estimated CPU cycles
Estimated energy
mltk profile ~/workspace/my_model.tflite --estimates
Example 2: Profile in MVP hardware simulator¶
Profile the given .tflite model file in the MVP hardware simulator.
With this command, no physical device is required.
This command will also provide profiling results for:
Estimated latency (i.e. seconds per inference)
Calculated accelerator cycles
Estimated CPU cycles
Estimated energy
mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates
Example 3: Profile on physical device using MVP hardware accelerator¶
Profile the given .tflite model file on a physically connected embedded device using the MVP hardware accelerator.
This command will also provide measured profiling results for:
Latency (i.e. seconds per inference)
Accelerator cycles
CPU cycles
mltk profile ~/workspace/my_model.tflite --accelerator MVP --device
Example 4: Profile model before training¶
Training a model can be very time-consuming, and it is useful to know how efficiently a
model will execute on an embedded device before investing time and energy into training it.
For this reason, the MLTK profile command features a --build flag to build a model and profile it before the model is fully trained.
In this example, the image_example1 model is built at command-execution time and profiled in the MVP hardware simulator. Note that only the model specification script is required; the model does not need to be trained first.
mltk profile image_example1 --build --accelerator MVP --estimates
Python API¶
The model profiler is accessible via the profile_model API.
Examples using this API may be found in profile_model.ipynb.
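As a minimal sketch of the API (assuming the MLTK package is installed; the argument names below follow the MLTK docs, but the linked profile_model reference is authoritative):

```python
import importlib.util

# Guard so this sketch degrades gracefully when the MLTK is not installed.
if importlib.util.find_spec("mltk") is not None:
    from mltk.core import profile_model

    # Profile a .tflite file in the MVP hardware simulator
    # (no physical device required); argument names are taken
    # from the MLTK's profile_model API reference.
    results = profile_model(
        "my_model.tflite",
        accelerator="MVP",
        return_estimates=True,
    )
    print(results)
else:
    results = None  # MLTK not available in this environment
```

The returned profiling results object can be printed to display the same entire-model and per-layer metrics described above.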