Model Profiler

The MLTK model profiler provides information about how efficiently a model may run on an embedded target.
The model profiler allows for executing a .tflite model file in a simulator or on a physical embedded target.

This guide describes how to run the model profiler from the command-line or Python API.
Alternatively, refer to the Model Profiler Utility which allows for running the model profiler as a standalone executable with a webpage interface.

Note

Any .tflite model file supported by Tensorflow-Lite Micro will work with the model profiler.
That is, the .tflite file does not need to be generated by the MLTK to use the profiler.

Note

All model profiling is done locally. No data is uploaded to a remote server.

Quick Reference

Command-line: mltk profile --help
Python API: profile_model
Python API examples: profile_model.ipynb

Overview

Profiling Metrics

The model profiler returns results for the entire model as well as individual layers of the model.

Entire Model Metrics

Name	Description
Name	Name of profiled model
Accelerator	Name of hardware accelerator
Input Shape	Shape of the model's input tensor
Input Data Type	Model input's data type
Output Shape	Shape of the model's output tensor
Output Data Type	Model output's data type
Model File Size	Size of the `.tflite` model file (this is effectively the flash required by the model)
Runtime Memory Size	Size of RAM required for Tensorflow-Lite Micro's working memory
# Operations	Number of mathematical operations required to execute the model
# Multiply-Accumulates	Number of multiply-accumulate operations required to execute the model
# Layers	Number of layers in model
# Unsupported Layers	Number of layers that could not be accelerated due to hardware accelerator constraints
# Accelerator Cycles	Number of clock cycles required by hardware accelerator
# CPU Cycles	Number of CPU clock cycles
CPU Utilization	Percentage of CPU used to execute model
Clock Rate	CPU clock rate
Time	Time required to execute model (i.e. latency)
Energy	Energy required to execute model (relative to CPU idling)
J/Op	Energy per operation
J/MAC	Energy per multiply-accumulate
Ops/s	Operations per second
MACs/s	Multiply-accumulates per second
Inference/s	Number of times the model can execute per second
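
The rate and efficiency rows in the table above can be related to the raw counts. The following is a minimal sketch, assuming each derived metric is the plain ratio its unit suggests; the profiler's own computation may differ. The function name and the sample numbers are hypothetical and are not measured results.

```python
def derive_model_metrics(n_ops, n_macs, time_s, energy_j):
    """Derive rate and efficiency figures from raw profiling numbers.

    Assumption: every derived metric is the simple ratio implied by its unit.
    """
    if n_ops <= 0 or n_macs <= 0 or time_s <= 0:
        raise ValueError("operation counts and time must be positive")
    return {
        "ops_per_s": n_ops / time_s,          # Ops/s
        "macs_per_s": n_macs / time_s,        # MACs/s
        "inference_per_s": 1.0 / time_s,      # Inference/s
        "j_per_op": energy_j / n_ops,         # J/Op
        "j_per_mac": energy_j / n_macs,       # J/MAC
    }


# Hypothetical inputs chosen only to illustrate the arithmetic
metrics = derive_model_metrics(
    n_ops=2_000_000, n_macs=1_000_000, time_s=0.01, energy_j=1e-4
)
for name, value in metrics.items():
    print(f"{name}: {value:.3e}")
```

Keeping the derivation in one function makes it easy to compare two profiling runs with the same formulas.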

Per Layer Metrics

Name	Description
Index	Model layer index
OpCode	Kernel Layer name
# Ops	Number of mathematical operations required by layer
# MACs	Number of multiply-accumulate operations required by layer
Acc Cycles	Number of accelerator cycles required by layer
CPU Cycles	Number of CPU cycles required by layer
Energy	Energy required by layer (relative to CPU idling)
Time	Time required to execute layer (i.e. latency)
Input Shape	Shape(s) of layer input tensor(s)
Output Shape	Shape(s) of layer output tensor(s)
Options	Kernel configuration options used by layer
Supported?	`False` if the layer could not be accelerated, `True` otherwise
Error Msg	Error message if layer was not able to be accelerated
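
Per-layer results are mostly useful for finding bottlenecks and layers that fell back to the CPU. The sketch below works on plain dictionaries whose keys mirror the column names above; the records, the opcode names, and the `summarize_layers` helper are hypothetical and do not represent the profiler's actual result objects.

```python
# Hypothetical per-layer records; keys mirror the columns in the table above
layers = [
    {"index": 0, "opcode": "conv_2d", "time_s": 4.0e-3,
     "supported": True, "error_msg": ""},
    {"index": 1, "opcode": "max_pool_2d", "time_s": 0.5e-3,
     "supported": True, "error_msg": ""},
    {"index": 2, "opcode": "softmax", "time_s": 1.5e-3,
     "supported": False, "error_msg": "placeholder message"},
]


def summarize_layers(layers):
    """Report the slowest layer and the layers that were not accelerated."""
    total_time = sum(layer["time_s"] for layer in layers)
    slowest = max(layers, key=lambda layer: layer["time_s"])
    unsupported = [layer["index"] for layer in layers if not layer["supported"]]
    return {
        "slowest_index": slowest["index"],
        "slowest_share": slowest["time_s"] / total_time,  # fraction of total time
        "unsupported_indices": unsupported,
    }


summary = summarize_layers(layers)
print(summary)
```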

Modes of Operation

The model profiler has three modes of operation:

Basic Simulator Mode

The model executes the Tensorflow-Lite Micro ARM CMSIS kernels and reference kernels in a basic simulator.
All returned profiling information is estimated.

No physical device required
Estimates CPU cycles and latency
Estimates required energy per inference

NOTE: Estimates are provided based on the ARM Cortex-M33.

Hardware Simulator Mode

The model executes in a hardware accelerator simulator.
All returned profiling information is calculated or estimated.

No physical device required
Accelerator cycles calculated in hardware simulator
Estimates CPU cycles and latency
Estimates required energy per inference

Note

Estimated numbers are based on the EFR32xG24 at 78 MHz.

Simulator Accuracy

The simulator provides coarse estimates of CPU cycles, latency, and energy, all based on the EFR32xG24 at 78 MHz. These estimates are a starting point for model analysis; use the Physical Device Mode for accurate profiling numbers.
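
As a rough illustration of how a cycle count relates to latency at the quoted clock rate, the sketch below divides cycles by the clock frequency. This assumes execution time is simply cycles divided by clock rate; the simulator's actual model may be more involved, and `cycles_to_seconds` is a hypothetical helper.

```python
CLOCK_HZ = 78_000_000  # EFR32xG24 clock rate quoted in this section


def cycles_to_seconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to seconds, assuming time = cycles / clock rate."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return cycles / clock_hz


# 780,000 cycles at 78 MHz
latency_s = cycles_to_seconds(780_000)
print(latency_s)  # 0.01
```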

Physical Device Mode

The model executes and is profiled on a physical device.
This allows for determining actual profiling numbers (i.e. not calculated or estimated).

Physical device must be locally connected
Accelerator cycles, CPU cycles, and latency measured on physical device
No energy measurements provided

Command

Model profiling from the command-line is done using the profile operation.

For more details on the available command-line options, issue the command:

mltk profile --help

The following are examples of how the profiler can be invoked from the command-line:

Example 1: Profile in basic simulator

Profile the given .tflite model file in the basic simulator.
With this command, no physical device is required.
This command will also provide profiling results for:

Estimated latency (i.e. seconds per inference)
Estimated CPU cycles
Estimated energy

mltk profile ~/workspace/my_model.tflite --estimates

Example 2: Profile in MVP hardware simulator

Profile the given .tflite model file in the MVP hardware simulator.
With this command, no physical device is required.
This command will also provide profiling results for:

Estimated latency (i.e. seconds per inference)
Calculated accelerator cycles
Estimated CPU cycles
Estimated energy

mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates

Example 3: Profile on physical device using MVP hardware accelerator

Profile the given .tflite model file on a physically connected embedded device using the MVP hardware accelerator.
This command will also provide measured profiling results for:

Latency (i.e. seconds per inference)
Accelerator cycles
CPU cycles

mltk profile ~/workspace/my_model.tflite --accelerator MVP --device

Example 4: Profile model before training

Training a model can be very time-consuming, and it is useful to know how efficiently a model will execute on an embedded device before investing time and energy into training it. For this reason, the MLTK profile command features a --build flag to build a model and profile it before the model is fully trained.

In this example, the image_example1 model is built at command-execution-time and profiled in the MVP hardware simulator. Only the model specification script is required; the model does not need to be trained first.

mltk profile image_example1 --build --accelerator MVP --estimates
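
When profiling is scripted, it can help to assemble these command lines programmatically. The following is a minimal sketch that uses only the flags shown in the four examples above; `build_profile_command` is a hypothetical helper and is not part of the MLTK.

```python
def build_profile_command(model, accelerator=None, device=False,
                          estimates=False, build=False):
    """Return the argument list for an `mltk profile` invocation.

    Only the flags that appear in the examples above are used.
    """
    argv = ["mltk", "profile", model]
    if build:
        argv.append("--build")          # build the model before profiling
    if accelerator:
        argv += ["--accelerator", accelerator]
    if device:
        argv.append("--device")         # profile on a connected device
    if estimates:
        argv.append("--estimates")      # request estimated metrics
    return argv


# Reproduce Example 2 (MVP hardware simulator with estimates)
print(" ".join(build_profile_command(
    "~/workspace/my_model.tflite", accelerator="MVP", estimates=True)))
```

If the list is passed to `subprocess.run` without a shell, a leading `~` is not expanded; apply `os.path.expanduser` to the model path first.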

Python API

The model profiler is accessible via the profile_model API.

Examples using this API may be found in profile_model.ipynb.
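
For orientation, a rough Python counterpart to Example 2 might look like the sketch below. The import path and the keyword names (`accelerator`, `return_estimates`) are assumptions recalled from the MLTK API, not taken from this guide, so verify them against the profile_model reference before relying on them. The wrapper function name is hypothetical.

```python
def profile_in_mvp_simulator(tflite_path):
    """Profile a .tflite file in the MVP hardware simulator.

    Intended as a rough equivalent of:
        mltk profile <model> --accelerator MVP --estimates
    """
    # Imported lazily so this module still loads where the MLTK is not installed
    from mltk.core import profile_model

    # Assumed keyword names; check the profile_model API reference
    results = profile_model(
        tflite_path,
        accelerator="MVP",
        return_estimates=True,
    )
    print(results)
    return results
```

The function would be called with the path to a model file, for example the `my_model.tflite` file used in the command-line examples.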