Model Profiler

The MLTK model profiler provides information about how efficiently a model may run on an embedded target.
It executes a .tflite model file in a simulator or on a physical embedded target.

This guide describes how to run the model profiler from the command-line or Python API.
Alternatively, refer to the Model Profiler Utility which allows for running the model profiler as a standalone executable with a webpage interface.

Note

Any .tflite model file supported by Tensorflow-Lite Micro will work with the model profiler,
i.e., the .tflite file does not need to be generated by the MLTK to be profiled.

Note

All model profiling is done locally. No data is uploaded to a remote server.

Overview

Profiling Metrics

The model profiler returns results for the entire model as well as individual layers of the model.

Entire Model Metrics

Name                     Description
-----------------------  -----------------------------------------------------------------------------
Name                     Name of the profiled model
Accelerator              Name of the hardware accelerator
Input Shape              Shape of the model's input tensor
Input Data Type          Data type of the model's input
Output Shape             Shape of the model's output tensor
Output Data Type         Data type of the model's output
Model File Size          Size of the .tflite model file (this is effectively the flash required by the model)
Runtime Memory Size      Size of the RAM required for Tensorflow-Lite Micro's working memory
# Operations             Number of mathematical operations required to execute the model
# Multiply-Accumulates   Number of multiply-accumulate operations required to execute the model
# Layers                 Number of layers in the model
# Unsupported Layers     Number of layers that could not be accelerated due to hardware accelerator constraints
# Accelerator Cycles     Number of clock cycles required by the hardware accelerator
# CPU Cycles             Number of CPU clock cycles required to execute the model
CPU Utilization          Percentage of the CPU used to execute the model
Clock Rate               CPU clock rate
Time                     Time required to execute the model (i.e. latency)
Energy                   Energy required to execute the model (relative to the CPU idling)
J/Op                     Energy per operation
J/MAC                    Energy per multiply-accumulate
Ops/s                    Operations per second
MACs/s                   Multiply-accumulates per second
Inference/s              Number of times the model can execute per second
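
As a concrete illustration of the # Operations and # Multiply-Accumulates metrics, the following sketch shows how MACs are conventionally counted for a Conv2D layer. This is generic arithmetic for illustration only, not an MLTK API; one MAC is typically counted as two operations (one multiply plus one add):

# Illustrative only: conventional MAC counting for a Conv2D layer (not an MLTK API)
def conv2d_macs(out_h, out_w, out_ch, kernel_h, kernel_w, in_ch):
    # Each output element requires kernel_h * kernel_w * in_ch multiply-accumulates
    return out_h * out_w * out_ch * kernel_h * kernel_w * in_ch

# Example: 3x3 kernel, 16 input channels, 24 output channels, 48x48 output feature map
macs = conv2d_macs(48, 48, 24, 3, 3, 16)
ops = 2 * macs  # one MAC = one multiply + one add
print(f'{macs:,} MACs, ~{ops:,} ops')  # 7,962,624 MACs, ~15,925,248 ops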

Per Layer Metrics

Name          Description
------------  ---------------------------------------------------------------
Index         Model layer index
OpCode        Kernel (layer) name
# Ops         Number of mathematical operations required by the layer
# MACs        Number of multiply-accumulate operations required by the layer
Acc Cycles    Number of accelerator cycles required by the layer
CPU Cycles    Number of CPU cycles required by the layer
Energy        Energy required by the layer (relative to the CPU idling)
Time          Time required to execute the layer (i.e. latency)
Input Shape   Shape(s) of the layer's input tensor(s)
Output Shape  Shape(s) of the layer's output tensor(s)
Options       Kernel configuration options used by the layer
Supported?    True if the layer could be accelerated, False otherwise
Error Msg     Error message if the layer could not be accelerated
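
These per-layer metrics can also be accessed programmatically via the Python API described below. The following is a minimal, hedged sketch; the attribute names used here (results.layers, layer.opcode, layer.macs) are assumptions for illustration, so consult the profile_model API reference for the exact interface:

from mltk.core import profile_model

# Hedged sketch: the attribute names below are assumptions for illustration
results = profile_model('~/workspace/my_model.tflite', accelerator='MVP')
for layer in results.layers:
    # e.g. print each layer's index, kernel name, and MAC count
    print(layer.index, layer.opcode, layer.macs)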

Modes of Operation

The model profiler has three modes of operation:

Basic Simulator Mode

The model executes in a basic simulator using the Tensorflow-Lite Micro ARM CMSIS and reference kernels.
All returned profiling information is estimated.

  • No physical device required

  • Estimates CPU cycles and latency

  • Estimates required energy per inference

Note

Estimates are based on the ARM Cortex-M33.

Hardware Simulator Mode

The model executes in a hardware accelerator simulator.
All returned profiling information is calculated or estimated.

  • No physical device required

  • Calculates accelerator cycles in the hardware simulator

  • Estimates CPU cycles and latency

  • Estimates required energy per inference

Note

Estimated numbers are based on the EFR32xG24 running at 78MHz.

Physical Device Mode

The model executes and is profiled on a physical device.
This allows for determining actual profiling numbers (i.e. not calculated or estimated).

  • Physical device must be locally connected

  • Accelerator cycles, CPU cycles, and latency measured on physical device

  • No energy measurements provided

Command

Model profiling from the command-line is done using the profile command.

For more details on the available command-line options, issue the command:

mltk profile --help

The following are examples of how the profiler can be invoked from the command-line:

Example 1: Profile in basic simulator

Profile the given .tflite model file in the basic simulator.
With this command, no physical device is required.
This command will also provide profiling results for:

  • Estimated latency (i.e. seconds per inference)

  • Estimated CPU cycles

  • Estimated energy

mltk profile ~/workspace/my_model.tflite --estimates

Example 2: Profile in MVP hardware simulator

Profile the given .tflite model file in the MVP hardware simulator.
With this command, no physical device is required.
This command will also provide profiling results for:

  • Estimated latency (i.e. seconds per inference)

  • Calculated accelerator cycles

  • Estimated CPU cycles

  • Estimated energy

mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates

Example 3: Profile on physical device using MVP hardware accelerator

Profile the given .tflite model file on a physically connected embedded device using the MVP hardware accelerator.
This command will also provide measured profiling results for:

  • Latency (i.e. seconds per inference)

  • Accelerator cycles

  • CPU cycles

mltk profile ~/workspace/my_model.tflite --accelerator MVP --device

Example 4: Profile model before training

Training a model can be very time-consuming, and it is useful to know how efficiently a model will execute on an embedded device before investing time and energy into training it. For this reason, the MLTK profile command features a --build flag to build a model and profile it before the model is fully trained.

In this example, the image_example1 model is built at command-execution-time and profiled in the MVP hardware simulator. Note that only the model specification script is required; the model does not need to be trained first.

mltk profile image_example1 --build --accelerator MVP --estimates

Python API

The model profiler is accessible via the profile_model API.

Examples using this API may be found in profile_model.ipynb.
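
As a quick reference, the following is a minimal usage sketch mirroring the command-line examples above. The keyword arguments accelerator, build, and return_estimates correspond to the --accelerator, --build, and --estimates options; confirm the exact signature in the profile_model API reference:

from mltk.core import profile_model

# Roughly equivalent to: mltk profile my_model.tflite --accelerator MVP --estimates
results = profile_model(
    '~/workspace/my_model.tflite',
    accelerator='MVP',
    return_estimates=True,
)
print(results)  # prints the entire-model and per-layer metrics described above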