# Model Profiler

The MLTK model profiler provides information about how efficiently a model may run on an embedded target.

The model profiler allows for executing a `.tflite` model file in a simulator _or_ on a physical embedded target.

This guide describes how to run the model profiler from the command-line or Python API. Alternatively, refer to the [Model Profiler Utility](./model_profiler_utility.md) which allows for running the model profiler as a standalone executable with a webpage interface.

```{note}
_Any_ `.tflite` model file supported by [Tensorflow-Lite Micro](https://github.com/tensorflow/tflite-micro) will work with the model profiler, i.e. the `.tflite` does _not_ need to be generated by the MLTK to use the profiler.
```

```{note}
_All_ model profiling is done locally. _No_ data is uploaded to a remote server.
```

## Quick Reference

- Command-line: [mltk profile --help](../command_line/profile.md)
- Python API: [profile_model](mltk.core.profile_model)
- Python API examples: [profile_model.ipynb](../../mltk/examples/profile_model.ipynb)

## Overview

### Profiling Metrics

The model profiler returns results for the entire model as well as for the individual layers of the model.
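Several of the model-level metrics in the tables that follow are simple ratios of the core measurements. The relationships can be sketched in plain Python (all numbers below are made up for illustration; none come from a real profiling run):

```python
# Hypothetical core measurements for one inference (illustrative only)
num_ops = 1_200_000   # "# Operations"
num_macs = 580_000    # "# Multiply-Accumulates"
time_s = 0.015        # "Time" (latency), in seconds
energy_j = 0.0006     # "Energy", in joules (relative to CPU idling)

# Derived throughput/efficiency metrics
ops_per_s = num_ops / time_s          # "Ops/s"
macs_per_s = num_macs / time_s        # "MACs/s"
inference_per_s = 1.0 / time_s        # "Inference/s"
joules_per_op = energy_j / num_ops    # "J/Op"
joules_per_mac = energy_j / num_macs  # "J/MAC"

print(f"{inference_per_s:.1f} inferences/s, {ops_per_s / 1e6:.1f} MOps/s")
```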
#### Entire Model Metrics

| Name                   | Description                                                     |
|------------------------|-----------------------------------------------------------------|
| Name                   | Name of the profiled model                                      |
| Accelerator            | Name of the hardware accelerator                                |
| Input Shape            | Shape of the model's input tensor                               |
| Input Data Type        | Model input's data type                                         |
| Output Shape           | Shape of the model's output tensor                              |
| Output Data Type       | Model output's data type                                        |
| Model File Size        | Size of the `.tflite` model file (this is effectively the flash required by the model) |
| Runtime Memory Size    | Size of the RAM required for Tensorflow-Lite Micro's working memory |
| # Operations           | Number of mathematical operations required to execute the model |
| # Multiply-Accumulates | Number of multiply-accumulate operations required to execute the model |
| # Layers               | Number of layers in the model                                   |
| # Unsupported Layers   | Number of layers that could not be accelerated due to hardware accelerator constraints |
| # Accelerator Cycles   | Number of clock cycles required by the hardware accelerator     |
| # CPU Cycles           | Number of CPU clock cycles                                      |
| CPU Utilization        | Percentage of the CPU used to execute the model                 |
| Clock Rate             | CPU clock rate                                                  |
| Time                   | Time required to execute the model (i.e. latency)               |
| Energy                 | Energy required to execute the model (relative to the CPU idling) |
| J/Op                   | Energy per operation                                            |
| J/MAC                  | Energy per multiply-accumulate                                  |
| Ops/s                  | Operations per second                                           |
| MACs/s                 | Multiply-accumulates per second                                 |
| Inference/s            | Number of times the model can execute per second                |

#### Per Layer Metrics

| Name         | Description                                                     |
|--------------|-----------------------------------------------------------------|
| Index        | Model layer index                                               |
| OpCode       | Kernel (layer) name                                             |
| # Ops        | Number of mathematical operations required by the layer         |
| # MACs       | Number of multiply-accumulate operations required by the layer  |
| Acc Cycles   | Number of accelerator cycles required by the layer              |
| CPU Cycles   | Number of CPU cycles required by the layer                      |
| Energy       | Energy required by the layer (relative to the CPU idling)       |
| Time         | Time required to execute the layer (i.e. latency)               |
| Input Shape  | Shape(s) of the layer's input tensor(s)                         |
| Output Shape | Shape(s) of the layer's output tensor(s)                        |
| Options      | Kernel configuration options used by the layer                  |
| Supported?   | `True` if the layer was able to be accelerated, `False` otherwise |
| Error Msg    | Error message if the layer was not able to be accelerated       |

### Modes of Operation

The model profiler has three modes of operation:

#### Basic Simulator Mode

The model executes the Tensorflow-Lite Micro [ARM CMSIS kernels](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/kernels/cmsis_nn) and [reference kernels](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/kernels) in a basic simulator. All returned profiling information is estimated.

- No physical device required
- Estimates CPU cycles and latency
- Estimates required energy per inference

```{note}
Estimates are based on the ARM Cortex-M33.
```

#### Hardware Simulator Mode

The model executes in a hardware accelerator simulator. All returned profiling information is calculated or estimated.
- No physical device required
- Accelerator cycles calculated in the hardware simulator
- Estimates CPU cycles and latency
- Estimates required energy per inference

```{note}
Estimated numbers are based on the __EFR32xG24__ at 78MHz.
```

#### Physical Device Mode

The model executes and is profiled on a physical device. This allows for determining actual profiling numbers (i.e. not calculated or estimated).

- Physical device must be locally connected
- Accelerator cycles, CPU cycles, and latency measured on the physical device
- No energy measurements provided

## Command

Model profiling from the command-line is done using the `profile` operation. For more details on the available command-line options, issue the command:

```shell
mltk profile --help
```

The following are examples of how the profiler can be invoked from the command-line:

### Example 1: Profile in basic simulator

Profile the given `.tflite` model file in the basic simulator. With this command, no physical device is required.

This command will also provide profiling results for:

- Estimated latency (i.e. seconds per inference)
- Estimated CPU cycles
- Estimated energy

```shell
mltk profile ~/workspace/my_model.tflite --estimates
```

### Example 2: Profile in MVP hardware simulator

Profile the given `.tflite` model file in the MVP hardware simulator. With this command, no physical device is required.

This command will also provide profiling results for:

- Estimated latency (i.e. seconds per inference)
- Calculated accelerator cycles
- Estimated CPU cycles
- Estimated energy

```shell
mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates
```

### Example 3: Profile on physical device using MVP hardware accelerator

Profile the given `.tflite` model file on a physically connected embedded device using the MVP hardware accelerator.

This command will also provide measured profiling results for:

- Latency (i.e. seconds per inference)
- Accelerator cycles
- CPU cycles

```shell
mltk profile ~/workspace/my_model.tflite --accelerator MVP --device
```

### Example 4: Profile model before training

Training a model can be very time-consuming, and it is useful to know how efficiently a model will execute on an embedded device before investing the time and energy to train it. For this reason, the MLTK `profile` command features a `--build` flag to build a model and profile it _before_ the model is fully trained.

In this example, the [image_example1 model](mltk.models.examples.image_example1) is built at command-execution-time and profiled in the MVP hardware simulator. Note that _only_ the [model specification](./model_specification.md) script is required; it does _not_ need to be trained first.

```shell
mltk profile image_example1 --build --accelerator MVP --estimates
```

## Python API

The model profiler is accessible via the [profile_model](mltk.core.profile_model) API. Examples using this API may be found in [profile_model.ipynb](../../mltk/examples/profile_model.ipynb).
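As a minimal sketch of the Python API flow (the `accelerator` and `return_estimates` keyword arguments are assumptions mirroring the command-line flags above, and the model path is hypothetical; consult the [profile_model](mltk.core.profile_model) reference for the exact signature):

```python
from mltk.core import profile_model

# Profile a .tflite model file in the MVP hardware simulator
# (no physical device required)
results = profile_model(
    'my_model.tflite',      # hypothetical path to a .tflite file
    accelerator='MVP',      # assumed kwarg, mirrors --accelerator MVP
    return_estimates=True,  # assumed kwarg, mirrors --estimates
)

# Printing the returned results object gives a summary report with
# the model-level and per-layer metrics described above
print(results)
```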