# Model Profiler

The MLTK model profiler provides information about how efficiently a model may run on an embedded target.  
The model profiler can execute a `.tflite` model file in a simulator _or_ on a physical embedded target.

This guide describes how to run the model profiler from the command-line or Python API.  
Alternatively, refer to the [Model Profiler Utility](./model_profiler_utility.md), which runs
the model profiler as a standalone executable with a webpage interface.


```{note} 
_Any_  `.tflite` model file supported by [Tensorflow-Lite Micro](https://github.com/tensorflow/tflite-micro) 
will work with the model profiler.  
i.e. the `.tflite` does _not_ need to be generated by the MLTK to use the profiler.
```

```{note}
_All_ model profiling is done locally. _No_ data is uploaded to a remote server.
```


## Quick Reference

- Command-line: [mltk profile --help](../command_line/profile.md)
- Python API: [profile_model](mltk.core.profile_model)
- Python API examples: [profile_model.ipynb](../../mltk/examples/profile_model.ipynb)


## Overview

### Profiling Metrics

The model profiler returns results for the entire model as well as individual layers of the model.

#### Entire Model Metrics

| Name                      | Description                                                     |
|---------------------------|-----------------------------------------------------------------|
| Name                      | Name of profiled model                                          |
| Accelerator               | Name of hardware accelerator                                    |
| Input Shape               | Shape of the model's input tensor                               |
| Input Data Type           | Model input's data type                                         |
| Output Shape              | Shape of the model's output tensor                              |
| Output Data Type          | Model output's data type                                        |
| Model File Size           | Size of the `.tflite` model file (this is effectively the flash memory required by the model) |
| Runtime Memory Size       | Size of RAM required for Tensorflow-Lite Micro's working memory |
| # Operations              | Number of mathematical operations required to execute the model |
| # Multiply-Accumulates    | Number of multiply-accumulate operations required to execute the model |
| # Layers                  | Number of layers in model                                       |
| # Unsupported Layers      | Number of layers that could not be accelerated due to hardware accelerator constraints |
| # Accelerator Cycles      | Number of clock cycles required by hardware accelerator         |
| # CPU Cycles              | Number of CPU clock cycles                                      |
| CPU Utilization           | Percentage of CPU used to execute model                         |
| Clock Rate                | CPU clock rate                                                  |
| Time                      | Time required to execute model (i.e. latency)                   |
| Energy                    | Energy required to execute model (relative to CPU idling)       |
| J/Op                      | Energy per operation                                            |
| J/MAC                     | Energy per multiply-accumulate                                  |
| Ops/s                     | Operations per second                                           |
| MACs/s                    | Multiply-accumulates per second                                 |
| Inference/s               | Number of times the model can execute per second                |

#### Per Layer Metrics

| Name                      | Description                                                     |
|---------------------------|-----------------------------------------------------------------|
| Index                     | Model layer index                                               |
| OpCode                    | Kernel Layer name                                               |
| # Ops                     | Number of mathematical operations required by layer             |
| # MACs                    | Number of multiply-accumulate operations required by layer      |
| Acc Cycles                | Number of accelerator cycles required by layer                  |
| CPU Cycles                | Number of CPU cycles required by layer                          |
| Energy                    | Energy required by layer (relative to CPU idling)               |
| Time                      | Time required to execute layer  (i.e. latency)                  |
| Input Shape               | Shape(s) of layer input tensor(s)                               |
| Output Shape              | Shape(s) of layer output tensor(s)                              |
| Options                   | Kernel configuration options used by layer                      |
| Supported?                | `True` if the layer was able to be accelerated, `False` otherwise |
| Error Msg                 | Error message if layer was not able to be accelerated           |
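
When profiling from the [profile_model](mltk.core.profile_model) Python API, the per-layer metrics above are also available programmatically on the returned results object. The snippet below is only a minimal sketch: the `return_estimates` argument, the `layers` attribute, and the per-layer field names are assumptions used for illustration, so consult the API reference and [profile_model.ipynb](../../mltk/examples/profile_model.ipynb) for the actual signature and result structure.

```python
from mltk.core import profile_model

# Profile a .tflite model in the basic simulator (no device required).
# NOTE: 'return_estimates' and the per-layer attribute names below are
# assumptions for illustration only; refer to the profile_model
# API reference for the exact signature and result fields.
results = profile_model('my_model.tflite', return_estimates=True)

# Print the formatted summary (entire-model and per-layer tables)
print(results)

# Hypothetical per-layer access
for layer in results.layers:
    print(f'{layer.name}: {layer.macs} MACs, {layer.ops} OPs')
```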




### Modes of Operation

The model profiler has three modes of operation:

#### Basic Simulator Mode

The model executes in a basic simulator using the Tensorflow-Lite Micro [ARM CMSIS kernels](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/kernels/cmsis_nn) and reference [kernels](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/kernels).  
All returned profiling information is estimated.

- No physical device required
- Estimates CPU cycles and latency
- Estimates required energy per inference

__NOTE:__ Estimates are based on the ARM Cortex-M33.


#### Hardware Simulator Mode

The model executes in a hardware accelerator simulator.  
All returned profiling information is calculated or estimated.

- No physical device required
- Accelerator cycles calculated in hardware simulator
- Estimates CPU cycles and latency
- Estimates required energy per inference


```{note}
Estimated numbers are based on the __EFR32xG24__ at 78MHz
```

#### Simulator Accuracy

The simulator provides __coarse estimates__ of CPU cycles, latency, and energy, which are based on the __EFR32xG24__ at 78MHz.
While the estimates may be considered a starting point for model analysis, the __Physical Device Mode__ should
be used for accurate profiling numbers.


#### Physical Device Mode

The model executes and is profiled on a physical device.  
This provides actual, measured profiling numbers (i.e. not calculated or estimated).

- Physical device must be locally connected
- Accelerator cycles, CPU cycles, and latency measured on physical device
- No energy measurements provided


## Command

Model profiling from the command-line is done using the `profile` operation.

For more details on the available command-line options, issue the command:

```shell
mltk profile --help
```

The following are examples of how the profiler can be invoked from the command-line:

### Example 1: Profile in basic simulator

Profile the given `.tflite` model file in the basic simulator.  
With this command, no physical device is required.  
This command will also provide profiling results for:  
- Estimated latency (i.e. seconds per inference)
- Estimated CPU cycles
- Estimated energy


```shell
mltk profile ~/workspace/my_model.tflite --estimates
```


### Example 2: Profile in MVP hardware simulator

Profile the given `.tflite` model file in the MVP hardware simulator.  
With this command, no physical device is required.  
This command will also provide profiling results for:  
- Estimated latency (i.e. seconds per inference)
- Calculated accelerator cycles
- Estimated CPU cycles
- Estimated energy

```shell
mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates
```

### Example 3: Profile on physical device using MVP hardware accelerator

Profile the given `.tflite` model file on a physically connected embedded device using the MVP hardware accelerator.  
This command will also provide measured profiling results for:  
- Latency (i.e. seconds per inference)
- Accelerator cycles
- CPU cycles

```shell
mltk profile ~/workspace/my_model.tflite --accelerator MVP --device
```

### Example 4: Profile model before training

Training a model can be very time-consuming, and it is useful to know how efficiently a 
model will execute on an embedded device before investing time and energy into training it.
For this reason, the MLTK `profile` command features a `--build` flag to build a model
and profile it _before_ the model is fully trained.

In this example, the [image_example1 model](mltk.models.examples.image_example1) is built
at command-execution-time and profiled in the MVP hardware simulator.
Note that _only_ the [model specification](./model_specification.md) script is required; 
the model does _not_ need to be trained first.

```shell
mltk profile image_example1 --build --accelerator MVP --estimates
```



## Python API

The model profiler is accessible via the [profile_model](mltk.core.profile_model) API.

Examples using this API may be found in [profile_model.ipynb](../../mltk/examples/profile_model.ipynb).
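
As a rough Python counterpart to the command-line examples above, the sketch below shows how the same profiling modes might be selected through this API. The keyword arguments (`return_estimates`, `accelerator`, `use_device`, `build`) are assumptions that mirror the `--estimates`, `--accelerator`, `--device`, and `--build` flags; consult the [profile_model](mltk.core.profile_model) reference and the example notebook for the exact signature.

```python
from mltk.core import profile_model

# NOTE: the keyword arguments below are assumptions chosen to mirror the
# --estimates, --accelerator, --device, and --build command-line flags;
# consult the profile_model API reference for the exact parameter names.

# Basic simulator (see Example 1): no device required, estimated metrics
results = profile_model('~/workspace/my_model.tflite', return_estimates=True)

# MVP hardware simulator (see Example 2)
results = profile_model('~/workspace/my_model.tflite', accelerator='MVP', return_estimates=True)

# Physical device with the MVP hardware accelerator (see Example 3)
results = profile_model('~/workspace/my_model.tflite', accelerator='MVP', use_device=True)

# Build and profile an untrained model specification (see Example 4)
results = profile_model('image_example1', build=True, accelerator='MVP', return_estimates=True)

print(results)
```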