How can I reduce my model’s size?¶
A model’s size requirements depend largely on the application. However, there are several things to keep in mind that can help reduce its resource requirements.
What are the model resource requirements?¶
The resources required by a machine learning model are:
RAM - To hold the working memory
Flash - To hold the trained model parameters (i.e. weights and filters)
Processing cycles - The CPU and/or accelerator cycles needed to execute the model. This directly determines how long the model takes to execute.
These values can be determined using the Model Profiler.
Additionally, the model profiler indicates when a layer cannot be accelerated by a hardware accelerator, such as the MVP. When this happens, the layer is executed by a slower software implementation. In this case, the layer’s size can be reduced using the following tips so that it fits on the hardware accelerator and thus executes faster.
Reduce model input size¶
Reducing the size of the model input tensor(s) is usually one of the most effective ways of reducing the model’s total resource requirements.
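As a rough illustration of why the input size matters so much, the sketch below estimates how the input element count and the first convolution layer's multiply-accumulate (MAC) count change when the input height and width are halved. The shapes, filter count, and helper names are hypothetical, and the MAC formula assumes a single "same"-padded convolution; real profiler numbers will differ.

```python
def conv2d_macs(height, width, in_channels, filters, kernel=3, stride=1):
    """Approximate MACs for one 'same'-padded 2D convolution layer."""
    out_h = -(-height // stride)  # ceiling division
    out_w = -(-width // stride)
    return out_h * out_w * filters * kernel * kernel * in_channels


def tensor_elements(height, width, channels):
    """Number of values stored in a height x width x channels tensor."""
    return height * width * channels


full = (96, 96, 1)  # hypothetical input shape
half = (48, 48, 1)  # same input with height and width halved

input_ratio = tensor_elements(*full) / tensor_elements(*half)
macs_ratio = conv2d_macs(*full, filters=8) / conv2d_macs(*half, filters=8)

print(f"input elements: {input_ratio:.0f}x, first-layer MACs: {macs_ratio:.0f}x")
```

Under these assumptions, halving both spatial dimensions cuts the input tensor and the first layer's work by about 4x, and the smaller activations carry through to the layers that follow.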
Use int8 model input data type¶
float32 is a common model input data type. While this is useful for automatically quantizing the raw input data, it can increase the RAM used by the model’s input tensor by ~4x compared to the int8 input data type, since each float32 value occupies 4 bytes versus 1 byte for int8.
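The ~4x figure follows directly from the element sizes. A minimal sketch, assuming a hypothetical 96x96x3 input tensor (the shape and helper name are illustrative only):

```python
# Bytes used by a single element of each data type.
BYTES_PER_ELEMENT = {"float32": 4, "int8": 1}


def input_tensor_bytes(shape, dtype):
    """RAM needed to hold one input tensor of the given shape and data type."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


shape = (96, 96, 3)  # hypothetical input shape
float_bytes = input_tensor_bytes(shape, "float32")
int8_bytes = input_tensor_bytes(shape, "int8")

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes, "
      f"ratio: {float_bytes / int8_bytes:.0f}x")
```

With the TensorFlow Lite converter, for instance, the input type is typically selected through the converter's `inference_input_type` setting. With an int8 input, the application becomes responsible for quantizing the raw data before passing it to the model.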
Reduce Filter Count¶
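The number of filters in each convolutional layer (e.g. the Conv2D filters parameter). Increasing these values can increase the model’s accuracy at the expense of additional processing and thus execution latency. Reducing these values can reduce the number of model operations and thus execution time.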
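To make the trade-off concrete, the sketch below estimates the weight count and MAC count of a single convolution layer for two filter counts. The layer dimensions and the helper name are hypothetical, and the formulas are the usual textbook approximations rather than profiler output.

```python
def conv2d_cost(out_h, out_w, in_channels, filters, kernel=3):
    """Approximate (weights, MACs) for one 2D convolution layer."""
    # One kernel x kernel x in_channels filter per output channel, plus one bias each.
    weights = kernel * kernel * in_channels * filters + filters
    # Every output element needs kernel * kernel * in_channels multiply-accumulates.
    macs = out_h * out_w * filters * kernel * kernel * in_channels
    return weights, macs


w32, m32 = conv2d_cost(24, 24, in_channels=16, filters=32)
w16, m16 = conv2d_cost(24, 24, in_channels=16, filters=16)

print(f"32 filters: {w32} weights, {m32} MACs")
print(f"16 filters: {w16} weights, {m16} MACs")
```

Halving the filter count roughly halves this layer's weights and MACs. It also halves the number of input channels seen by the next layer, so the saving compounds through the network.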
Increase strides¶
The strides of convolutional and pooling layers. Increasing this value can reduce the number of model operations and thus execution time. Increasing this value also reduces the layer’s output size, which can help make the layer fit within the hardware accelerator’s constraints.
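Taking a layer's stride as the value being increased, the sketch below shows how the output shape shrinks when the stride goes from 1 to 2. It assumes "same" padding and a hypothetical 48x48 input with 8 filters; the helper names are illustrative.

```python
import math


def same_padded_output(height, width, filters, stride):
    """Output shape of a 'same'-padded convolution with the given stride."""
    return (math.ceil(height / stride), math.ceil(width / stride), filters)


def elements(shape):
    """Number of values in a tensor of the given shape."""
    return math.prod(shape)


out_s1 = same_padded_output(48, 48, filters=8, stride=1)
out_s2 = same_padded_output(48, 48, filters=8, stride=2)

print(f"stride 1: {out_s1}, {elements(out_s1)} elements")
print(f"stride 2: {out_s2}, {elements(out_s2)} elements")
```

Because a convolution's MAC count is proportional to its number of output elements, a stride of 2 in both dimensions cuts both the output tensor and the layer's work to about a quarter in this example.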
Decrease kernel sizes¶
The kernel size of convolutional layers (e.g. the Conv2D kernel_size parameter). Reducing this value can reduce the number of model operations and thus execution time.
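The cost of a convolution grows with the square of the kernel size. As a small sketch, assuming a square kernel and a hypothetical layer with 16 input channels:

```python
def macs_per_output(kernel, in_channels):
    """MACs needed to compute one output element of a 2D convolution."""
    return kernel * kernel * in_channels


in_channels = 16  # hypothetical
baseline = macs_per_output(3, in_channels)

for kernel in (7, 5, 3):
    cost = macs_per_output(kernel, in_channels)
    print(f"{kernel}x{kernel} kernel: {cost} MACs per output "
          f"({cost / baseline:.2f}x a 3x3 kernel)")
```

Under these assumptions, moving from a 5x5 to a 3x3 kernel removes roughly two thirds of the layer's multiply-accumulates, and the weight count shrinks by the same factor.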
Reduce FullyConnected units¶
The units parameter of the Dense (a.k.a. FullyConnected) layer. Increasing this value can increase model accuracy at the expense of additional computations. Decreasing this value can reduce the number of model operations and thus execution time.
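For a fully connected layer, both the weight count and the MAC count scale with the product of the input size and the number of units. A minimal sketch, assuming a hypothetical flattened input of 1152 values (for example 6x6x32):

```python
def dense_cost(inputs, units):
    """Approximate (weights, MACs) for one Dense / FullyConnected layer."""
    weights = inputs * units + units  # weight matrix plus one bias per unit
    macs = inputs * units             # one multiply-accumulate per weight
    return weights, macs


inputs = 1152  # hypothetical flattened input size

w64, m64 = dense_cost(inputs, units=64)
w32, m32 = dense_cost(inputs, units=32)

print(f"64 units: {w64} weights, {m64} MACs")
print(f"32 units: {w32} weights, {m32} MACs")
```

Since the weights of an int8-quantized model take roughly one byte each, halving the units in this example would also save on the order of 36 kB of flash.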