Why is the model not returning correct results on the embedded device?

There can be many reasons why a .tflite model does not produce valid results when executed on an embedded device.

Some of these reasons include:

Input Data Preprocessing

One of the more common reasons is that the input data is not formatted correctly. Recall that whatever preprocessing is done to the dataset during model training must also be done to the input samples on the embedded device at runtime. So, for instance, if the training dataset consists of spectrograms, then the same algorithms used to convert the raw audio samples into spectrograms must also be used on the embedded device (see the AudioFeatureGenerator for how the MLTK can aid spectrogram generation).
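
As a minimal sketch of this requirement (the normalize_audio() helper below is hypothetical, not an MLTK API), whatever transform the training pipeline applies must be replicated, value-for-value, in the firmware:

import numpy as np

def normalize_audio(samples: np.ndarray) -> np.ndarray:
  # Hypothetical preprocessing applied to every training sample:
  # scale int16 PCM audio into the range [-1.0, 1.0]
  return samples.astype(np.float32) / 32768.0

# If the training pipeline feeds normalize_audio(sample) to the model, then the
# firmware must apply the same scaling (e.g. sample / 32768.0f) before writing
# to the model's input tensor. A different scale, offset, or spectrogram
# configuration will silently degrade the model's accuracy on the device.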

The MLTK also supports creating Python Wrappers, which allow C++ source code to be shared between the model training scripts (i.e. Python) and the embedded device (i.e. C++). With this, algorithms can be developed in C++ and used to preprocess the data during model training. Later, the exact same C++ algorithms can be built into the embedded firmware application. This way, the preprocessing algorithms only need to be written once and can be shared between model training and model execution.
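
A rough sketch of how this might look from the training-script side (the my_dsp_wrapper module and generate_spectrogram() function are hypothetical placeholders for a wrapper you would build, not MLTK APIs):

import numpy as np

# Hypothetical Python wrapper module compiled from the same C++ sources
# that are also built into the embedded firmware
from my_dsp_wrapper import generate_spectrogram

def preprocess_sample(audio: np.ndarray) -> np.ndarray:
  # The C++ implementation produces the spectrogram during training ...
  return generate_spectrogram(audio)

# ... and the identical C++ code runs on the embedded device at runtime,
# so training and inference are guaranteed to compute the same features.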

Input Data Type

Another common issue can occur when the .tflite model input has an int8 data type, e.g.:

my_model.tflite_converter['inference_input_type'] = tf.int8
my_model.tflite_converter['inference_output_type'] = tf.int8

but the raw data uses another data type, e.g. uint8.

In this case, both the model training scripts and the embedded device must convert the sample data to int8.

For example, say we’re creating an image classification model and our dataset contains uint8 images, but we want our model’s input data type to be int8.

Then our model specification script might contain:

# Tell the TF-Lite Converter to use int8 model input/output data types
my_model.tflite_converter['inference_input_type'] = tf.int8
my_model.tflite_converter['inference_output_type'] = tf.int8

...

# This is called by the ParallelImageDataGenerator() for each training sample
# It converts the data type from uint8 to int8
def convert_img_from_uint8_to_int8(params:ParallelProcessParams, x:np.ndarray) -> np.ndarray:
  # x has a float32 dtype but a uint8 value range
  x = np.clip(x, 0, 255) # The data should already be in the uint8 range, but clip it just to be sure
  x = x - 128 # Shift from the uint8 range [0, 255] to the int8 range [-128, 127]
  x = x.astype(np.int8)
  return x

# Define the data generator with the data conversion callback
my_model.datagen = ParallelImageDataGenerator(
  preprocessing_function=convert_img_from_uint8_to_int8,
  ...
)

With this, the model is trained with int8 input data samples.

Additionally, on the embedded device, we must manually convert the uint8 data from the camera to int8, e.g.:

for(int i = 0; i < image_length; ++i)
{
  model_input->data.int8[i] = (int8_t)(image_data[i] - 128);
}

Hint: Just use float32

You can skip all of the above by using a float32 input data type, e.g.:

my_model.tflite_converter['inference_input_type'] = tf.float32
my_model.tflite_converter['inference_output_type'] = tf.float32

With this, there is no need for the convert_img_from_uint8_to_int8() callback during training, nor the image_data[i] - 128 conversion on the embedded device.
The raw uint8 image data can be used directly during training and on the embedded device.
(However, on the embedded device, you’ll need to convert the image data from uint8 to float32 before copying it into the input tensor.)

This works because the TfliteConverter automatically adds Quantize and Dequantize layers to the .tflite model, which internally convert the float input to int8 and the int8 output back to float.
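
Conceptually, the added Quantize layer applies the standard TF-Lite affine quantization to the float input. A minimal sketch (the scale/zero-point values below are illustrative only; the real values come from the .tflite model's input tensor):

import numpy as np

def quantize_float_to_int8(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
  # Mirrors what the Quantize layer does internally:
  # q = round(x / scale) + zero_point, clamped to the int8 range
  q = np.round(x / scale) + zero_point
  return np.clip(q, -128, 127).astype(np.int8)

# Illustrative parameters: an input tensor with scale=1.0 and zero_point=-128
# maps a [0, 255] image to the same int8 values as the manual (x - 128) conversion above.
img = np.random.uniform(0, 255, size=(96, 96, 1)).astype(np.float32)
int8_img = quantize_float_to_int8(img, scale=1.0, zero_point=-128)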

Using float32 as the model input data type is convenient because the conversion is handled automatically by the .tflite model. However, it does require additional RAM and processing cycles.

It requires additional RAM because the input tensor buffer increases by 4x (i.e. sizeof(float) is 4 bytes vs 1 byte for sizeof(int8_t)).
Additional cycles are also required to convert between int8 and float.
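
To put rough numbers on the RAM cost (the 96x96x3 input shape below is just an assumed example, not tied to any particular model):

import numpy as np

# Assumed example input shape: a 96 x 96 RGB image
input_elements = 96 * 96 * 3

int8_bytes = input_elements * np.dtype(np.int8).itemsize        # 27,648 bytes (~27 KiB)
float32_bytes = input_elements * np.dtype(np.float32).itemsize  # 110,592 bytes (~108 KiB)

print(f'int8 input tensor:    {int8_bytes} bytes')
print(f'float32 input tensor: {float32_bytes} bytes')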