Audio Feature Generator Example

This example demonstrates how to:

  1. Load a quantized keyword spotting model

  2. Manually invoke the Audio Feature Generator APIs to generate a spectrogram from an audio file

  3. Run inference with the manually processed audio sample using TensorFlow Lite and TensorFlow Lite Micro

In this example, we use the keyword_spotting_numbers ML model.

NOTES:

  • Use the Open In Colab link to run this example interactively in your browser

  • Refer to the Notebook Examples Guide for how to run this example locally in VSCode

Install the MLTK Python package

# Install the MLTK Python package (if necessary)
!pip install --upgrade silabs-mltk

Import the Python packages

import os
import pprint
import numpy as np
from mltk.datasets import audio as audio_datasets
from mltk.core.preprocess.utils import audio as audio_utils
from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings
from mltk.core import load_mltk_model, TfliteModel, TfliteModelParameters
from mltk.core.tflite_micro import TfliteMicro
from mltk.utils.archive_downloader import download_url

Load Audio Sample

First, we need to obtain an audio sample. In this example, we load the first sample found in the seven directory of the ten_digits dataset, which was used to train the keyword_spotting_numbers ML model.

# Download the "ten digits" dataset (if necessary)
dataset_dir = audio_datasets.ten_digits.download()

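# And grab the first sample (in sorted order) in the 'seven' directory.
# os.listdir() returns entries in arbitrary order, so sort for a repeatable choice.
audio_sample_dir = f'{dataset_dir}/seven'
audio_sample_fn = sorted(os.listdir(audio_sample_dir))[0]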

audio_sample_path = f'{audio_sample_dir}/{audio_sample_fn}'
print(f'Using audio sample: {audio_sample_path}')

# Load the audio file into memory
audio_sample_data, audio_sample_rate_hz = audio_utils.read_audio_file(audio_sample_path, return_numpy=True, return_sample_rate=True)

audio_sample_length = len(audio_sample_data)
audio_sample_length_seconds = audio_sample_length/audio_sample_rate_hz
print(f'Sample length: {audio_sample_length_seconds:.1f}s, rate: {audio_sample_rate_hz/1000:.1f}kHz')
Using audio sample: C:/Users/dried/.mltk/datasets/ten_digits/seven/aws_ar-AE+Hala+seven+medium+medium+1209d48a.wav
Sample length: 0.7s, rate: 16.0kHz
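
The length reported above is the number of samples divided by the sample rate. As a minimal sketch that does not depend on the MLTK, the following uses Python's standard-library wave module to write a synthetic 16 kHz mono clip and read its properties back. The 440 Hz tone, the file name, and the 11200-sample length are illustrative assumptions and are not taken from the ten_digits dataset.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE_HZ = 16000   # same rate as reported for the sample above
N_SAMPLES = 11200        # 0.7 s at 16 kHz (illustrative)

# Synthesize a 440 Hz tone as 16-bit signed PCM (stand-in for a real recording)
samples = [
    int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ))
    for n in range(N_SAMPLES)
]

# Write the tone to a temporary mono WAV file
wav_path = os.path.join(tempfile.mkdtemp(), 'synthetic_tone.wav')
with wave.open(wav_path, 'wb') as wav_file:
    wav_file.setnchannels(1)               # mono
    wav_file.setsampwidth(2)               # 2 bytes per sample (int16)
    wav_file.setframerate(SAMPLE_RATE_HZ)
    wav_file.writeframes(struct.pack(f'<{N_SAMPLES}h', *samples))

# Read the header back and derive the length in seconds
with wave.open(wav_path, 'rb') as wav_file:
    rate_hz = wav_file.getframerate()
    n_frames = wav_file.getnframes()

length_seconds = n_frames / rate_hz
print(f'Sample length: {length_seconds:.1f}s, rate: {rate_hz/1000:.1f}kHz')
```

For a mono file, the frame count equals the sample count, so the same division applies to the array returned by read_audio_file in the cell above.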

Load the MLTK Model

Next, we load the MLTK model. We also print a list of the “classes” supported by the model with their corresponding list indices.

# Download the "keyword_spotting_numbers" model archive from GitHub
mltk_model_archive_path = download_url(
    url='https://github.com/SiliconLabs/mltk/raw/master/mltk/models/siliconlabs/keyword_spotting_numbers.mltk.zip', 
    dst_path='keyword_spotting_numbers.mltk.zip', 
    show_progress=True
)

# Load the MLTK model
mltk_model = load_mltk_model(mltk_model_archive_path)

# Retrieve the classes used by the model
classes = mltk_model.classes

# Print classes and their corresponding indices
print(f'Model: {mltk_model.name} classifies {mltk_model.n_classes} classes with the following mapping:')
for class_index, class_label in enumerate(classes):
    print(f'{class_index:2d} -> {class_label}')
Downloading https://github.com/SiliconLabs/mltk/raw/master/mltk/models/siliconlabs/keyword_spotting_numbers.mltk.zip
to c:/Users/dried/silabs/github_siliconlabs/mltk/mltk/examples/keyword_spotting_numbers.mltk.zip
(This may take awhile, please be patient ...)
Extracting keyword_spotting_numbers.py from C:/Users/dried/silabs/github_siliconlabs/mltk/mltk/examples/keyword_spotting_numbers.mltk.zip                     
Model: keyword_spotting_numbers classifies 11 classes with the following mapping:
 0 -> zero
 1 -> one
 2 -> two
 3 -> three
 4 -> four
 5 -> five
 6 -> six
 7 -> seven
 8 -> eight
 9 -> nine
10 -> _unknown_
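
The index-to-label mapping printed above is what turns a model output vector into a readable prediction. The following sketch uses made-up scores rather than real model output: it picks the highest-scoring index and looks up its label. The example_scores list and the predict_label helper are hypothetical and only illustrate the lookup; they are not part of the MLTK API.

```python
# Class labels in the order printed above
classes = ['zero', 'one', 'two', 'three', 'four', 'five',
           'six', 'seven', 'eight', 'nine', '_unknown_']

def predict_label(scores, labels):
    """Return (index, label, score) of the highest-scoring class.

    Hypothetical helper for illustration; not part of the MLTK API.
    """
    if len(scores) != len(labels):
        raise ValueError('Expected one score per class')
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index, labels[best_index], scores[best_index]

# Made-up scores for illustration only (one entry per class)
example_scores = [0.01, 0.02, 0.01, 0.03, 0.02, 0.01, 0.05, 0.80, 0.02, 0.01, 0.02]

index, label, score = predict_label(example_scores, classes)
print(f'{index:2d} -> {label} ({score:.2f})')
```

With these made-up scores, index 7 has the largest value, so the lookup returns the label seven.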

Load the .tflite model

Next, we load the trained and quantized keyword_spotting_numbers .tflite model. We do this by extracting the .tflite from the keyword_spotting_numbers.mltk.zip model archive and loading it into a TfliteModel instance.
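
Because the model archive is a .zip file, its contents can also be inspected with Python's standard library. The sketch below builds a small stand-in archive in a temporary directory, then locates and extracts its .tflite entry. The entry names and placeholder bytes are assumptions for illustration and do not reflect the real archive's contents. The cell that follows uses the MLTK's own tflite_archive_path property instead.

```python
import os
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, 'example_model.mltk.zip')

# Build a stand-in archive; entry names and contents are placeholders
with zipfile.ZipFile(archive_path, 'w') as archive:
    archive.writestr('example_model.py', '# model specification placeholder\n')
    archive.writestr('example_model.tflite', b'placeholder flatbuffer bytes')

# Find the first .tflite entry and extract it next to the archive
with zipfile.ZipFile(archive_path, 'r') as archive:
    tflite_names = [name for name in archive.namelist() if name.endswith('.tflite')]
    if not tflite_names:
        raise FileNotFoundError('No .tflite file found in archive')
    extracted_path = archive.extract(tflite_names[0], path=work_dir)

print(f'.tflite path: {extracted_path}')
print(f'Size: {os.path.getsize(extracted_path)} bytes')
```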

# Get the file path to the .tflite in the keyword_spotting_numbers.mltk.zip model archive
tflite_path = mltk_model.tflite_archive_path
print(f'.tflite path: {tflite_path}')

# Load the .tflite file into a TfliteModel instance
tflite_model = TfliteModel.load_flatbuffer_file(tflite_path)

# Generate a summary of the model
print(tflite_model.summary())
.tflite path: E:/dried/mltk/models/keyword_spotting_numbers/extracted_archive/keyword_spotting_numbers.tflite
+-------+------------------------------+-------------------+-----------------+------------------------------------------------------+
| Index | OpCode                       | Input(s)          | Output(s)       | Config                                               |
+-------+------------------------------+-------------------+-----------------+------------------------------------------------------+
| 0     | quantize                     | 98x1x40 (float32) | 98x1x40 (int8)  | Type=none                                            |
| 1     | conv_2d                      | 98x1x40 (int8)    | 98x1x40 (int8)  | Padding:Same stride:1x1 activation:None              |
|       |                              | 3x1x40 (int8)     |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 2     | conv_2d                      | 98x1x40 (int8)    | 98x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 3     | depthwise_conv_2d            | 98x1x120 (int8)   | 49x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 4     | conv_2d                      | 49x1x120 (int8)   | 49x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 5     | conv_2d                      | 98x1x40 (int8)    | 49x1x40 (int8)  | Padding:Same stride:2x2 activation:Relu              |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 6     | add                          | 49x1x40 (int8)    | 49x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 49x1x40 (int8)    |                 |                                                      |
| 7     | conv_2d                      | 49x1x40 (int8)    | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 8     | depthwise_conv_2d            | 49x1x120 (int8)   | 49x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 9     | conv_2d                      | 49x1x120 (int8)   | 49x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 10    | add                          | 49x1x40 (int8)    | 49x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 49x1x40 (int8)    |                 |                                                      |
| 11    | conv_2d                      | 49x1x40 (int8)    | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 12    | depthwise_conv_2d            | 49x1x120 (int8)   | 49x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 13    | conv_2d                      | 49x1x120 (int8)   | 49x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 14    | add                          | 49x1x40 (int8)    | 49x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 49x1x40 (int8)    |                 |                                                      |
| 15    | conv_2d                      | 49x1x40 (int8)    | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 16    | depthwise_conv_2d            | 49x1x120 (int8)   | 49x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 17    | conv_2d                      | 49x1x120 (int8)   | 49x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 18    | add                          | 49x1x40 (int8)    | 49x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 49x1x40 (int8)    |                 |                                                      |
| 19    | conv_2d                      | 49x1x40 (int8)    | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 20    | depthwise_conv_2d            | 49x1x120 (int8)   | 25x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 21    | conv_2d                      | 25x1x120 (int8)   | 25x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 22    | conv_2d                      | 49x1x40 (int8)    | 25x1x40 (int8)  | Padding:Same stride:2x2 activation:Relu              |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 23    | add                          | 25x1x40 (int8)    | 25x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 25x1x40 (int8)    |                 |                                                      |
| 24    | conv_2d                      | 25x1x40 (int8)    | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 25    | depthwise_conv_2d            | 25x1x120 (int8)   | 25x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 26    | conv_2d                      | 25x1x120 (int8)   | 25x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 27    | add                          | 25x1x40 (int8)    | 25x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 25x1x40 (int8)    |                 |                                                      |
| 28    | conv_2d                      | 25x1x40 (int8)    | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 29    | depthwise_conv_2d            | 25x1x120 (int8)   | 25x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 30    | conv_2d                      | 25x1x120 (int8)   | 25x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 31    | add                          | 25x1x40 (int8)    | 25x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 25x1x40 (int8)    |                 |                                                      |
| 32    | conv_2d                      | 25x1x40 (int8)    | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 33    | depthwise_conv_2d            | 25x1x120 (int8)   | 25x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 34    | conv_2d                      | 25x1x120 (int8)   | 25x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 35    | add                          | 25x1x40 (int8)    | 25x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 25x1x40 (int8)    |                 |                                                      |
| 36    | conv_2d                      | 25x1x40 (int8)    | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 37    | depthwise_conv_2d            | 25x1x120 (int8)   | 13x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 38    | conv_2d                      | 13x1x120 (int8)   | 13x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 39    | conv_2d                      | 25x1x40 (int8)    | 13x1x40 (int8)  | Padding:Same stride:2x2 activation:Relu              |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 40    | add                          | 13x1x40 (int8)    | 13x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 13x1x40 (int8)    |                 |                                                      |
| 41    | conv_2d                      | 13x1x40 (int8)    | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 42    | depthwise_conv_2d            | 13x1x120 (int8)   | 13x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 43    | conv_2d                      | 13x1x120 (int8)   | 13x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 44    | add                          | 13x1x40 (int8)    | 13x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 13x1x40 (int8)    |                 |                                                      |
| 45    | conv_2d                      | 13x1x40 (int8)    | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 46    | depthwise_conv_2d            | 13x1x120 (int8)   | 13x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 47    | conv_2d                      | 13x1x120 (int8)   | 13x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 48    | add                          | 13x1x40 (int8)    | 13x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 13x1x40 (int8)    |                 |                                                      |
| 49    | conv_2d                      | 13x1x40 (int8)    | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 50    | depthwise_conv_2d            | 13x1x120 (int8)   | 13x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 51    | conv_2d                      | 13x1x120 (int8)   | 13x1x40 (int8)  | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 52    | add                          | 13x1x40 (int8)    | 13x1x40 (int8)  | Activation:Relu                                      |
|       |                              | 13x1x40 (int8)    |                 |                                                      |
| 53    | conv_2d                      | 13x1x40 (int8)    | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 54    | depthwise_conv_2d            | 13x1x120 (int8)   | 7x1x120 (int8)  | Multiplier:1 padding:Same stride:2x2 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 55    | conv_2d                      | 7x1x120 (int8)    | 7x1x40 (int8)   | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 56    | conv_2d                      | 13x1x40 (int8)    | 7x1x40 (int8)   | Padding:Same stride:2x2 activation:Relu              |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 57    | add                          | 7x1x40 (int8)     | 7x1x40 (int8)   | Activation:Relu                                      |
|       |                              | 7x1x40 (int8)     |                 |                                                      |
| 58    | conv_2d                      | 7x1x40 (int8)     | 7x1x120 (int8)  | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 59    | depthwise_conv_2d            | 7x1x120 (int8)    | 7x1x120 (int8)  | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 60    | conv_2d                      | 7x1x120 (int8)    | 7x1x40 (int8)   | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 61    | add                          | 7x1x40 (int8)     | 7x1x40 (int8)   | Activation:Relu                                      |
|       |                              | 7x1x40 (int8)     |                 |                                                      |
| 62    | conv_2d                      | 7x1x40 (int8)     | 7x1x120 (int8)  | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 63    | depthwise_conv_2d            | 7x1x120 (int8)    | 7x1x120 (int8)  | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 64    | conv_2d                      | 7x1x120 (int8)    | 7x1x40 (int8)   | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 65    | add                          | 7x1x40 (int8)     | 7x1x40 (int8)   | Activation:Relu                                      |
|       |                              | 7x1x40 (int8)     |                 |                                                      |
| 66    | conv_2d                      | 7x1x40 (int8)     | 7x1x120 (int8)  | Padding:Valid stride:1x1 activation:Relu             |
|       |                              | 1x1x40 (int8)     |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 67    | depthwise_conv_2d            | 7x1x120 (int8)    | 7x1x120 (int8)  | Multiplier:1 padding:Same stride:1x1 activation:Relu |
|       |                              | 9x1x120 (int8)    |                 |                                                      |
|       |                              | 120 (int32)       |                 |                                                      |
| 68    | conv_2d                      | 7x1x120 (int8)    | 7x1x40 (int8)   | Padding:Valid stride:1x1 activation:None             |
|       |                              | 1x1x120 (int8)    |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
| 69    | add                          | 7x1x40 (int8)     | 7x1x40 (int8)   | Activation:Relu                                      |
|       |                              | 7x1x40 (int8)     |                 |                                                      |
| 70    | reshape                      | 7x1x40 (int8)     | 7x40x1 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 71    | transpose                    | 7x40x1 (int8)     | 40x1x7 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 72    | mean                         | 40x1x7 (int8)     | 7 (int8)        | Type=reduceroptions                                  |
|       |                              | 3 (int32)         |                 |                                                      |
| 73    | squared_difference           | 40x1x7 (int8)     | 40x1x7 (int8)   | Type=none                                            |
|       |                              | 7 (int8)          |                 |                                                      |
| 74    | mean                         | 40x1x7 (int8)     | 7 (int8)        | Type=reduceroptions                                  |
|       |                              | 3 (int32)         |                 |                                                      |
| 75    | add                          | 7 (int8)          | 7 (int8)        | Activation:None                                      |
|       |                              |  (int8)           |                 |                                                      |
| 76    | rsqrt                        | 7 (int8)          | 7 (int8)        | Type=none                                            |
| 77    | mul                          | 40x1x7 (int8)     | 40x1x7 (int8)   | Activation:None                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 78    | mul                          | 7 (int8)          | 7 (int8)        | Activation:None                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 79    | sub                          | 7 (int8)          | 7 (int8)        | Type=suboptions                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 80    | add                          | 40x1x7 (int8)     | 40x1x7 (int8)   | Activation:None                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 81    | transpose                    | 40x1x7 (int8)     | 7x40x1 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 82    | reshape                      | 7x40x1 (int8)     | 7x40 (int8)     | Type=none                                            |
|       |                              | 3 (int32)         |                 |                                                      |
| 83    | mul                          | 7x40 (int8)       | 7x40 (int8)     | Activation:None                                      |
|       |                              | 40 (int8)         |                 |                                                      |
| 84    | add                          | 7x40 (int8)       | 7x40 (int8)     | Activation:None                                      |
|       |                              | 40 (int8)         |                 |                                                      |
| 85    | unidirectional_sequence_lstm | 7x40 (int8)       | 7x40 (int8)     | Time major:False, Activation:Tanh, Cell clip:10.0    |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
|       |                              | 40 (int32)        |                 |                                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 40 (int16)        |                 |                                                      |
| 86    | reshape                      | 7x40 (int8)       | 7x40x1 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 87    | transpose                    | 7x40x1 (int8)     | 40x1x7 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 88    | mean                         | 40x1x7 (int8)     | 7 (int8)        | Type=reduceroptions                                  |
|       |                              | 3 (int32)         |                 |                                                      |
| 89    | squared_difference           | 40x1x7 (int8)     | 40x1x7 (int8)   | Type=none                                            |
|       |                              | 7 (int8)          |                 |                                                      |
| 90    | mean                         | 40x1x7 (int8)     | 7 (int8)        | Type=reduceroptions                                  |
|       |                              | 3 (int32)         |                 |                                                      |
| 91    | add                          | 7 (int8)          | 7 (int8)        | Activation:None                                      |
|       |                              |  (int8)           |                 |                                                      |
| 92    | rsqrt                        | 7 (int8)          | 7 (int8)        | Type=none                                            |
| 93    | mul                          | 40x1x7 (int8)     | 40x1x7 (int8)   | Activation:None                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 94    | mul                          | 7 (int8)          | 7 (int8)        | Activation:None                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 95    | sub                          | 7 (int8)          | 7 (int8)        | Type=suboptions                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 96    | add                          | 40x1x7 (int8)     | 40x1x7 (int8)   | Activation:None                                      |
|       |                              | 7 (int8)          |                 |                                                      |
| 97    | transpose                    | 40x1x7 (int8)     | 7x40x1 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 98    | reshape                      | 7x40x1 (int8)     | 7x40 (int8)     | Type=none                                            |
|       |                              | 3 (int32)         |                 |                                                      |
| 99    | mul                          | 7x40 (int8)       | 7x40 (int8)     | Activation:None                                      |
|       |                              | 40 (int8)         |                 |                                                      |
| 100   | add                          | 7x40 (int8)       | 7x40 (int8)     | Activation:None                                      |
|       |                              | 40 (int8)         |                 |                                                      |
| 101   | strided_slice                | 7x40 (int8)       | 40 (int8)       | Type=stridedsliceoptions                             |
|       |                              | 3 (int32)         |                 |                                                      |
|       |                              | 3 (int32)         |                 |                                                      |
|       |                              | 3 (int32)         |                 |                                                      |
| 102   | fully_connected              | 40 (int8)         | 11 (int8)       | Activation:None                                      |
|       |                              | 40 (int8)         |                 |                                                      |
|       |                              | 11 (int32)        |                 |                                                      |
| 103   | reshape                      | 11 (int8)         | 11x1x1 (int8)   | Type=none                                            |
|       |                              | 4 (int32)         |                 |                                                      |
| 104   | mean                         | 11x1x1 (int8)     | 1 (int8)        | Type=reduceroptions                                  |
|       |                              | 3 (int32)         |                 |                                                      |
| 105   | squared_difference           | 11x1x1 (int8)     | 11x1x1 (int8)   | Type=none                                            |
|       |                              | 1 (int8)          |                 |                                                      |
| 106   | mean                         | 11x1x1 (int8)     | 1 (int8)        | Type=reduceroptions                                  |
|       |                              | 3 (int32)         |                 |                                                      |
| 107   | add                          | 1 (int8)          | 1 (int8)        | Activation:None                                      |
|       |                              |  (int8)           |                 |                                                      |
| 108   | rsqrt                        | 1 (int8)          | 1 (int8)        | Type=none                                            |
| 109   | mul                          | 11x1x1 (int8)     | 11x1x1 (int8)   | Activation:None                                      |
|       |                              | 1 (int8)          |                 |                                                      |
| 110   | mul                          | 1 (int8)          | 1 (int8)        | Activation:None                                      |
|       |                              | 1 (int8)          |                 |                                                      |
| 111   | sub                          | 1 (int8)          | 1 (int8)        | Type=suboptions                                      |
|       |                              | 1 (int8)          |                 |                                                      |
| 112   | add                          | 11x1x1 (int8)     | 11x1x1 (int8)   | Activation:None                                      |
|       |                              | 1 (int8)          |                 |                                                      |
| 113   | reshape                      | 11x1x1 (int8)     | 11 (int8)       | Type=none                                            |
|       |                              | 2 (int32)         |                 |                                                      |
| 114   | mul                          | 11 (int8)         | 11 (int8)       | Activation:None                                      |
|       |                              | 11 (int8)         |                 |                                                      |
| 115   | add                          | 11 (int8)         | 11 (int8)       | Activation:None                                      |
|       |                              | 11 (int8)         |                 |                                                      |
| 116   | softmax                      | 11 (int8)         | 11 (int8)       | Type=softmaxoptions                                  |
| 117   | dequantize                   | 11 (int8)         | 11 (float32)    | Type=none                                            |
+-------+------------------------------+-------------------+-----------------+------------------------------------------------------+
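The repeated run of mean, squared_difference, mean, add, rsqrt, mul, mul, sub, add, mul, add in the table above looks like a layer normalization that the converter has expanded into elementary operations. That reading is an interpretation of the table, not something the model summary states. The sketch below mirrors those steps in float NumPy (the real model runs them in int8) and checks them against the textbook formula; `layer_norm_decomposed` and the `eps` value are hypothetical.

```python
import numpy as np

def layer_norm_decomposed(x, gamma, beta, eps=1e-3):
    """Layer normalization written as the elementary steps seen in the table."""
    mean = np.mean(x, axis=-1, keepdims=True)                # mean
    var = np.mean((x - mean) ** 2, axis=-1, keepdims=True)   # squared_difference + mean
    inv_std = 1.0 / np.sqrt(var + eps)                       # add (epsilon) + rsqrt
    scaled = x * inv_std                                     # mul
    shift = 0.0 - mean * inv_std                             # mul + sub
    normalized = scaled + shift                              # add
    return normalized * gamma + beta                         # mul + add (learned scale/offset)

# Synthetic 7x40 activation, matching the tensor shape in the table
rng = np.random.default_rng(0)
x = rng.normal(size=(7, 40)).astype(np.float32)
gamma = np.ones(40, dtype=np.float32)   # assumed identity scale
beta = np.zeros(40, dtype=np.float32)   # assumed zero offset

y = layer_norm_decomposed(x, gamma, beta)

# Textbook layer normalization for comparison
reference = (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + 1e-3)
print(np.allclose(y, reference, atol=1e-4))
```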

Process the audio sample in the AudioFeatureGenerator

Next, we process the audio sample in the Audio Feature Generator. This will convert the raw audio into a spectrogram image which can be given to the .tflite model for classification.

To process the audio sample, we must use the AudioFeatureGenerator settings embedded in the .tflite file. These are the same settings that were used to train the model, and the embedded device uses them at runtime as well.
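To get a feel for what these settings imply, the spectrogram dimensions can be estimated from the window parameters alone. The sketch below assumes the usual sliding-window framing rule (one frame per window step once a full window fits), which is an assumption about the frontend, not a documented guarantee. The numeric values are the `fe.*` settings printed in the cell output further down, and `spectrogram_shape` is a hypothetical helper.

```python
def spectrogram_shape(sample_length_ms, sample_rate_hz,
                      window_size_ms, window_step_ms, n_channels):
    """Estimate (frames, channels) of a spectrogram from frontend settings."""
    n_samples = sample_rate_hz * sample_length_ms // 1000
    window_size = sample_rate_hz * window_size_ms // 1000
    window_step = sample_rate_hz * window_step_ms // 1000
    # One frame for the first full window, then one per additional step
    n_frames = (n_samples - window_size) // window_step + 1
    return n_frames, n_channels

shape = spectrogram_shape(
    sample_length_ms=1000,   # fe.sample_length_ms
    sample_rate_hz=16000,    # fe.sample_rate_hz
    window_size_ms=30,       # fe.window_size_ms
    window_step_ms=10,       # fe.window_step_ms
    n_channels=40,           # fe.filterbank_n_channels
)
print(shape)  # (98, 40)
```

Under that assumption, the estimate agrees with the 98x40 spectrogram shape reported by the cell output.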

# Retrieve the AudioFeatureGenerator settings from the .tflite
tflite_params = TfliteModelParameters.load_from_tflite_file(tflite_path)

# Load the .tflite parameters into an AudioFeatureGeneratorSettings instance
tflite_frontend_settings = AudioFeatureGeneratorSettings(**tflite_params)

print(f'Audio frontend settings:\n{pprint.pformat(tflite_frontend_settings)}')

# Adjust the audio sample so that it is the correct length expected by the audio frontend settings
frontend_sample_length = int((audio_sample_rate_hz * tflite_frontend_settings.sample_length_ms) / 1000)
adjusted_audio_sample_data = audio_utils.adjust_length(
    audio_sample_data,
    out_length=frontend_sample_length,
    trim_threshold_db=30,
    offset=0
)

# Process the length-adjusted audio in the audio frontend (aka AudioFeatureGenerator).
# This will generate a spectrogram from the raw audio using the settings embedded into the .tflite
spectrogram = audio_utils.apply_frontend(
    sample=adjusted_audio_sample_data,
    settings=tflite_frontend_settings,
    dtype=np.uint16 # We just want the raw, uint16 output of the generated spectrogram
)
print(f'Generated spectrogram shape: {"x".join(map(str, spectrogram.shape))} ({spectrogram.dtype})')

# The generated spectrogram is uint16.
# However, the keyword_spotting_numbers model expects a normalized, float32 input.
# So, we use numpy to normalize the input sample
# norm_spectrogram = (spectrogram - mean(spectrogram)) / std(spectrogram)
norm_spectrogram = spectrogram.astype(np.float32)
norm_spectrogram -= np.mean(norm_spectrogram, dtype=np.float32, keepdims=False)
norm_spectrogram /= (np.std(norm_spectrogram, dtype=np.float32, keepdims=False) + 1e-6)
print(f'Normalized spectrogram shape: {"x".join(map(str, norm_spectrogram.shape))} ({norm_spectrogram.dtype})')

# The keyword_spotting_numbers model also expects the input shape to be:
# <time, 1, features>
# So, we insert an extra dimension:
tflite_input_spectrogram = np.expand_dims(norm_spectrogram, axis=-2)
print(f'.tflite input spectrogram shape: {"x".join(map(str, tflite_input_spectrogram.shape))} ({tflite_input_spectrogram.dtype})')
Audio frontend settings:
{'average_window_duration_ms': 450,
 'classes': ['zero',
             'one',
             'two',
             'three',
             'four',
             'five',
             'six',
             'seven',
             'eight',
             'nine',
             '_unknown_'],
 'date': '2023-08-03T01:11:43.378Z',
 'detection_threshold': 242,
 'fe.activity_detection_alpha_a': 0.5,
 'fe.activity_detection_alpha_b': 0.800000011920929,
 'fe.activity_detection_arm_threshold': 0.75,
 'fe.activity_detection_enable': False,
 'fe.activity_detection_trip_threshold': 0.800000011920929,
 'fe.dc_notch_filter_coefficient': 0.949999988079071,
 'fe.dc_notch_filter_enable': True,
 'fe.fft_length': 512,
 'fe.filterbank_lower_band_limit': 125.0,
 'fe.filterbank_n_channels': 40,
 'fe.filterbank_upper_band_limit': 7500.0,
 'fe.log_scale_enable': True,
 'fe.log_scale_shift': 6,
 'fe.noise_reduction_enable': True,
 'fe.noise_reduction_even_smoothing': 0.02500000037252903,
 'fe.noise_reduction_min_signal_remaining': 0.4000000059604645,
 'fe.noise_reduction_odd_smoothing': 0.05999999865889549,
 'fe.noise_reduction_smoothing_bits': 10,
 'fe.pcan_enable': False,
 'fe.pcan_gain_bits': 21,
 'fe.pcan_offset': 80.0,
 'fe.pcan_strength': 0.949999988079071,
 'fe.quantize_dynamic_scale_enable': False,
 'fe.quantize_dynamic_scale_range_db': 40.0,
 'fe.sample_length_ms': 1000,
 'fe.sample_rate_hz': 16000,
 'fe.window_size_ms': 30,
 'fe.window_step_ms': 10,
 'hash': '4b22adb625a3300fdcf06fa61105782f',
 'latency_ms': 10,
 'minimum_count': 2,
 'name': 'keyword_spotting_numbers',
 'runtime_memory_size': 80500,
 'samplewise_norm.mean_and_std': True,
 'suppression_ms': 700,
 'verbose_model_output_logs': False,
 'version': 1,
 'volume_gain': 0.0}
Generated spectrogram shape: 98x40 (uint16)
Normalized spectrogram shape: 98x40 (float32)
.tflite input spectrogram shape: 98x1x40 (float32)
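The normalization and reshaping steps above can be tried without the MLTK or an audio file. The following sketch applies the same NumPy operations to a synthetic uint16 array; `fake_spectrogram` is random stand-in data, not real frontend output.

```python
import numpy as np

# Synthetic stand-in for the uint16 spectrogram (98 frames x 40 channels)
rng = np.random.default_rng(0)
fake_spectrogram = rng.integers(0, 1024, size=(98, 40), dtype=np.uint16)

# Sample-wise normalization: zero mean, unit standard deviation
norm = fake_spectrogram.astype(np.float32)
norm -= np.mean(norm, dtype=np.float32)
norm /= (np.std(norm, dtype=np.float32) + 1e-6)  # epsilon avoids division by zero

# Insert the singleton dimension: <time, features> -> <time, 1, features>
model_input = np.expand_dims(norm, axis=-2)

print(model_input.shape, model_input.dtype)
print(f'mean={norm.mean():.3f}, std={norm.std():.3f}')
```

After normalization the mean should be close to 0 and the standard deviation close to 1, regardless of the original uint16 range.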

Classify the audio sample using TF-Lite

Next, we give the processed audio sample to the .tflite model instance which will classify the audio. The model output is a list of probabilities. The list entry with the largest probability is the “class” to which the model thinks the audio sample belongs.

NOTE: This uses the default int8 “kernels” that come with TF-Lite

# Give the processed audio sample (which is now a normalized spectrogram)
# to the trained and quantized keyword_spotting_numbers.tflite model,
# which will classify the sample and return the classification results
classification_results = tflite_model.predict(tflite_input_spectrogram)
print(f'Raw classification results: {classification_results}')

# Find the index of the largest entry in the list
predicted_class_index = np.argmax(classification_results)
prediction_confidence = classification_results[predicted_class_index]

print(f'The model "{mltk_model.name}" using the reference int8 Tensorflow-Lite kernels predicts that the audio sample file:\n{audio_sample_path}\nbelongs to the class: "{classes[predicted_class_index]}" with a confidence of {prediction_confidence*100:.1f}%')
Raw classification results: [0.         0.         0.         0.         0.         0.
 0.         0.99609375 0.         0.         0.        ]
The model "keyword_spotting_numbers" using the reference int8 Tensorflow-Lite kernels predicts that the audio sample file:
C:/Users/dried/.mltk/datasets/ten_digits/seven/aws_ar-AE+Hala+seven+medium+medium+1209d48a.wav
belongs to the class: "seven" with a confidence of 99.6%
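The top score of 0.99609375, not 1.0, is most likely a quantization effect and not a sign of model uncertainty. The TensorFlow Lite int8 quantization specification fixes the softmax output at a scale of 1/256 and a zero point of -128, and the final dequantize layer maps the int8 values back to float32. The sketch below assumes this model follows that convention; `dequantize` is a hypothetical helper, not an MLTK function.

```python
# Assumption: int8 softmax output uses scale = 1/256 and zero_point = -128
SCALE = 1.0 / 256.0
ZERO_POINT = -128

def dequantize(q: int) -> float:
    """Map an int8 value back to its float32 equivalent."""
    return (q - ZERO_POINT) * SCALE

print(dequantize(127))   # 0.99609375 (largest representable probability)
print(dequantize(-128))  # 0.0        (smallest representable probability)
```

Under this assumption, 255/256 is the highest confidence the quantized model can report, which is why the result is displayed as 99.6%.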

Classify the audio using TF-Lite Micro

Previously, we used the int8 Tensorflow-Lite kernels that come with the Tensorflow Python package.

Now, let’s use the int8 kernels that come with Tensorflow-Lite Micro. We access them through the TfliteMicro API, the Tensorflow-Lite Micro Python wrapper that comes with the MLTK.

# Load the TfliteMicroModel instance
tflm_model = TfliteMicro.load_tflite_model(tflite_path)
print(f'Tensorflow-Lite Micro model details:\n{tflm_model.details}')

try:
    # Load the audio sample into the TFLM model instance's input tensor
    tflm_model.input(value=tflite_input_spectrogram)

    # Run inference (which will use the TFLM int8 SW reference kernels)
    tflm_model.invoke()

    # Retrieve the classification results:
    # NOTE: The result has the shape 1x11, hence the [0]
    classification_results = tflm_model.output()[0]
    print(f'Raw classification results: {classification_results}')

    # Find the index of the largest entry in the list
    predicted_class_index = np.argmax(classification_results)
    prediction_confidence = classification_results[predicted_class_index]

    print(f'The model "{mltk_model.name}" using the reference int8 Tensorflow-Lite Micro kernels predicts that the audio sample file:\n{audio_sample_path}\nbelongs to the class: "{classes[predicted_class_index]}" with a confidence of {prediction_confidence*100:.1f}%')
finally:
    # We MUST unload the model after we're done with it
    TfliteMicro.unload_model(tflm_model)
Tensorflow-Lite Micro model details:
Name: keyword_spotting_numbers
Version: 1
Date: 2023-08-03T01:11:43.378Z
Description: 
Hash: 4b22adb625a3300fdcf06fa61105782f
Accelerator: none
Classes: zero, one, two, three, four, five, six, seven, eight, nine, _unknown_
Total runtime memory: 73.192 kBytes

Raw classification results: [0.         0.         0.         0.         0.         0.
 0.         0.99609375 0.         0.         0.        ]
The model "keyword_spotting_numbers" using the reference int8 Tensorflow-Lite Micro kernels predicts that the audio sample file:
C:/Users/dried/.mltk/datasets/ten_digits/seven/aws_ar-AE+Hala+seven+medium+medium+1209d48a.wav
belongs to the class: "seven" with a confidence of 99.6%
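The try/finally pattern used above, which guarantees the model is unloaded even if inference fails, can be wrapped in a context manager so it does not have to be repeated. This is a minimal sketch and not part of the MLTK: `loaded_model`, `fake_load` and `fake_unload` are hypothetical names, and the fake functions stand in for the real load and unload calls so the example runs on its own.

```python
from contextlib import contextmanager

@contextmanager
def loaded_model(load_fn, unload_fn, *args, **kwargs):
    """Load a model, yield it, and always unload it afterwards."""
    model = load_fn(*args, **kwargs)
    try:
        yield model
    finally:
        # Runs on normal exit and when the body raises
        unload_fn(model)

# Stand-ins for the real load/unload functions (hypothetical)
events = []

def fake_load(path):
    events.append(f'load:{path}')
    return {'path': path}

def fake_unload(model):
    events.append(f'unload:{model["path"]}')

try:
    with loaded_model(fake_load, fake_unload, 'model.tflite') as model:
        raise RuntimeError('inference failed')
except RuntimeError:
    pass

print(events)  # ['load:model.tflite', 'unload:model.tflite']
```

With the MLTK, the same helper could presumably be called as `loaded_model(TfliteMicro.load_tflite_model, TfliteMicro.unload_model, tflite_path)`, using the two functions shown in the cell above.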