Audio Feature Generator Example¶
This demonstrates how to:
Load a quantized keyword spotting model
Manually invoke the Audio Feature Generator APIs to generate a spectrogram from an audio file
Run inference with the manually processed audio sample using Tensorflow-Lite and Tensorflow-Lite Micro
In this example, we use the keyword_spotting_numbers ML model.
NOTES:
Refer to the Notebook Examples Guide for how to run this example locally in VSCode
Install the MLTK python package¶
# Install the MLTK Python package (if necessary)
!pip install --upgrade silabs-mltk
Import the Python packages¶
import os
import pprint
import numpy as np
from mltk.datasets import audio as audio_datasets
from mltk.core.preprocess.utils import audio as audio_utils
from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings
from mltk.core import load_mltk_model, TfliteModel, TfliteModelParameters
from mltk.core.tflite_micro import TfliteMicro
from mltk.utils.archive_downloader import download_url
Load Audio Sample¶
First we need to obtain an audio sample. In this example, we load the first sample found in the "seven" directory of the ten_digits dataset, which was used to train the keyword_spotting_numbers ML model.
# Download the "ten digits" dataset (if necessary)
dataset_dir = audio_datasets.ten_digits.download()
# And grab the first sample in the 'seven' directory
audio_sample_dir = f'{dataset_dir}/seven'
audio_sample_fn = list(os.listdir(audio_sample_dir))[0]
audio_sample_path = f'{audio_sample_dir}/{audio_sample_fn}'
print(f'Using audio sample: {audio_sample_path}')
# Load the audio file into memory
audio_sample_data, audio_sample_rate_hz = audio_utils.read_audio_file(audio_sample_path, return_numpy=True, return_sample_rate=True)
audio_sample_length = len(audio_sample_data)
audio_sample_length_seconds = audio_sample_length/audio_sample_rate_hz
print(f'Sample length: {audio_sample_length_seconds:.1f}s, rate: {audio_sample_rate_hz/1000:.1f}kHz')
Using audio sample: C:/Users/dried/.mltk/datasets/ten_digits/seven/aws_ar-AE+Hala+seven+medium+medium+1209d48a.wav
Sample length: 0.7s, rate: 16.0kHz
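The sample length printed above is the number of samples divided by the sample rate. As a minimal sketch that does not depend on the MLTK, the snippet below writes a short silent WAV file with Python's standard `wave` module and reads its header back to compute the same two values. The file name `silence.wav`, the 0.7 s duration and the helper name `wav_duration_seconds` are illustrative assumptions, not part of the MLTK API.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return (duration_in_seconds, sample_rate_hz) read from a WAV header."""
    with wave.open(path, 'rb') as wav_file:
        n_frames = wav_file.getnframes()
        sample_rate_hz = wav_file.getframerate()
    return n_frames / sample_rate_hz, sample_rate_hz


# Create a hypothetical 0.7 s, 16 kHz, 16-bit mono file filled with silence
example_path = os.path.join(tempfile.mkdtemp(), 'silence.wav')
with wave.open(example_path, 'wb') as wav_file:
    wav_file.setnchannels(1)       # mono
    wav_file.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav_file.setframerate(16000)   # 16 kHz
    wav_file.writeframes(b'\x00\x00' * 11200)  # 11200 samples = 0.7 s

duration_s, rate_hz = wav_duration_seconds(example_path)
print(f'Sample length: {duration_s:.1f}s, rate: {rate_hz/1000:.1f}kHz')
```

This only inspects the header. The MLTK helper used above also decodes the audio data itself.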
Load the MLTK Model¶
Next, we load the MLTK model. We also print a list of the “classes” supported by the model with their corresponding list indices.
# Download the "keyword_spotting_numbers" model archive from github
mltk_model_archive_path = download_url(
    url='https://github.com/SiliconLabs/mltk/raw/master/mltk/models/siliconlabs/keyword_spotting_numbers.mltk.zip',
    dst_path='keyword_spotting_numbers.mltk.zip',
    show_progress=True
)
# Load the MLTK model
mltk_model = load_mltk_model(mltk_model_archive_path)
# Retrieve the classes used by the model
classes = mltk_model.classes
# Print classes and their corresponding indices
print(f'Model: {mltk_model.name} classifies {mltk_model.n_classes} classes with the following mapping:')
for class_index, class_label in enumerate(classes):
    print(f'{class_index:2d} -> {class_label}')
Downloading https://github.com/SiliconLabs/mltk/raw/master/mltk/models/siliconlabs/keyword_spotting_numbers.mltk.zip
to c:/Users/dried/silabs/github_siliconlabs/mltk/mltk/examples/keyword_spotting_numbers.mltk.zip
(This may take awhile, please be patient ...)
Extracting keyword_spotting_numbers.py from C:/Users/dried/silabs/github_siliconlabs/mltk/mltk/examples/keyword_spotting_numbers.mltk.zip
Model: keyword_spotting_numbers classifies 11 classes with the following mapping:
0 -> zero
1 -> one
2 -> two
3 -> three
4 -> four
5 -> five
6 -> six
7 -> seven
8 -> eight
9 -> nine
10 -> _unknown_
Load the .tflite model¶
Next, we load the trained and quantized keyword_spotting_numbers .tflite model. We do this by extracting the .tflite from the keyword_spotting_numbers.mltk.zip model archive and loading it into a TfliteModel instance.
# Get the file path to the .tflite in the keyword_spotting_numbers.mltk.zip model archive
tflite_path = mltk_model.tflite_archive_path
print(f'.tflite path: {tflite_path}')
# Load the .tflite file into a TfliteModel instance
tflite_model = TfliteModel.load_flatbuffer_file(tflite_path)
# Generate a summary of the model
print(tflite_model.summary())
.tflite path: E:/dried/mltk/models/keyword_spotting_numbers/extracted_archive/keyword_spotting_numbers.tflite
+-------+------------------------------+-------------------+-----------------+------------------------------------------------------+
| Index | OpCode | Input(s) | Output(s) | Config |
+-------+------------------------------+-------------------+-----------------+------------------------------------------------------+
| 0 | quantize | 98x1x40 (float32) | 98x1x40 (int8) | Type=none |
| 1 | conv_2d | 98x1x40 (int8) | 98x1x40 (int8) | Padding:Same stride:1x1 activation:None |
| | | 3x1x40 (int8) | | |
| | | 40 (int32) | | |
| 2 | conv_2d | 98x1x40 (int8) | 98x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 3 | depthwise_conv_2d | 98x1x120 (int8) | 49x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 4 | conv_2d | 49x1x120 (int8) | 49x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 5 | conv_2d | 98x1x40 (int8) | 49x1x40 (int8) | Padding:Same stride:2x2 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 40 (int32) | | |
| 6 | add | 49x1x40 (int8) | 49x1x40 (int8) | Activation:Relu |
| | | 49x1x40 (int8) | | |
| 7 | conv_2d | 49x1x40 (int8) | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 8 | depthwise_conv_2d | 49x1x120 (int8) | 49x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 9 | conv_2d | 49x1x120 (int8) | 49x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 10 | add | 49x1x40 (int8) | 49x1x40 (int8) | Activation:Relu |
| | | 49x1x40 (int8) | | |
| 11 | conv_2d | 49x1x40 (int8) | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 12 | depthwise_conv_2d | 49x1x120 (int8) | 49x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 13 | conv_2d | 49x1x120 (int8) | 49x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 14 | add | 49x1x40 (int8) | 49x1x40 (int8) | Activation:Relu |
| | | 49x1x40 (int8) | | |
| 15 | conv_2d | 49x1x40 (int8) | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 16 | depthwise_conv_2d | 49x1x120 (int8) | 49x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 17 | conv_2d | 49x1x120 (int8) | 49x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 18 | add | 49x1x40 (int8) | 49x1x40 (int8) | Activation:Relu |
| | | 49x1x40 (int8) | | |
| 19 | conv_2d | 49x1x40 (int8) | 49x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 20 | depthwise_conv_2d | 49x1x120 (int8) | 25x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 21 | conv_2d | 25x1x120 (int8) | 25x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 22 | conv_2d | 49x1x40 (int8) | 25x1x40 (int8) | Padding:Same stride:2x2 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 40 (int32) | | |
| 23 | add | 25x1x40 (int8) | 25x1x40 (int8) | Activation:Relu |
| | | 25x1x40 (int8) | | |
| 24 | conv_2d | 25x1x40 (int8) | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 25 | depthwise_conv_2d | 25x1x120 (int8) | 25x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 26 | conv_2d | 25x1x120 (int8) | 25x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 27 | add | 25x1x40 (int8) | 25x1x40 (int8) | Activation:Relu |
| | | 25x1x40 (int8) | | |
| 28 | conv_2d | 25x1x40 (int8) | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 29 | depthwise_conv_2d | 25x1x120 (int8) | 25x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 30 | conv_2d | 25x1x120 (int8) | 25x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 31 | add | 25x1x40 (int8) | 25x1x40 (int8) | Activation:Relu |
| | | 25x1x40 (int8) | | |
| 32 | conv_2d | 25x1x40 (int8) | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 33 | depthwise_conv_2d | 25x1x120 (int8) | 25x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 34 | conv_2d | 25x1x120 (int8) | 25x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 35 | add | 25x1x40 (int8) | 25x1x40 (int8) | Activation:Relu |
| | | 25x1x40 (int8) | | |
| 36 | conv_2d | 25x1x40 (int8) | 25x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 37 | depthwise_conv_2d | 25x1x120 (int8) | 13x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 38 | conv_2d | 13x1x120 (int8) | 13x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 39 | conv_2d | 25x1x40 (int8) | 13x1x40 (int8) | Padding:Same stride:2x2 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 40 (int32) | | |
| 40 | add | 13x1x40 (int8) | 13x1x40 (int8) | Activation:Relu |
| | | 13x1x40 (int8) | | |
| 41 | conv_2d | 13x1x40 (int8) | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 42 | depthwise_conv_2d | 13x1x120 (int8) | 13x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 43 | conv_2d | 13x1x120 (int8) | 13x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 44 | add | 13x1x40 (int8) | 13x1x40 (int8) | Activation:Relu |
| | | 13x1x40 (int8) | | |
| 45 | conv_2d | 13x1x40 (int8) | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 46 | depthwise_conv_2d | 13x1x120 (int8) | 13x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 47 | conv_2d | 13x1x120 (int8) | 13x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 48 | add | 13x1x40 (int8) | 13x1x40 (int8) | Activation:Relu |
| | | 13x1x40 (int8) | | |
| 49 | conv_2d | 13x1x40 (int8) | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 50 | depthwise_conv_2d | 13x1x120 (int8) | 13x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 51 | conv_2d | 13x1x120 (int8) | 13x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 52 | add | 13x1x40 (int8) | 13x1x40 (int8) | Activation:Relu |
| | | 13x1x40 (int8) | | |
| 53 | conv_2d | 13x1x40 (int8) | 13x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 54 | depthwise_conv_2d | 13x1x120 (int8) | 7x1x120 (int8) | Multiplier:1 padding:Same stride:2x2 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 55 | conv_2d | 7x1x120 (int8) | 7x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 56 | conv_2d | 13x1x40 (int8) | 7x1x40 (int8) | Padding:Same stride:2x2 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 40 (int32) | | |
| 57 | add | 7x1x40 (int8) | 7x1x40 (int8) | Activation:Relu |
| | | 7x1x40 (int8) | | |
| 58 | conv_2d | 7x1x40 (int8) | 7x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 59 | depthwise_conv_2d | 7x1x120 (int8) | 7x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 60 | conv_2d | 7x1x120 (int8) | 7x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 61 | add | 7x1x40 (int8) | 7x1x40 (int8) | Activation:Relu |
| | | 7x1x40 (int8) | | |
| 62 | conv_2d | 7x1x40 (int8) | 7x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 63 | depthwise_conv_2d | 7x1x120 (int8) | 7x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 64 | conv_2d | 7x1x120 (int8) | 7x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 65 | add | 7x1x40 (int8) | 7x1x40 (int8) | Activation:Relu |
| | | 7x1x40 (int8) | | |
| 66 | conv_2d | 7x1x40 (int8) | 7x1x120 (int8) | Padding:Valid stride:1x1 activation:Relu |
| | | 1x1x40 (int8) | | |
| | | 120 (int32) | | |
| 67 | depthwise_conv_2d | 7x1x120 (int8) | 7x1x120 (int8) | Multiplier:1 padding:Same stride:1x1 activation:Relu |
| | | 9x1x120 (int8) | | |
| | | 120 (int32) | | |
| 68 | conv_2d | 7x1x120 (int8) | 7x1x40 (int8) | Padding:Valid stride:1x1 activation:None |
| | | 1x1x120 (int8) | | |
| | | 40 (int32) | | |
| 69 | add | 7x1x40 (int8) | 7x1x40 (int8) | Activation:Relu |
| | | 7x1x40 (int8) | | |
| 70 | reshape | 7x1x40 (int8) | 7x40x1 (int8) | Type=none |
| | | 4 (int32) | | |
| 71 | transpose | 7x40x1 (int8) | 40x1x7 (int8) | Type=none |
| | | 4 (int32) | | |
| 72 | mean | 40x1x7 (int8) | 7 (int8) | Type=reduceroptions |
| | | 3 (int32) | | |
| 73 | squared_difference | 40x1x7 (int8) | 40x1x7 (int8) | Type=none |
| | | 7 (int8) | | |
| 74 | mean | 40x1x7 (int8) | 7 (int8) | Type=reduceroptions |
| | | 3 (int32) | | |
| 75 | add | 7 (int8) | 7 (int8) | Activation:None |
| | | (int8) | | |
| 76 | rsqrt | 7 (int8) | 7 (int8) | Type=none |
| 77 | mul | 40x1x7 (int8) | 40x1x7 (int8) | Activation:None |
| | | 7 (int8) | | |
| 78 | mul | 7 (int8) | 7 (int8) | Activation:None |
| | | 7 (int8) | | |
| 79 | sub | 7 (int8) | 7 (int8) | Type=suboptions |
| | | 7 (int8) | | |
| 80 | add | 40x1x7 (int8) | 40x1x7 (int8) | Activation:None |
| | | 7 (int8) | | |
| 81 | transpose | 40x1x7 (int8) | 7x40x1 (int8) | Type=none |
| | | 4 (int32) | | |
| 82 | reshape | 7x40x1 (int8) | 7x40 (int8) | Type=none |
| | | 3 (int32) | | |
| 83 | mul | 7x40 (int8) | 7x40 (int8) | Activation:None |
| | | 40 (int8) | | |
| 84 | add | 7x40 (int8) | 7x40 (int8) | Activation:None |
| | | 40 (int8) | | |
| 85 | unidirectional_sequence_lstm | 7x40 (int8) | 7x40 (int8) | Time major:False, Activation:Tanh, Cell clip:10.0 |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int8) | | |
| | | 40 (int32) | | |
| | | 40 (int32) | | |
| | | 40 (int32) | | |
| | | 40 (int32) | | |
| | | 40 (int8) | | |
| | | 40 (int16) | | |
| 86 | reshape | 7x40 (int8) | 7x40x1 (int8) | Type=none |
| | | 4 (int32) | | |
| 87 | transpose | 7x40x1 (int8) | 40x1x7 (int8) | Type=none |
| | | 4 (int32) | | |
| 88 | mean | 40x1x7 (int8) | 7 (int8) | Type=reduceroptions |
| | | 3 (int32) | | |
| 89 | squared_difference | 40x1x7 (int8) | 40x1x7 (int8) | Type=none |
| | | 7 (int8) | | |
| 90 | mean | 40x1x7 (int8) | 7 (int8) | Type=reduceroptions |
| | | 3 (int32) | | |
| 91 | add | 7 (int8) | 7 (int8) | Activation:None |
| | | (int8) | | |
| 92 | rsqrt | 7 (int8) | 7 (int8) | Type=none |
| 93 | mul | 40x1x7 (int8) | 40x1x7 (int8) | Activation:None |
| | | 7 (int8) | | |
| 94 | mul | 7 (int8) | 7 (int8) | Activation:None |
| | | 7 (int8) | | |
| 95 | sub | 7 (int8) | 7 (int8) | Type=suboptions |
| | | 7 (int8) | | |
| 96 | add | 40x1x7 (int8) | 40x1x7 (int8) | Activation:None |
| | | 7 (int8) | | |
| 97 | transpose | 40x1x7 (int8) | 7x40x1 (int8) | Type=none |
| | | 4 (int32) | | |
| 98 | reshape | 7x40x1 (int8) | 7x40 (int8) | Type=none |
| | | 3 (int32) | | |
| 99 | mul | 7x40 (int8) | 7x40 (int8) | Activation:None |
| | | 40 (int8) | | |
| 100 | add | 7x40 (int8) | 7x40 (int8) | Activation:None |
| | | 40 (int8) | | |
| 101 | strided_slice | 7x40 (int8) | 40 (int8) | Type=stridedsliceoptions |
| | | 3 (int32) | | |
| | | 3 (int32) | | |
| | | 3 (int32) | | |
| 102 | fully_connected | 40 (int8) | 11 (int8) | Activation:None |
| | | 40 (int8) | | |
| | | 11 (int32) | | |
| 103 | reshape | 11 (int8) | 11x1x1 (int8) | Type=none |
| | | 4 (int32) | | |
| 104 | mean | 11x1x1 (int8) | 1 (int8) | Type=reduceroptions |
| | | 3 (int32) | | |
| 105 | squared_difference | 11x1x1 (int8) | 11x1x1 (int8) | Type=none |
| | | 1 (int8) | | |
| 106 | mean | 11x1x1 (int8) | 1 (int8) | Type=reduceroptions |
| | | 3 (int32) | | |
| 107 | add | 1 (int8) | 1 (int8) | Activation:None |
| | | (int8) | | |
| 108 | rsqrt | 1 (int8) | 1 (int8) | Type=none |
| 109 | mul | 11x1x1 (int8) | 11x1x1 (int8) | Activation:None |
| | | 1 (int8) | | |
| 110 | mul | 1 (int8) | 1 (int8) | Activation:None |
| | | 1 (int8) | | |
| 111 | sub | 1 (int8) | 1 (int8) | Type=suboptions |
| | | 1 (int8) | | |
| 112 | add | 11x1x1 (int8) | 11x1x1 (int8) | Activation:None |
| | | 1 (int8) | | |
| 113 | reshape | 11x1x1 (int8) | 11 (int8) | Type=none |
| | | 2 (int32) | | |
| 114 | mul | 11 (int8) | 11 (int8) | Activation:None |
| | | 11 (int8) | | |
| 115 | add | 11 (int8) | 11 (int8) | Activation:None |
| | | 11 (int8) | | |
| 116 | softmax | 11 (int8) | 11 (int8) | Type=softmaxoptions |
| 117 | dequantize | 11 (int8) | 11 (float32) | Type=none |
+-------+------------------------------+-------------------+-----------------+------------------------------------------------------+
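The time dimension in the summary above shrinks from 98 to 7 across the four layers that use a 2x2 stride with "Same" padding (indices 3, 20, 37 and 54). A small sketch of that arithmetic follows, assuming the usual rule that "Same" padding yields `ceil(input_length / stride)` outputs. The helper name `same_padding_output_length` is hypothetical and not part of the MLTK.

```python
import math


def same_padding_output_length(input_length, stride):
    """Output length of a strided layer that uses 'Same' padding."""
    return math.ceil(input_length / stride)


# Start from the 98 time steps of the model input
time_steps = 98
progression = [time_steps]

# Apply the four stride-2 stages listed in the summary
for _ in range(4):
    time_steps = same_padding_output_length(time_steps, stride=2)
    progression.append(time_steps)

print(' -> '.join(map(str, progression)))
```

The resulting sequence matches the 98, 49, 25, 13 and 7 time steps that appear in the table.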
Process the audio sample in the AudioFeatureGenerator¶
Next, we process the audio sample in the Audio Feature Generator. This will convert the raw audio into a spectrogram image which can be given to the .tflite model for classification.
To process the audio sample, we must use the AudioFeatureGenerator settings embedded into the .tflite. These are the settings that were used to train the model and also the settings used by the embedded device at runtime.
# Retrieve the AudioFeatureGenerator settings from the .tflite
tflite_params = TfliteModelParameters.load_from_tflite_file(tflite_path)
# Load the .tflite parameters into a AudioFeatureGeneratorSettings instance
tflite_frontend_settings = AudioFeatureGeneratorSettings(**tflite_params)
print(f'Audio frontend settings:\n{pprint.pformat(tflite_frontend_settings)}')
# Adjust the audio sample so that it is the correct length expected by the audio frontend settings
frontend_sample_length = int((audio_sample_rate_hz * tflite_frontend_settings.sample_length_ms) / 1000)
adjusted_audio_sample_data = audio_utils.adjust_length(
    audio_sample_data,
    out_length=frontend_sample_length,
    trim_threshold_db=30,
    offset=0
)
# Process the length-adjusted audio in the audio frontend (aka AudioFeatureGenerator).
# This will generate a spectrogram from the raw audio using the settings embedded into the .tflite
spectrogram = audio_utils.apply_frontend(
    sample=adjusted_audio_sample_data,
    settings=tflite_frontend_settings,
    dtype=np.uint16 # We just want the raw, uint16 output of the generated spectrogram
)
print(f'Generated spectrogram shape: {"x".join(map(str, spectrogram.shape))} ({spectrogram.dtype})')
# The generated spectrogram is uint16.
# However, the keyword_spotting_numbers model expects a normalized, float32 input.
# So, we use numpy to normalize the input sample
# norm_spectrogram = (spectrogram - mean(spectrogram)) / std(spectrogram)
norm_spectrogram = spectrogram.astype(np.float32)
norm_spectrogram -= np.mean(norm_spectrogram, dtype=np.float32, keepdims=False)
norm_spectrogram /= (np.std(norm_spectrogram, dtype=np.float32, keepdims=False) + 1e-6)
print(f'Normalized spectrogram shape: {"x".join(map(str, norm_spectrogram.shape))} ({norm_spectrogram.dtype})')
# The keyword_spotting_numbers model also expects the input shape to be:
# <time, 1, features>
# So, we insert an extra dimension:
tflite_input_spectrogram = np.expand_dims(norm_spectrogram, axis=-2)
print(f'.tflite input spectrogram shape: {"x".join(map(str, tflite_input_spectrogram.shape))} ({tflite_input_spectrogram.dtype})')
Audio frontend settings:
{'average_window_duration_ms': 450,
'classes': ['zero',
'one',
'two',
'three',
'four',
'five',
'six',
'seven',
'eight',
'nine',
'_unknown_'],
'date': '2023-08-03T01:11:43.378Z',
'detection_threshold': 242,
'fe.activity_detection_alpha_a': 0.5,
'fe.activity_detection_alpha_b': 0.800000011920929,
'fe.activity_detection_arm_threshold': 0.75,
'fe.activity_detection_enable': False,
'fe.activity_detection_trip_threshold': 0.800000011920929,
'fe.dc_notch_filter_coefficient': 0.949999988079071,
'fe.dc_notch_filter_enable': True,
'fe.fft_length': 512,
'fe.filterbank_lower_band_limit': 125.0,
'fe.filterbank_n_channels': 40,
'fe.filterbank_upper_band_limit': 7500.0,
'fe.log_scale_enable': True,
'fe.log_scale_shift': 6,
'fe.noise_reduction_enable': True,
'fe.noise_reduction_even_smoothing': 0.02500000037252903,
'fe.noise_reduction_min_signal_remaining': 0.4000000059604645,
'fe.noise_reduction_odd_smoothing': 0.05999999865889549,
'fe.noise_reduction_smoothing_bits': 10,
'fe.pcan_enable': False,
'fe.pcan_gain_bits': 21,
'fe.pcan_offset': 80.0,
'fe.pcan_strength': 0.949999988079071,
'fe.quantize_dynamic_scale_enable': False,
'fe.quantize_dynamic_scale_range_db': 40.0,
'fe.sample_length_ms': 1000,
'fe.sample_rate_hz': 16000,
'fe.window_size_ms': 30,
'fe.window_step_ms': 10,
'hash': '4b22adb625a3300fdcf06fa61105782f',
'latency_ms': 10,
'minimum_count': 2,
'name': 'keyword_spotting_numbers',
'runtime_memory_size': 80500,
'samplewise_norm.mean_and_std': True,
'suppression_ms': 700,
'verbose_model_output_logs': False,
'version': 1,
'volume_gain': 0.0}
Generated spectrogram shape: 98x40 (uint16)
Normalized spectrogram shape: 98x40 (float32)
.tflite input spectrogram shape: 98x1x40 (float32)
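The shapes printed above can be checked by hand. The sketch below is written in plain Python so it runs without the MLTK or numpy. It assumes the common sliding-window formula for the number of frames, which reproduces the 98 frames reported above but is not taken from the AudioFeatureGenerator source. The tiny 2x3 `spectrogram` is made-up data used only to show the normalization and the extra dimension.

```python
import statistics

# Values copied from the settings printed above
sample_rate_hz = 16000
sample_length_ms = 1000
window_size_ms = 30
window_step_ms = 10

# Number of raw audio samples the frontend expects
n_samples = sample_rate_hz * sample_length_ms // 1000

# Assumed sliding-window formula: one frame per step once a full window fits
n_frames = 1 + (sample_length_ms - window_size_ms) // window_step_ms
print(f'{n_samples} samples -> {n_frames} frames')

# Made-up stand-in spectrogram: 2 frames x 3 channels
spectrogram = [[100, 200, 300], [400, 500, 600]]

# Sample-wise normalization: (x - mean) / (std + epsilon)
values = [float(v) for row in spectrogram for v in row]
mean = statistics.fmean(values)
std = statistics.pstdev(values)  # population std, the same default as np.std
normalized = [[(v - mean) / (std + 1e-6) for v in row] for row in spectrogram]

# Insert a singleton dimension: <time, features> -> <time, 1, features>
model_input = [[row] for row in normalized]
print(f'{len(model_input)}x{len(model_input[0])}x{len(model_input[0][0])}')
```

After normalization the values have a mean of approximately zero and a standard deviation of approximately one, which is what the model was trained on according to the `samplewise_norm.mean_and_std` setting.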
Classify the audio sample using TF-Lite¶
Next, we give the processed audio sample to the .tflite model instance which will classify the audio.
The model output is a list of probabilities. The list entry with the largest probability is the “class” to which the model thinks the audio sample belongs.
NOTE: This uses the default int8 “kernels” that come with TF-Lite.
# Give the processed audio sample (which is now a normalized spectrogram)
# to the trained and quantized keyword_spotting_numbers.tflite model,
# which will classify the sample and return the classification results
classification_results = tflite_model.predict(tflite_input_spectrogram)
print(f'Raw classification results: {classification_results}')
# Find the index of the largest entry in the list
predicted_class_index = np.argmax(classification_results)
prediction_confidence = classification_results[predicted_class_index]
print(f'The model "{mltk_model.name}" using the reference int8 Tensorflow-Lite kernels predict that the audio sample file:\n{audio_sample_path}\nbelongs to the class: "{classes[predicted_class_index]}" with a confidence of {prediction_confidence*100:.1f}%')
Raw classification results: [0. 0. 0. 0. 0. 0.
0. 0.99609375 0. 0. 0. ]
The model "keyword_spotting_numbers" using the reference int8 Tensorflow-Lite kernels predict that the audio sample file:
C:/Users/dried/.mltk/datasets/ten_digits/seven/aws_ar-AE+Hala+seven+medium+medium+1209d48a.wav
belongs to the class: "seven" with a confidence of 99.6%
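The reported confidence of 0.99609375 equals 255/256, which is what an int8 value of 127 becomes when dequantized with a scale of 1/256 and a zero point of -128. These are the parameters the TensorFlow Lite quantization specification uses for int8 softmax outputs. The sketch below assumes those parameters rather than reading them from the model, and the `int8_scores` list is made-up data chosen to mirror the result above.

```python
def dequantize(q_value, scale, zero_point):
    """Convert a quantized integer back to a real value."""
    return (q_value - zero_point) * scale


# Assumed int8 softmax output quantization (not read from the .tflite)
scale = 1 / 256
zero_point = -128

classes = ['zero', 'one', 'two', 'three', 'four', 'five',
           'six', 'seven', 'eight', 'nine', '_unknown_']

# Made-up int8 model output: every class at the minimum except "seven"
int8_scores = [-128] * len(classes)
int8_scores[7] = 127

probabilities = [dequantize(q, scale, zero_point) for q in int8_scores]

# Index of the largest probability (the plain-Python equivalent of np.argmax)
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(f'{classes[predicted_index]}: {probabilities[predicted_index]}')
```

Because the output is stored in 8 bits, the confidence can only take 256 distinct values, so 99.6% is the largest probability this representation can express.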
Classify the audio using TF-Lite Micro¶
Before, we used the int8 Tensorflow-Lite kernels that come with the Tensorflow Python package.
Now, let’s use the int8 kernels that come with Tensorflow-Lite Micro. We do this with the TfliteMicro API, the Tensorflow-Lite Micro Python wrapper that comes with the MLTK.
# Load the TfliteMicroModel instance
tflm_model = TfliteMicro.load_tflite_model(tflite_path)
print(f'Tensorflow-Lite Micro model details:\n{tflm_model.details}')
try:
    # Load the audio sample into the TFLM model instance's input tensor
    tflm_model.input(value=tflite_input_spectrogram)
    # Run inference (which will use the TFLM int8 SW reference kernels)
    tflm_model.invoke()
    # Retrieve the classification results:
    # NOTE: The results have the shape 1x11, hence the [0]
    classification_results = tflm_model.output()[0]
    print(f'Raw classification results: {classification_results}')
    # Find the index of the largest entry in the list
    predicted_class_index = np.argmax(classification_results)
    prediction_confidence = classification_results[predicted_class_index]
    print(f'The model "{mltk_model.name}" using the reference int8 Tensorflow-Lite Micro kernels predict that the audio sample file:\n{audio_sample_path}\nbelongs to the class: "{classes[predicted_class_index]}" with a confidence of {prediction_confidence*100:.1f}%')
finally:
    # We MUST unload the model after we're done with it
    TfliteMicro.unload_model(tflm_model)
Tensorflow-Lite Micro model details:
Name: keyword_spotting_numbers
Version: 1
Date: 2023-08-03T01:11:43.378Z
Description:
Hash: 4b22adb625a3300fdcf06fa61105782f
Accelerator: none
Classes: zero, one, two, three, four, five, six, seven, eight, nine, _unknown_
Total runtime memory: 73.192 kBytes
Raw classification results: [0. 0. 0. 0. 0. 0.
0. 0.99609375 0. 0. 0. ]
The model "keyword_spotting_numbers" using the reference int8 Tensorflow-Lite Micro kernels predict that the audio sample file:
C:/Users/dried/.mltk/datasets/ten_digits/seven/aws_ar-AE+Hala+seven+medium+medium+1209d48a.wav
belongs to the class: "seven" with a confidence of 99.6%
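Since the model must always be unloaded, the try/finally pattern used above can also be wrapped in a context manager. The following is a minimal sketch of that idea using stand-in `fake_load` and `fake_unload` functions. These and the `loaded_model` helper are hypothetical and are not part of the MLTK. With the MLTK, one could presumably pass the `TfliteMicro.load_tflite_model` and `TfliteMicro.unload_model` calls shown above instead.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(load_fn, unload_fn, *load_args):
    """Load a model, hand it to the caller, and always unload it afterwards."""
    model = load_fn(*load_args)
    try:
        yield model
    finally:
        # Runs even if the body of the "with" block raises an exception
        unload_fn(model)


# Stand-in functions that only record what happened (hypothetical)
events = []


def fake_load(path):
    events.append(f'load {path}')
    return {'path': path}


def fake_unload(model):
    events.append(f"unload {model['path']}")


try:
    with loaded_model(fake_load, fake_unload, 'model.tflite') as model:
        events.append('invoke')
        raise RuntimeError('simulated inference failure')
except RuntimeError:
    events.append('error handled')

print(events)
```

The recorded events show that the unload step runs before the exception reaches the caller, which is the same guarantee the `finally:` clause provides.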