Audio Classifier¶
This application uses TensorFlow Lite for Microcontrollers to run an audio classification machine learning model that classifies words in audio data recorded from a microphone. Detections are visualized using the LEDs on the board, and the classification results are written to the VCOM serial port.
NOTES:
This application can be built for Windows/Linux or a supported embedded target.
This application is designed to be used with the MLTK command: mltk classify_audio
Quick Links¶
GitHub Source - View this example’s source code on GitHub
Online documentation - View this example’s online documentation
Keyword Spotting Tutorial - View this example’s associated tutorial
Audio classifier utility - View the audio classifier utility’s documentation
Alexa Demo - Alexa Voice Services (AVS) demo using a Python script
Behavior¶
The application uses two LEDs to show detection and activity, and prints detection results and debug log output on the VCOM serial port. In the application configuration file, audio_classifier_config.h, the user can select which LED to use for activity and which to use for detection. By default, the detection LED is green/led1 and the activity LED is red/led0.
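For reference, the LED selection in audio_classifier_config.h amounts to a pair of compile-time settings along these lines (the macro and LED instance names here are assumptions and may differ between SDK versions):
// Illustrative sketch only; macro and LED instance names are assumptions.
#define DETECTION_LED   sl_led_led1   // green: lights for ~750 ms on a detection
#define ACTIVITY_LED    sl_led_led0   // red: blinks for ~500 ms on activity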
At a regular interval, the application performs an inference and processes the result to find the average score for each class in the current averaging window. If the top score exceeds the detection threshold, a detection is triggered and the detection LED (green) lights up for about 750 ms.
Once the detection LED turns off, the application goes back to responding to the input data. If the change in model output exceeds a configurable sensitivity threshold, the activity LED (red) blinks for about 500 ms.
The activity LED indicates that audio has been detected on the input and the model output is changing, but no clear classification was made.
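As a rough sketch of the behavior described above (illustrative only; the thresholds and LED helpers are placeholders, not the application’s actual API):
#include <stdint.h>

// Illustrative sketch of the result processing; all names are placeholders.
#define DETECTION_THRESHOLD  185    // 0-255; see the THRESHOLD CMake variable
#define SENSITIVITY          0.5f   // placeholder activity sensitivity

extern void turn_on_detection_led_ms(uint32_t duration_ms);  // placeholder helper
extern void blink_activity_led_ms(uint32_t duration_ms);     // placeholder helper

void process_window_average(const uint8_t *avg_scores, int n_classes, float output_delta)
{
    // Find the highest averaged score in the current window
    uint8_t top_score = 0;
    for (int i = 0; i < n_classes; ++i) {
        if (avg_scores[i] > top_score) {
            top_score = avg_scores[i];
        }
    }

    if (top_score >= DETECTION_THRESHOLD) {
        turn_on_detection_led_ms(750);       // clear detection: green LED
    } else if (output_delta >= SENSITIVITY) {
        blink_activity_led_ms(500);          // activity but no detection: red LED
    }
}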
In audio classification, it is common to have some results that map to silence or unknown. These results are usually ignored. The audio classifier application filters them out based on the label text: by default, any label that starts with an underscore is ignored when processing results. This behavior can be disabled in the application configuration file.
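A minimal sketch of that check (the function name is illustrative):
#include <stdbool.h>
#include <stddef.h>

// Labels beginning with an underscore (e.g. "_unknown_") are skipped
// when processing results. Illustrative sketch only.
static bool should_ignore_label(const char *label)
{
    return (label != NULL) && (label[0] == '_');
}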
Updating the model¶
The default model used in this application is called keyword_spotting_on_off_v3.tflite and classifies audio into three classes labeled “on”, “off”, and “unknown”. The source for the model can be found here:
https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_on_off_v3.py
The application is designed to work with an audio classification model created using the Silicon Labs Machine Learning Toolkit (MLTK). Use the MLTK to train a new audio classification model, then replace the default model in this example with it.
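For example, assuming the MLTK is installed, the default model can be retrained with:
# Train the default keyword spotting model (see the Keyword Spotting Tutorial)
mltk train keyword_spotting_on_off_v3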
via Simplicity Studio¶
To replace the default model, rename your .tflite file to 1_<your model name>.tflite and copy it into the config/tflite folder of the Simplicity Studio project. (Simplicity Studio sorts the models alphabetically in ascending order; adding 1_ forces the model to come first.) After a new .tflite file is added to the project, Simplicity Studio automatically uses the flatbuffer converter tool to convert the .tflite file into a C file, which is added to the project.
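Conceptually, the generated C file embeds the model flatbuffer as a byte array, along these lines (the symbol names here are illustrative, not the tool’s actual generated names):
// Conceptual sketch of the converter tool's output; symbol names are
// illustrative assumptions.
const unsigned char sl_tflite_model_array[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, /* ... "TFL3" header ... */
};
const unsigned int sl_tflite_model_len = sizeof(sl_tflite_model_array);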
Refer to the online documentation for more details.
via classify_audio Command¶
Alternatively, use the mltk classify_audio <model path> --app none --device command to program the .tflite model to the end of the device’s flash. On startup, the application detects the new model and uses it instead of the model built into the firmware.
NOTE: The --app none option tells the command to only program the model file without updating the audio_classifier application.
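For example, to program the default model to the device without updating the firmware:
# Program only the .tflite model to the end of the device's flash
mltk classify_audio keyword_spotting_on_off_v3 --app none --device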
See the Audio classifier utility documentation for more details.
via CMake¶
The model can also be updated when building this application from Visual Studio Code or the CMake Command Line.
To update the model, create/modify the file <mltk repo root>/user_options.cmake and add:
mltk_set(AUDIO_CLASSIFIER_MODEL <model name or path>)
where <model name or path> is the file path to your model’s .tflite or the name of an MLTK model.
With this variable set, the specified model is built into the audio_classifier application.
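For example, to build the application with the default model by name:
# In <mltk repo root>/user_options.cmake
mltk_set(AUDIO_CLASSIFIER_MODEL keyword_spotting_on_off_v3)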
Build, Run, Debug¶
See the online documentation for how to build and run this application:
Simplicity Studio¶
If using Simplicity Studio, select the MLTK - Audio Classifier project.
Visual Studio Code¶
If using Visual Studio Code, select the mltk_audio_classifier CMake target.
Command-line¶
If using the command line, select the mltk_audio_classifier CMake target.
Dumping audio & spectrograms to PC¶
This application works with the MLTK command:
mltk classify_audio --help
Using this command, you can dump spectrograms and recorded audio to the local PC.
For example:
# Dump spectrograms generated on the embedded device to the local PC
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-spectrograms
# Dump audio recorded by the embedded device to the local PC
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-audio
See the Audio classifier utility documentation for more details.
Model Parameters¶
In order for audio classification to work correctly, the same audio feature generator configuration parameters must be used for inference as were used when training the model. When using the MLTK to train an audio classification model, the model parameters are embedded in the metadata section of the .tflite file. The model parameters are then extracted from the .tflite file at runtime.
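As a rough illustration of what happens at startup (the structure and function names below are hypothetical placeholders, not the MLTK’s actual API):
#include <stdint.h>

// Hypothetical sketch of applying parameters read from the .tflite metadata;
// the real application uses the MLTK's model parameter API.
typedef struct {
    int32_t average_window_duration_ms;
    int32_t detection_threshold;   // 0-255
    int32_t suppression_ms;
} model_parameters_t;

extern int parse_model_parameters(const void *tflite_flatbuffer,
                                  model_parameters_t *out);  // placeholder

static model_parameters_t app_settings;

int load_model_settings(const void *tflite_flatbuffer)
{
    // Use the same configuration for inference as was used for training
    return parse_model_parameters(tflite_flatbuffer, &app_settings);
}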
Modifications¶
The application was originally taken from the Gecko SDK.
It has been modified as follows:
Supports running on embedded targets as well as Windows/Linux
All relevant #defines have been converted to dynamic variables that are populated via the command line (for Windows/Linux) or from the .tflite model parameters
A .tflite model can be dynamically loaded via the command line (for Windows/Linux) or from a .tflite programmed to the end of the embedded target’s flash
Added support for dumping raw microphone audio and generated spectrograms for capture by a Python script (using the mltk classify_audio command)
Updated the embedded microphone driver to support dynamic sample lengths
CMake Variables¶
This application supports the following optional CMake variables. The variables may be specified on the command-line or in the user_options.cmake file.
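For example, a user_options.cmake that enables verbose logging and tunes several of the settings described below might look like:
# <mltk repo root>/user_options.cmake
mltk_set(VERBOSE ON)
mltk_set(THRESHOLD 185)
mltk_set(SUPPRESSION_MS 500)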
VERBOSE¶
Enables verbose logging while the application executes.
mltk_set(VERBOSE ON)
WINDOW_MS¶
Configure the length of the averaging window in milliseconds.
This overrides the model parameter average_window_duration_ms.
mltk_set(WINDOW_MS 750)
THRESHOLD¶
Configure the detection threshold. This is a value from 0 to 255, with 255 being the highest.
This overrides the model parameter detection_threshold.
mltk_set(THRESHOLD 185)
SUPPRESSION_MS¶
The amount of time in milliseconds to wait after a detection to begin listening for keywords again.
This overrides the model parameter suppression_ms.
mltk_set(SUPPRESSION_MS 500)
COUNT¶
The minimum number of inference results to average when calculating the detection value.
This overrides the model parameter minimum_count.
mltk_set(COUNT 2)
VOLUME_GAIN¶
The integer multiplier value to apply to each microphone sample.
This overrides the model parameter volume_gain.
mltk_set(VOLUME_GAIN 2)
LATENCY_MS¶
This is the amount of time in milliseconds that one iteration of the audio processing loop takes.
This overrides the model parameter latency_ms.
mltk_set(LATENCY_MS 2)
AUDIO_CLASSIFIER_ENABLE_AUDIO_IO¶
This enables audio input/output streaming via UART. This is currently only used in the Keyword Spotting - Alexa demo.
mltk_set(AUDIO_CLASSIFIER_ENABLE_AUDIO_IO ON)
NOTE: When this feature is enabled, log prints are effectively disabled.