Audio Classifier

This application uses TensorFlow Lite for Microcontrollers to run an audio classification machine learning model that classifies words in audio data recorded from a microphone. Detections are visualized using the LEDs on the board, and the classification results are written to the VCOM serial port.

NOTES:

  • This application can be built for Windows/Linux or for a supported embedded target.

  • This application is designed to be used with the MLTK command: mltk classify_audio

Behavior

The application uses two LEDs to indicate detection and activity, and prints detection results and debug log output to the VCOM serial port. In the application configuration file, audio_classifier_config.h, the user can select which LED to use for activity and which to use for detection. By default, the detection LED is green/led1 and the activity LED is red/led0.

At a regular interval, the application performs an inference and processes the result to find the average score for each class over the current averaging window. If the top average score is higher than the detection threshold, a detection is triggered and the detection LED (green) lights up for about 750 ms.

Once the detection LED turns off, the application resumes responding to the input data. If the change in the model output is greater than a configurable sensitivity threshold, the activity LED (red) blinks for about 500 ms.

The activity LED indicates that audio has been detected on the input and the model output is changing, but no clear classification was made.

In audio classification it is common for some classes to map to silence or unknown, and these results are usually ignored. The audio classifier application filters them out based on the label text: by default, any label that starts with an underscore is ignored when processing results. This behavior can be disabled in the application configuration file.
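
The following is a minimal C sketch of this result-processing logic. It assumes hypothetical names and values: NUM_CLASSES, WINDOW_LENGTH, DETECTION_THRESHOLD, the labels, and process_result() are illustrative and are not the application's actual identifiers.

// Illustrative sketch only; names and values are hypothetical.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_CLASSES         3    // e.g. "on", "off", "_unknown_"
#define WINDOW_LENGTH       4    // inference results kept in the averaging window
#define DETECTION_THRESHOLD 185  // 0-255, compared against the averaged score

static const char *labels[NUM_CLASSES] = { "on", "off", "_unknown_" };
static uint8_t history[WINDOW_LENGTH][NUM_CLASSES]; // most recent raw class scores
static int history_count = 0;

// Called after every inference with the model's raw class scores (0-255).
// Returns the detected class index, or -1 if no detection was triggered.
int process_result(const uint8_t scores[NUM_CLASSES])
{
    // Shift older results down and store the newest result at index 0
    memmove(history[1], history[0], sizeof(history) - sizeof(history[0]));
    memcpy(history[0], scores, sizeof(history[0]));
    if (history_count < WINDOW_LENGTH) {
        history_count++;
    }

    // Find the class with the highest average score over the window,
    // skipping "filtered" classes whose label starts with an underscore
    int best_class = -1;
    int best_average = 0;
    for (int c = 0; c < NUM_CLASSES; c++) {
        if (labels[c][0] == '_') {
            continue;
        }
        int sum = 0;
        for (int i = 0; i < history_count; i++) {
            sum += history[i][c];
        }
        int average = sum / history_count;
        if (average > best_average) {
            best_average = average;
            best_class = c;
        }
    }

    // Trigger a detection (i.e. light the detection LED) only when the
    // top averaged score exceeds the detection threshold
    if (best_class >= 0 && best_average >= DETECTION_THRESHOLD) {
        printf("Detected \"%s\" (average score %d)\n", labels[best_class], best_average);
        return best_class;
    }
    return -1;
}

The real application additionally drives the detection and activity LEDs, applies the suppression period after a detection, and loads these settings from the model parameters or the CMake variables described below.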

Updating the model

The default model used in this application is called keyword_spotting_on_off_v3.tflite and classifies audio into three classes labeled “on”, “off”, and “unknown”. The source for the model can be found here: https://github.com/siliconlabs/mltk/blob/master/mltk/models/siliconlabs/keyword_spotting_on_off_v3.py

The application is designed to work with an audio classification model created using the Silicon Labs Machine Learning Toolkit (MLTK). Use the MLTK to train a new audio classification model, then replace the default model in this example with it.
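
For example, a model can be trained with the MLTK's train command; the command below retrains the default model specification (substitute your own model specification name):

mltk train keyword_spotting_on_off_v3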

via Simplicity Studio

To replace the default model, rename your .tflite file to 1_<your model name>.tflite and copy it into the config/tflite folder of the Simplicity Studio project. (Simplicity Studio sorts the models alphabetically in ascending order; adding the 1_ prefix forces the model to come first.) After a new .tflite file is added to the project, Simplicity Studio automatically uses the flatbuffer converter tool to convert the .tflite file into a C file, which is added to the project.
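
For example, assuming a trained model at my_model.tflite (a hypothetical path), the copy step might look like:

# Copy and rename the model so it sorts first in the project's config/tflite folder
cp my_model.tflite <simplicity studio project>/config/tflite/1_my_model.tflite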

Refer to the online documentation for more details.

via classify_audio Command

Alternatively, the command mltk classify_audio <model path> --app none --device programs the .tflite model to the end of the device’s flash. On startup, the application detects the new model and uses it instead of the model built into the firmware.

NOTE: The --app none option tells the command to not update the audio_classifier application and only program the model file.
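
For example, to program the default model by name (substitute your own model name or .tflite path):

mltk classify_audio keyword_spotting_on_off_v3 --app none --device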

See the Audio classifier utility documentation for more details.

via CMake

The model can also be updated when building this application from Visual Studio Code or the CMake Command Line.

To update the model, create/modify the file: <mltk repo root>/user_options.cmake and add:

mltk_set(AUDIO_CLASSIFIER_MODEL <model name or path>)

where <model name or path> is the file path to your model’s .tflite or the MLTK model name.

With this variable set, when the audio_classifier application is built the specified model will be built into the application.
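
For example, the variable can be set to the default model by name or, alternatively, to a file path (the path below is hypothetical):

mltk_set(AUDIO_CLASSIFIER_MODEL keyword_spotting_on_off_v3)
# OR point directly at a .tflite file
mltk_set(AUDIO_CLASSIFIER_MODEL "~/my_models/my_model.tflite")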

Build, Run, Debug

See the online documentation for how to build and run this application:

Simplicity Studio

If using Simplicity Studio select the MLTK - Audio Classifier Project.

Visual Studio Code

If using Visual Studio Code select the mltk_audio_classifier CMake target.

Command-line

If using the Command Line select the mltk_audio_classifier CMake target.

Dumping audio & spectrograms to PC

This application works with the MLTK command:

mltk classify_audio --help

Using this command, you can dump spectrograms and recorded audio to the local PC.

For example:

# Dump spectrograms generated on the embedded device to the local PC
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-spectrograms
# Dump audio recorded by the embedded device to the local PC
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-audio

See the Audio classifier utility documentation for more details.

Model Parameters

For the audio classification to work correctly, the same audio feature generator configuration parameters must be used for inference as were used when training the model. When using the MLTK to train an audio classification model, the model parameters are embedded in the metadata section of the .tflite file. The model parameters are extracted from the .tflite file at runtime.
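
For reference, the parameters embedded in a .tflite file can typically be inspected with the MLTK summarize command (assuming it is available in your MLTK installation; the default model name is shown):

mltk summarize keyword_spotting_on_off_v3 --tflite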

Modifications

The application was originally taken from the Gecko SDK.

It has been modified as follows:

  1. Supports running on embedded as well as Windows/Linux

  2. All relevant #defines have been converted to dynamic variables that are populated via command line (for Windows/Linux) or from the .tflite model parameters

  3. The .tflite model can be dynamically loaded via the command line (for Windows/Linux) or from a .tflite file programmed to the end of the embedded target’s flash

  4. Added support for dumping raw microphone audio and generated spectrograms for capture by Python script (using the mltk classify_audio command)

  5. Updated embedded microphone driver to support dynamic sample lengths

CMake Variables

This application supports the following optional CMake variables. The variables may be specified on the command-line or in the user_options.cmake file.

VERBOSE

Enable verbose logging while the application executes.

mltk_set(VERBOSE ON)

WINDOW_MS

Configure the length of the averaging window in milliseconds. This overrides the model parameter setting average_window_duration_ms.

mltk_set(WINDOW_MS 750)

THRESHOLD

Configure the detection threshold. This is a value from 0-255, with 255 being the highest. This overrides the model parameter setting detection_threshold.

mltk_set(THRESHOLD 185)

SUPPRESSION_MS

The amount of time in milliseconds to wait after a detection to begin listening for keywords again. This overrides the model parameter setting suppression_ms.

mltk_set(SUPPRESSION_MS 500)

COUNT

The minimum number of inference results to average when calculating the detection value. This overrides the model parameter setting minimum_count.

mltk_set(COUNT 2)

VOLUME_GAIN

The integer multiplier value to apply to each microphone sample. This overrides the model parameter setting volume_gain.

mltk_set(VOLUME_GAIN 2)

LATENCY_MS

This is the amount of time in milliseconds an audio loop takes. This overrides the model parameter setting latency_ms.

mltk_set(LATENCY_MS 2)

AUDIO_CLASSIFIER_ENABLE_AUDIO_IO

This enables audio input/output streaming via UART. This is currently only used in the Keyword Spotting - Alexa demo.

mltk_set(AUDIO_CLASSIFIER_ENABLE_AUDIO_IO ON)

NOTE: When this feature is enabled, log prints are effectively disabled.
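
For convenience, several of these options can be combined in user_options.cmake. An illustrative combination, using only the variables and example values listed above, might look like:

# <mltk repo root>/user_options.cmake
mltk_set(AUDIO_CLASSIFIER_MODEL keyword_spotting_on_off_v3)
mltk_set(VERBOSE ON)
mltk_set(WINDOW_MS 750)
mltk_set(THRESHOLD 185)
mltk_set(SUPPRESSION_MS 500)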

Additional Reading