# Audio Feature Generator

The AudioFeatureGenerator is a software library that converts streaming audio into spectrograms. The spectrograms are then used by a classification machine learning model to make predictions on the contents of the streaming audio. A common use case of this library is "keyword spotting".

Refer to the [Keyword Spotting Overview](./keyword_spotting_overview.md) for more details on how spectrograms are used to detect keywords in streaming audio.

Refer to the [Keyword Spotting Tutorial](../../mltk/tutorials/keyword_spotting_on_off) for a complete guide on how to use the MLTK to create an audio classification ML model.

## Overview

There are three main parts to the AudioFeatureGenerator:

- [Gecko SDK Component](#gecko-sdk-component) - Software library provided by the Gecko SDK that runs on the embedded target
- [MLTK C++ Python Wrapper](#mltk-c-python-wrapper) - Python package that wraps the Gecko SDK software library; this runs on the host PC
- [Audio Visualizer Utility](#audio-visualizer-utility) - Graphical utility to view the spectrograms generated by the AudioFeatureGenerator in real-time

```{note}
See the [Audio Utilities](./audio_utilities.md) documentation for more details about the audio tools offered by the MLTK
```

These parts work together as follows:

1. The AudioFeatureGenerator visualizer tool is used to select the spectrogram settings
   - The `mltk view_audio` command is used to invoke the visualizer tool
2. The spectrogram settings are saved to a [Model Specification](../guides/model_specification.md) file
3. The [Model Specification](../guides/model_specification.md) file is used to train the model
   - The `mltk train` command is used to train the model
   - Internally, the [AudioFeatureGenerator](../cpp_development/wrappers/audio_feature_generator_wrapper.md) C++ Python wrapper is used to dynamically generate spectrograms from the audio dataset
4. At the end of training, the MLTK embeds the [spectrogram settings](../guides/model_parameters.md#audiodatasetmixin) into the generated `.tflite` model file
5. The generated `.tflite` model file is copied to a Gecko SDK project
6. The Gecko SDK project generator parses the spectrogram settings embedded in the `.tflite` and generates the corresponding C header files with the settings
7. The Gecko SDK project is built and the firmware image is loaded onto the embedded target. The firmware image contains:
   - Trained `.tflite` model file
   - [Tensorflow-Lite Micro](https://github.com/tensorflow/tflite-micro) interpreter
   - [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library
   - AudioFeatureGenerator [settings](../guides/model_parameters.md#audiodatasetmixin) used to train the model
8. On the embedded target at runtime:
   1. Streaming audio is read from the microphone
   2. The microphone audio is sent to the AudioFeatureGenerator, where spectrograms are generated using the _exact_ same settings and algorithms that were used during model training
   3. The generated spectrogram images are sent to Tensorflow-Lite Micro and are classified using the `.tflite` model
   4. The model predictions are used to notify the application of keyword detections
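The runtime flow in step 8 can be approximated on the host PC with the MLTK's Python APIs. The sketch below is a simplified, hypothetical analogue (the on-device library, written in C, processes the audio stream as it arrives): it keeps a rolling one-second audio buffer, regenerates the spectrogram with the `apply_frontend()` API described later in this document, and classifies it with the TensorFlow-Lite interpreter. The `read_microphone_chunk()` helper and the `.tflite` path are placeholders.

```python
import numpy as np
import tensorflow as tf

from mltk.core.preprocess.utils import audio as audio_utils
# Import path assumed from the Python API reference later in this document
from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings

# The same settings that were used to train the model (see the Usage section below)
frontend_settings = AudioFeatureGeneratorSettings()
frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1000

# Hypothetical model path; on the embedded target the model is compiled into the firmware
interpreter = tf.lite.Interpreter(model_path='my_keyword_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Rolling 1-second audio buffer (1s @ 16kHz), assumed to be a float32 waveform
audio_buffer = np.zeros(16000, dtype=np.float32)

while True:
    # read_microphone_chunk() is a placeholder for whatever audio source is available
    chunk = read_microphone_chunk()
    audio_buffer = np.concatenate([audio_buffer[len(chunk):], chunk])

    # Generate a spectrogram using the same settings/algorithms used during training
    spectrogram = audio_utils.apply_frontend(
        sample=audio_buffer,
        settings=frontend_settings,
        dtype=np.int8,
    )

    # Classify the spectrogram with the trained .tflite model
    interpreter.set_tensor(input_details['index'],
                           spectrogram.reshape(input_details['shape']))
    interpreter.invoke()
    predictions = interpreter.get_tensor(output_details['index'])
    # ... notify the application when a keyword class exceeds a threshold ...
```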
## Benefits

The benefits of using the AudioFeatureGenerator are:

- The _exact_ same algorithms and settings used to generate the spectrograms during model training are also used by the embedded target
  - This ensures the ML model "sees" the same type of spectrograms at runtime that it was trained to see, which should allow for better performance
- The [spectrogram settings](../guides/model_parameters.md#audiodatasetmixin) are automatically embedded into the `.tflite` model file
  - This ensures the settings are in lock-step with the trained model
  - The ML model designer only needs to distribute a single file
- The Gecko SDK will automatically generate the necessary source code
  - The Gecko SDK will parse the spectrogram settings from the `.tflite` and generate the corresponding C headers
- The Gecko SDK comes with the full [source code](https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.0/util/third_party/tensorflow_extra/src/sl_ml_audio_feature_generation.c) to the AudioFeatureGenerator software library

## Gecko SDK Component

The [Gecko SDK AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) component is largely based on the [Google Microfrontend](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/experimental/microfrontend/lib) library.

> A feature generation library (also called frontend) that receives raw audio input, and produces filter banks (a vector of values).
>
> The raw audio input is expected to be 16-bit PCM features, with a configurable sample rate. More specifically the audio signal goes through a pre-emphasis filter (optionally); then gets sliced into (potentially overlapping) frames and a window function is applied to each frame; afterwards, we do a Fourier transform on each frame (or more specifically a Short-Time Fourier Transform) and calculate the power spectrum; and subsequently compute the filter banks.
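For intuition only, the frame/window/power-spectrum/filter-bank flow described above can be sketched in plain NumPy. This is an illustrative approximation, not the actual Microfrontend implementation (which uses fixed-point arithmetic and adds blocks such as pre-emphasis, noise reduction, gain control, and log scaling that are omitted here):

```python
import numpy as np

def sketch_frontend(audio: np.ndarray, sample_rate=16000, window_ms=30, step_ms=10,
                    n_channels=40, lower_hz=125.0, upper_hz=7500.0) -> np.ndarray:
    """Very rough approximation of the Microfrontend pipeline:
    frame -> window -> power spectrum -> triangular (mel-spaced) filter banks."""
    win = int(sample_rate * window_ms / 1000)    # samples per frame
    step = int(sample_rate * step_ms / 1000)     # hop between frames
    n_fft = 1 << (win - 1).bit_length()          # next power of two
    window = np.hanning(win)

    # Slice the audio into (potentially overlapping) frames and apply the window
    n_frames = 1 + (len(audio) - win) // step
    frames = np.stack([audio[i * step: i * step + win] * window for i in range(n_frames)])

    # Short-Time Fourier Transform -> power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Triangular filter banks spaced on the mel scale between lower_hz and upper_hz
    mel = lambda f: 1127.0 * np.log1p(f / 700.0)
    inv_mel = lambda m: 700.0 * np.expm1(m / 1127.0)
    centers = inv_mel(np.linspace(mel(lower_hz), mel(upper_hz), n_channels + 2))
    bins = np.floor((n_fft + 1) * centers / sample_rate).astype(int)

    fbank = np.zeros((n_channels, n_fft // 2 + 1))
    for c in range(n_channels):
        left, center, right = bins[c], bins[c + 1], bins[c + 2]
        fbank[c, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[c, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    return power @ fbank.T  # shape: (n_frames, n_channels)


# Example: 1 second of random "audio" -> a (98, 40) feature matrix
features = sketch_frontend(np.random.randn(16000).astype(np.float32))
print(features.shape)
```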
### Source Code

The Gecko SDK features an AudioFeatureGeneration component. The MLTK features the same component with slight modifications so that it can also be built for Windows/Linux.

- Gecko SDK source code: [sl_ml_audio_feature_generation.c](https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.0/util/third_party/tensorflow_extra/src/sl_ml_audio_feature_generation.c)
- MLTK source code: [__mltk__/cpp/shared/gecko_sdk/audio_feature_generation](https://github.com/siliconlabs/mltk/tree/master/cpp//shared/gecko_sdk/audio_feature_generation)

## MLTK C++ Python Wrapper

The C++ Python [wrapper](../cpp_development/wrappers/audio_feature_generator_wrapper.md) allows for executing the AudioFeatureGenerator component from a Python script. This allows for executing the [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library during model training. This is useful because the _exact_ spectrogram generation algorithms used by the embedded device at runtime may also be used during model training, which should (hopefully) lead to more accurate model predictions.

The MLTK uses [pybind11](https://pybind11.readthedocs.io/en/latest/) to wrap the [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library and generate a Windows/Linux binary that can be loaded into the Python runtime environment.

The AudioFeatureGenerator Python API docs may be found here: [mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator](mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator).

### Source Code

- __C++ Python Wrapper__ - [cpp/audio_feature_generator_wrapper](https://github.com/siliconlabs/mltk/tree/master/cpp/audio_feature_generator_wrapper)
- __Python API__ - [mltk/core/preprocess/audio/audio_feature_generator](https://github.com/siliconlabs/mltk/tree/master/mltk/core/preprocess/audio/audio_feature_generator)

```{note}
When [installing](../installation.md) the MLTK for local development, the C++ wrapper is automatically built into a Windows/Linux shared library (`.dll` / `.so`) and copied to the Python [directory](https://github.com/siliconlabs/mltk/tree/master/mltk/core/preprocess/audio/audio_feature_generator).
When the [AudioFeatureGenerator](mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator) Python library is invoked by your Python scripts, the C++ wrapper shared library is loaded into the Python runtime environment.
```

### Usage

The recommended way of using the AudioFeatureGenerator [C++ wrapper](../cpp_development/wrappers/audio_feature_generator_wrapper.md) is by calling the [mltk.core.preprocess.utils.audio.apply_frontend()](https://siliconlabs.github.io/mltk/docs/python_api/data_preprocessing/audio.html#mltk.core.preprocess.utils.audio.apply_frontend) API.

Refer to the [keyword_spotting_on_off_v3.py](https://github.com/siliconlabs/mltk/tree/master/mltk/models/siliconlabs/keyword_spotting_on_off_v3.py) [model specification](../guides/model_specification.md) for an example of how this is used.

Basically,

1 ) In your [model specification](../guides/model_specification.md) file, define a model object that inherits the [DatasetMixin](mltk.core.DatasetMixin), e.g.:

```python
class MyModel(
    MltkModel,
    TrainMixin,
    DatasetMixin,
    EvaluateClassifierMixin
):
    pass
```

2 ) In your [model specification](../guides/model_specification.md) file, configure the spectrogram settings, e.g.:

```python
frontend_settings = AudioFeatureGeneratorSettings()

frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1000 # A 1s buffer should be enough to capture the keywords
frontend_settings.window_size_ms = 30
frontend_settings.window_step_ms = 10
frontend_settings.filterbank_n_channels = 104 # We want this value to be as large as possible
                                              # while still allowing for the ML model to execute efficiently on the hardware
frontend_settings.filterbank_upper_band_limit = 7500.0
frontend_settings.filterbank_lower_band_limit = 125.0 # The dev board mic seems to have a lot of noise at lower frequencies

frontend_settings.noise_reduction_enable = True # Enable the noise reduction block to help ignore background noise in the field
frontend_settings.noise_reduction_smoothing_bits = 10
frontend_settings.noise_reduction_even_smoothing = 0.025
frontend_settings.noise_reduction_odd_smoothing = 0.06
frontend_settings.noise_reduction_min_signal_remaining = 0.40 # This value is fairly large (which makes the background noise reduction small)
                                                              # But it has been found to still give good results
                                                              # i.e. There is still some background noise reduction,
                                                              # but the actual signal is still (mostly) untouched

frontend_settings.dc_notch_filter_enable = True # Enable the DC notch filter, to help remove the DC signal from the dev board's mic
frontend_settings.dc_notch_filter_coefficient = 0.95

frontend_settings.quantize_dynamic_scale_enable = True # Enable dynamic quantization, this dynamically converts the uint16 spectrogram to int8
frontend_settings.quantize_dynamic_scale_range_db = 40.0

# Add the Audio Feature generator settings to the model parameters
# This way, they are included in the generated .tflite model file
# See https://siliconlabs.github.io/mltk/docs/guides/model_parameters.html
my_model.model_parameters.update(frontend_settings)
```

3 ) Configure your data pipeline to call the frontend:

```python
from mltk.core.preprocess.utils import audio as audio_utils

spectrogram = audio_utils.apply_frontend(
    sample=augmented_sample,
    settings=frontend_settings,
    dtype=np.int8
)
```

During model [training](../guides/model_training.md), spectrograms are dynamically generated from the dataset's audio samples using the [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) via the [C++ Python wrapper](../cpp_development/wrappers/audio_feature_generator_wrapper.md). At the end of training, the [spectrogram settings](../guides/model_parameters.md#audiodatasetmixin) are automatically embedded into the generated `.tflite` model file.
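As a rough sanity check (the helper below is illustrative, not part of the MLTK API), the dimensions of the generated spectrogram can be estimated directly from the settings: the number of rows is the number of window steps that fit into the sample length, and the number of columns is the number of filterbank channels. With the settings from step 2 above this works out to a 98 x 104 spectrogram:

```python
def estimate_spectrogram_shape(settings) -> tuple:
    """Estimate the (rows, columns) shape of the generated spectrogram."""
    rows = 1 + (settings.sample_length_ms - settings.window_size_ms) // settings.window_step_ms
    cols = settings.filterbank_n_channels
    return rows, cols

# With the settings from step 2: 1 + (1000 - 30) // 10 = 98 rows, 104 columns
print(estimate_spectrogram_shape(frontend_settings))  # (98, 104)
```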
## Audio Visualizer Utility

The Audio Visualizer Utility provides a graphical interface to the [C++ Python wrapper](mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator), and thus the [Gecko SDK AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library. It allows for adjusting the various spectrogram settings and seeing how the resulting spectrogram is affected in real-time.

To use the Audio Visualizer utility, issue the command:

```shell
mltk view_audio
```

__NOTE:__ Internally, this will install the [wxPython](https://www.wxpython.org/) Python package.

![audio_visualizer](../img/audio_visualizer.gif)