# Audio Feature Generator

The AudioFeatureGenerator is a software library that converts streaming audio into spectrograms. The spectrograms are then used by a classification machine learning model to make predictions on the contents of the streaming audio. A common use case of this library is "keyword spotting".

Refer to the [Keyword Spotting Overview](./keyword_spotting_overview.md) for more details on how spectrograms are used to detect keywords in streaming audio.

Refer to the [Keyword Spotting Tutorial](../../mltk/tutorials/keyword_spotting_on_off) for a complete guide on how to use the MLTK to create an audio classification ML model.

## Overview

There are three main parts to the AudioFeatureGenerator:

- [Gecko SDK Component](#gecko-sdk-component) - Software library provided by the Gecko SDK that runs on the embedded target
- [MLTK C++ Python Wrapper](#mltk-c-python-wrapper) - Python package that wraps the Gecko SDK software library; this runs on the host PC
- [Audio Visualizer Utility](#audio-visualizer-utility) - Graphical utility to view the spectrograms generated by the AudioFeatureGenerator in real-time

```{note}
See the [Audio Utilities](./audio_utilities.md) documentation for more details about the audio tools offered by the MLTK
```

These parts work together as follows:

1. The AudioFeatureGenerator visualizer tool is used to select the spectrogram settings
   - The `mltk view_audio` command is used to invoke the visualizer tool
2. The spectrogram settings are saved to a [Model Specification](../guides/model_specification.md) file
3. The [Model Specification](../guides/model_specification.md) file is used to train the model
   - The `mltk train` command is used to train the model
   - Internally, the [AudioFeatureGenerator](../cpp_development/wrappers/audio_feature_generator_wrapper.md) C++ Python wrapper is used to dynamically generate spectrograms from the audio dataset
4. At the end of training, the MLTK embeds the [spectrogram settings](../guides/model_parameters.md#audiodatasetmixin) into the generated `.tflite` model file
5. The generated `.tflite` model file is copied to a Gecko SDK project
6. The Gecko SDK project generator parses the spectrogram settings embedded in the `.tflite` and generates the corresponding C header files with the settings
7. The Gecko SDK project is built and the firmware image is loaded onto the embedded target. The firmware image contains:
   - Trained `.tflite` model file
   - [Tensorflow-Lite Micro](https://github.com/tensorflow/tflite-micro) interpreter
   - [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library
   - AudioFeatureGenerator [settings](../guides/model_parameters.md#audiodatasetmixin) used to train the model
8. On the embedded target at runtime:
   1. Streaming audio is read from the microphone
   2. The microphone audio is sent to the AudioFeatureGenerator, where spectrograms are generated using the _exact_ same settings and algorithms that were used during model training
   3. The generated spectrogram images are sent to Tensorflow-Lite Micro and are classified using the `.tflite` model
   4. The model predictions are used to notify the application of keyword detections
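The runtime flow in step 8 can be approximated on the host PC with the MLTK's Python APIs. The sketch below is a simplified, hypothetical analogue (the on-device library, written in C, processes the audio stream as it arrives): it keeps a rolling one-second audio buffer, regenerates the spectrogram with the `apply_frontend()` API described later in this document, and classifies it with the TensorFlow-Lite interpreter. The `read_microphone_chunk()` helper and the `.tflite` path are placeholders.

```python
import numpy as np
import tensorflow as tf

from mltk.core.preprocess.utils import audio as audio_utils
# Import path assumed from the Python API reference later in this document
from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings

# The same settings that were used to train the model (see the Usage section below)
frontend_settings = AudioFeatureGeneratorSettings()
frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1000

# Hypothetical model path; on the embedded target the model is compiled into the firmware
interpreter = tf.lite.Interpreter(model_path='my_keyword_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Rolling 1-second audio buffer (1s @ 16kHz), assumed to be a float32 waveform
audio_buffer = np.zeros(16000, dtype=np.float32)

while True:
    # read_microphone_chunk() is a placeholder for whatever audio source is available
    chunk = read_microphone_chunk()
    audio_buffer = np.concatenate([audio_buffer[len(chunk):], chunk])

    # Generate a spectrogram using the same settings/algorithms used during training
    spectrogram = audio_utils.apply_frontend(
        sample=audio_buffer,
        settings=frontend_settings,
        dtype=np.int8,
    )

    # Classify the spectrogram with the trained .tflite model
    interpreter.set_tensor(input_details['index'],
                           spectrogram.reshape(input_details['shape']))
    interpreter.invoke()
    predictions = interpreter.get_tensor(output_details['index'])
    # ... notify the application when a keyword class exceeds a threshold ...
```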
## Benefits

The benefits of using the AudioFeatureGenerator are:

- The _exact_ same algorithms and settings used to generate the spectrograms during model training are also used by the embedded target
  - This ensures the ML model "sees" the same type of spectrograms at runtime that it was trained to see, which should allow for better performance
- The [spectrogram settings](../guides/model_parameters.md#audiodatasetmixin) are automatically embedded into the `.tflite` model file
  - This ensures the settings are in lock-step with the trained model
  - The ML model designer only needs to distribute a single file
- The Gecko SDK will automatically generate the necessary source code
  - The Gecko SDK will parse the spectrogram settings from the `.tflite` and generate the corresponding C headers
- The Gecko SDK comes with the full [source code](https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.0/util/third_party/tensorflow_extra/src/sl_ml_audio_feature_generation.c) to the AudioFeatureGenerator software library

## Gecko SDK Component

The [Gecko SDK AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) component is largely based on the [Google Microfrontend](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/experimental/microfrontend/lib) library.

> A feature generation library (also called frontend) that receives raw audio input, and produces filter banks (a vector of values).
>
> The raw audio input is expected to be 16-bit PCM features, with a configurable sample rate. More specifically the audio signal goes through a pre-emphasis filter (optionally); then gets sliced into (potentially overlapping) frames and a window function is applied to each frame; afterwards, we do a Fourier transform on each frame (or more specifically a Short-Time Fourier Transform) and calculate the power spectrum; and subsequently compute the filter banks.
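For intuition only, the frame/window/power-spectrum/filter-bank flow described above can be sketched in plain NumPy. This is an illustrative approximation, not the actual Microfrontend implementation (which uses fixed-point arithmetic and adds blocks such as pre-emphasis, noise reduction, gain control, and log scaling that are omitted here):

```python
import numpy as np

def sketch_frontend(audio: np.ndarray, sample_rate=16000, window_ms=30, step_ms=10,
                    n_channels=40, lower_hz=125.0, upper_hz=7500.0) -> np.ndarray:
    """Very rough approximation of the Microfrontend pipeline:
    frame -> window -> power spectrum -> triangular (mel-spaced) filter banks."""
    win = int(sample_rate * window_ms / 1000)    # samples per frame
    step = int(sample_rate * step_ms / 1000)     # hop between frames
    n_fft = 1 << (win - 1).bit_length()          # next power of two
    window = np.hanning(win)

    # Slice the audio into (potentially overlapping) frames and apply the window
    n_frames = 1 + (len(audio) - win) // step
    frames = np.stack([audio[i * step: i * step + win] * window for i in range(n_frames)])

    # Short-Time Fourier Transform -> power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Triangular filter banks spaced on the mel scale between lower_hz and upper_hz
    mel = lambda f: 1127.0 * np.log1p(f / 700.0)
    inv_mel = lambda m: 700.0 * np.expm1(m / 1127.0)
    centers = inv_mel(np.linspace(mel(lower_hz), mel(upper_hz), n_channels + 2))
    bins = np.floor((n_fft + 1) * centers / sample_rate).astype(int)

    fbank = np.zeros((n_channels, n_fft // 2 + 1))
    for c in range(n_channels):
        left, center, right = bins[c], bins[c + 1], bins[c + 2]
        fbank[c, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[c, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    return power @ fbank.T  # shape: (n_frames, n_channels)


# Example: 1 second of random "audio" -> a (98, 40) feature matrix
features = sketch_frontend(np.random.randn(16000).astype(np.float32))
print(features.shape)
```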
### Source Code

The Gecko SDK features an AudioFeatureGeneration component. The MLTK features the same component with slight modifications so that it can also be built for Windows/Linux.

- Gecko SDK source code: [sl_ml_audio_feature_generation.c](https://github.com/SiliconLabs/gecko_sdk/blob/gsdk_4.0/util/third_party/tensorflow_extra/src/sl_ml_audio_feature_generation.c)
- MLTK source code: [__mltk__/cpp/shared/gecko_sdk/audio_feature_generation](https://github.com/siliconlabs/mltk/tree/master/cpp//shared/gecko_sdk/audio_feature_generation)

## MLTK C++ Python Wrapper

The C++ Python [wrapper](../cpp_development/wrappers/audio_feature_generator_wrapper.md) allows for executing the AudioFeatureGenerator component from a Python script. This allows for executing the [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library during model training. This is useful because the _exact_ spectrogram generation algorithms used by the embedded device at runtime may also be used during model training, which should (hopefully) lead to more accurate model predictions.

The MLTK uses [pybind11](https://pybind11.readthedocs.io/en/latest/) to wrap the [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library and generate a Windows/Linux binary that can be loaded into the Python runtime environment.

The AudioFeatureGenerator Python API docs may be found here: [mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator](mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator).

### Source Code

- __C++ Python Wrapper__ - [cpp/audio_feature_generator_wrapper](https://github.com/siliconlabs/mltk/tree/master/cpp/audio_feature_generator_wrapper)
- __Python API__ - [mltk/core/preprocess/audio/audio_feature_generator](https://github.com/siliconlabs/mltk/tree/master/mltk/core/preprocess/audio/audio_feature_generator)

```{note}
When [installing](../installation.md) the MLTK for local development, the C++ wrapper is automatically built into a Windows/Linux shared library (`.dll` / `.so`) and copied to the Python [directory](https://github.com/siliconlabs/mltk/tree/master/mltk/core/preprocess/audio/audio_feature_generator).
When the [AudioFeatureGenerator](mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator) Python library is invoked by your Python scripts, the C++ wrapper shared library is loaded into the Python runtime environment.
```

### Usage

The recommended way of using the AudioFeatureGenerator [C++ wrapper](../cpp_development/wrappers/audio_feature_generator_wrapper.md) is by calling the [mltk.core.preprocess.utils.audio.apply_frontend()](https://siliconlabs.github.io/mltk/docs/python_api/data_preprocessing/audio.html#mltk.core.preprocess.utils.audio.apply_frontend) API.

Refer to the [keyword_spotting_on_off_v3.py](https://github.com/siliconlabs/mltk/tree/master/mltk/models/siliconlabs/keyword_spotting_on_off_v3.py) [model specification](../guides/model_specification.md) for an example of how this is used.

Basically,

1 ) In your [model specification](../guides/model_specification.md) file, define a model object that inherits the [DatasetMixin](mltk.core.DatasetMixin), e.g.:

```python
class MyModel(
    MltkModel,
    TrainMixin,
    DatasetMixin,
    EvaluateClassifierMixin
):
    pass
```

2 ) In your [model specification](../guides/model_specification.md) file, configure the spectrogram settings, e.g.:

```python
frontend_settings = AudioFeatureGeneratorSettings()

frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1000 # A 1s buffer should be enough to capture the keywords
frontend_settings.window_size_ms = 30
frontend_settings.window_step_ms = 10
frontend_settings.filterbank_n_channels = 104 # We want this value to be as large as possible
                                              # while still allowing for the ML model to execute efficiently on the hardware
frontend_settings.filterbank_upper_band_limit = 7500.0
frontend_settings.filterbank_lower_band_limit = 125.0 # The dev board mic seems to have a lot of noise at lower frequencies

frontend_settings.noise_reduction_enable = True # Enable the noise reduction block to help ignore background noise in the field
frontend_settings.noise_reduction_smoothing_bits = 10
frontend_settings.noise_reduction_even_smoothing = 0.025
frontend_settings.noise_reduction_odd_smoothing = 0.06
frontend_settings.noise_reduction_min_signal_remaining = 0.40 # This value is fairly large (which makes the background noise reduction small)
                                                              # But it has been found to still give good results
                                                              # i.e. There is still some background noise reduction,
                                                              # but the actual signal is still (mostly) untouched

frontend_settings.dc_notch_filter_enable = True # Enable the DC notch filter, to help remove the DC signal from the dev board's mic
frontend_settings.dc_notch_filter_coefficient = 0.95

frontend_settings.quantize_dynamic_scale_enable = True # Enable dynamic quantization, this dynamically converts the uint16 spectrogram to int8
frontend_settings.quantize_dynamic_scale_range_db = 40.0

# Add the Audio Feature generator settings to the model parameters
# This way, they are included in the generated .tflite model file
# See https://siliconlabs.github.io/mltk/docs/guides/model_parameters.html
my_model.model_parameters.update(frontend_settings)
```

3 ) Configure your data pipeline to call the frontend:

```python
from mltk.core.preprocess.utils import audio as audio_utils

spectrogram = audio_utils.apply_frontend(
    sample=augmented_sample,
    settings=frontend_settings,
    dtype=np.int8
)
```

During model [training](../guides/model_training.md), spectrograms are dynamically generated from the dataset's audio samples using the [AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) via the [C++ Python wrapper](../cpp_development/wrappers/audio_feature_generator_wrapper.md). At the end of training, the [spectrogram settings](../guides/model_parameters.md#audiodatasetmixin) are automatically embedded into the generated `.tflite` model file.
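As a rough sanity check (the helper below is illustrative, not part of the MLTK API), the dimensions of the generated spectrogram can be estimated directly from the settings: the number of rows is the number of window steps that fit into the sample length, and the number of columns is the number of filterbank channels. With the settings from step 2 above this works out to a 98 x 104 spectrogram:

```python
def estimate_spectrogram_shape(settings) -> tuple:
    """Estimate the (rows, columns) shape of the generated spectrogram."""
    rows = 1 + (settings.sample_length_ms - settings.window_size_ms) // settings.window_step_ms
    cols = settings.filterbank_n_channels
    return rows, cols

# With the settings from step 2: 1 + (1000 - 30) // 10 = 98 rows, 104 columns
print(estimate_spectrogram_shape(frontend_settings))  # (98, 104)
```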
## Audio Visualizer Utility

The Audio Visualizer Utility provides a graphical interface to the [C++ Python wrapper](mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator), and thus the [Gecko SDK AudioFeatureGenerator](https://docs.silabs.com/gecko-platform/latest/machine-learning/api/group-ml-audio-feature-generation) software library. It allows for adjusting the various spectrogram settings and seeing how the resulting spectrogram is affected in real-time.

To use the Audio Visualizer utility, issue the command:

```shell
mltk view_audio
```

__NOTE:__ Internally, this will install the [wxPython](https://www.wxpython.org/) Python package.

![audio_visualizer](../img/audio_visualizer.gif)