Audio Utilities
================

The MLTK offers several utilities to aid the development of audio classification models:

- [Audio Classification Utility](#audio-classification-utility) - Classify real-time audio using a development board's or PC's microphone
- [Audio Visualization Utility](#audio-visualization-utility) - Visualize the spectrograms generated by the [Audio Feature Generator](./audio_feature_generator.md)
- [Synthetic Audio Dataset Generator](#synthetic-audio-dataset-generator) - Generate datasets with custom keywords using synthetically generated data

The [Audio Classification Utility](#audio-classification-utility) and [Audio Visualization Utility](#audio-visualization-utility) depend on the [Audio Feature Generator](./audio_feature_generator.md) to convert audio signals into spectrograms.

A common use case for audio classification models is "keyword spotting".
Refer to the [Keyword Spotting Overview](./keyword_spotting_overview.md) for more details on how spectrograms are used to detect keywords in streaming audio.

Refer to the [Keyword Spotting Tutorial](../../mltk/tutorials/keyword_spotting_on_off) for a complete guide on using the MLTK to create an audio classification ML model.


## Audio Classification Utility

The audio classification utility is a complete keyword spotting application. It features:

- Classification of real-time microphone audio, with results printed to the terminal
- Support for running on an embedded device or on Windows/Linux
- The ability to record the microphone audio
- The ability to dump the spectrograms generated by the [Audio Feature Generator](./audio_feature_generator.md)

This utility works by executing a pre-built [audio_classifier](../cpp_development/examples/audio_classifier.md) application.
The basic flow of this application is:

```
Microphone -> AudioFeatureGenerator -> ML Model -> Command Recognizer -> Local Terminal
```

Refer to the `classify_audio` command's `--help` output for more details:

```shell
mltk classify_audio --help
```

The following are examples of using the `classify_audio` command:


### Classify using PC Microphone

- Use the pre-trained ML model [keyword_spotting_on_off_v3.py](https://github.com/siliconlabs/mltk/tree/master/mltk/models/siliconlabs/keyword_spotting_on_off_v3.py)
- Use your local PC's microphone
- Verbosely print the ML model's classification results to the terminal
- Set the detection threshold to 150 out of 255 (a lower threshold makes keyword detection easier but increases false positives)

```shell
mltk classify_audio keyword_spotting_on_off_v3 --verbose --threshold 150
```

Say the keywords "on" or "off" into your PC's microphone.


### Classify using PC Microphone with simulated latency

Audio processing occurs much faster on a PC than on an embedded device.
We can add an artificial delay to simulate the latency that would occur on an embedded device, as illustrated in the sketch after this example.
Refer to the [Keyword Spotting Overview](./keyword_spotting_overview.md) for why this matters.

- Use the pre-trained ML model [keyword_spotting_mobilenetv2.py](https://github.com/siliconlabs/mltk/tree/master/mltk/models/siliconlabs/keyword_spotting_mobilenetv2.py)
- Use your local PC's microphone
- Verbosely print the ML model's classification results to the terminal
- Simulate an audio loop latency of 200ms

```shell
mltk classify_audio keyword_spotting_mobilenetv2 --verbose --latency 200
```

Say the keywords "left", "right", "up", "down", "stop", or "go" into your PC's microphone.
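The simulated latency determines how often the model runs and how much new spectrogram data each inference sees. The short sketch below works through that arithmetic using purely hypothetical numbers (a 10 ms spectrogram window step and a 1 s result-smoothing window); the real values come from the AudioFeatureGenerator settings embedded in the model:

```python
# Back-of-the-envelope illustration of how loop latency affects keyword detection.
# All values are hypothetical; the real ones come from the model's
# AudioFeatureGenerator settings embedded in the .tflite file.

window_step_ms = 10          # hypothetical spectrogram window step
averaging_window_ms = 1000   # hypothetical window over which results are smoothed

for loop_latency_ms in (20, 200):  # fast PC vs. simulated embedded device
    # New spectrogram columns produced between two consecutive inferences
    new_columns = loop_latency_ms / window_step_ms
    # Number of model inferences that fit inside the smoothing window
    inferences_per_window = averaging_window_ms / loop_latency_ms
    print(
        f"loop latency {loop_latency_ms:3d} ms -> "
        f"{new_columns:4.0f} new spectrogram columns per inference, "
        f"~{inferences_per_window:2.0f} inferences per {averaging_window_ms} ms window"
    )
```

Fewer inferences per second means the command recognizer has fewer results to smooth over, which is the effect the `--latency` option lets you reproduce on a PC.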
### Classify using PC Microphone and record audio

We can record the audio captured by the PC's microphone.

```shell
mltk classify_audio keyword_spotting_mobilenetv2 --dump-audio
```

After running the command for a while, issue CTRL+C. A `.wav` file will be generated at:

```
/.mltk/audio_classify_recordings//audio/dump_audio.wav
```


### Classify using PC Microphone and dump spectrograms

We can dump the spectrograms generated by the [Audio Feature Generator](./audio_feature_generator.md).

```shell
mltk classify_audio keyword_spotting_mobilenetv2 --dump-spectrograms
```

After running the command for a while, issue CTRL+C. A `.avi` video and the corresponding `.jpg` images will be generated at:

```
/.mltk/audio_classify_recordings//spectrograms
```


### Classify using development board's microphone

Assuming you have a supported development board, you can also use the board's microphone to classify audio __on__ the development board, i.e. the entire audio classification pipeline (audio processing + ML) runs on the embedded device.

- Use the pre-trained ML model [keyword_spotting_on_off_v3.py](https://github.com/siliconlabs/mltk/tree/master/mltk/models/siliconlabs/keyword_spotting_on_off_v3.py)
- Use the connected development board
- Verbosely print the ML model's classification results to the terminal
- Set the detection threshold to 150 out of 255 (a lower threshold makes keyword detection easier but increases false positives)

```shell
mltk classify_audio keyword_spotting_on_off_v3 --device --verbose --threshold 150
```

Say the keywords "on" or "off" into the dev board's microphone.
If a keyword is detected, the red LED should turn on. If activity is detected, the green LED should turn on.
Additionally, serial logs from the development board should print to the command terminal.


### Record audio from development board's microphone

We can record the audio captured by the development board's microphone.

```shell
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-audio
```

After running the command for a while, issue CTRL+C. A `.wav` file will be generated at:

```
/.mltk/audio_classify_recordings//audio/dump_audio.wav
```

__NOTE:__ Audio classification is _not_ supported while recording the audio.


### Dump spectrograms generated by development board

We can dump the spectrograms generated by the [Audio Feature Generator](./audio_feature_generator.md) running on the embedded device.

```shell
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-spectrograms
```

After running the command for a while, issue CTRL+C. A `.avi` video and the corresponding `.jpg` images will be generated at:

```
/.mltk/audio_classify_recordings//spectrograms
```

__NOTE:__ Audio classification is _not_ supported while dumping the spectrograms.
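The dumped `.jpg` images can be inspected offline with standard Python tooling. Below is a minimal sketch (not part of the MLTK CLI or API) that loads the dumped frames into a NumPy array; `DUMP_DIR` is a placeholder you should replace with the spectrogram directory printed by the `classify_audio` command:

```python
# Minimal sketch: load dumped spectrogram .jpg frames into a NumPy array
# for offline inspection. Requires: pip install pillow numpy
from pathlib import Path

import numpy as np
from PIL import Image

# Placeholder path: replace with the directory printed by `classify_audio`
DUMP_DIR = Path("path/to/audio_classify_recordings/spectrograms")

frames = []
for jpg_path in sorted(DUMP_DIR.glob("*.jpg")):
    # Each .jpg is one spectrogram produced by the AudioFeatureGenerator
    frames.append(np.asarray(Image.open(jpg_path).convert("L")))

if frames:
    spectrograms = np.stack(frames)
    print(f"Loaded {len(frames)} spectrograms, array shape: {spectrograms.shape}")
else:
    print(f"No .jpg spectrograms found in {DUMP_DIR}")
```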
### Update AudioFeatureGenerator parameters

The [Audio Feature Generator](./audio_feature_generator.md)'s parameters are embedded into the `.tflite` model file and loaded by the [audio_classifier](../cpp_development/examples/audio_classifier.md) application at runtime.
See the [Model Parameters](../guides/model_parameters.md) documentation for more details.

We can update these model parameters using the `update_params` command and then re-run the `classify_audio` command to generate different spectrograms.
For example, to disable the AudioFeatureGenerator's noise reduction module:

```shell
# Update the fe.noise_reduction_enable parameter
mltk update_params keyword_spotting_on_off_v3 fe.noise_reduction_enable=0

# Dump the spectrograms generated by the embedded device
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-spectrograms
```

__NOTE:__ This process is only recommended for experimentation. The ML model should be re-trained after adjusting the Audio Feature Generator's settings.


## Audio Visualization Utility

The Audio Visualizer Utility provides a graphical interface to the [Audio Feature Generator](./audio_feature_generator.md).
It allows for adjusting the various spectrogram settings and seeing how the resulting spectrogram is affected in real-time.

To use the Audio Visualizer utility, issue the command:

```shell
mltk view_audio
```

__NOTE:__ Internally, this will install the [wxPython](https://www.wxpython.org/) Python package.

![audio_visualizer](../img/audio_visualizer.gif)


## Synthetic Audio Dataset Generator

The MLTK features the [AudioDatasetGenerator](https://siliconlabs.github.io/mltk/docs/python_api/utils/audio_dataset_generator/index.html) Python package.
This allows for generating custom keyword audio datasets using synthetically generated data.
The dataset samples are generated using the Text-to-Speech (TTS) services provided by:

- [Google Cloud Platform (GCP)](https://cloud.google.com/text-to-speech)
- [Microsoft (Azure)](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech)
- [Amazon Web Services (AWS)](https://aws.amazon.com/polly)

Refer to the [Synthetic Audio Dataset Generation](https://siliconlabs.github.io/mltk/mltk/tutorials/synthetic_audio_dataset_generation.html) tutorial for more information.
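The `AudioDatasetGenerator` manages the keywords, voices, and augmentations for you, so refer to the tutorial above for its actual API. Purely as an illustration of the kind of request it issues under the hood, the sketch below synthesizes a single "on" sample directly with AWS Polly via `boto3`. This is not the MLTK API; the voice, sample rate, and output filename are arbitrary choices for the example, and the call requires configured AWS credentials.

```python
# Illustration only (not the MLTK API): synthesize one keyword sample with
# AWS Polly, similar to the underlying TTS call a dataset generator makes
# for every keyword/voice/augmentation combination.
import wave

import boto3  # pip install boto3; requires configured AWS credentials

polly = boto3.client("polly")

response = polly.synthesize_speech(
    Text="on",            # the keyword to synthesize
    VoiceId="Joanna",     # one of Polly's built-in voices
    OutputFormat="pcm",   # raw 16-bit little-endian PCM
    SampleRate="16000",   # 16 kHz, a common rate for keyword-spotting models
)

# Wrap the raw PCM bytes in a .wav container so the sample can be added to a dataset
with wave.open("on_joanna.wav", "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)   # 16-bit samples
    wav_file.setframerate(16000)
    wav_file.writeframes(response["AudioStream"].read())
```

A dataset generator simply repeats calls like this across many voices, speaking rates, and pitches to build a varied keyword dataset.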