Audio Utilities

The MLTK offers several utilities to aid the development of audio classification models: the Audio Classification Utility, the Audio Visualization Utility, and the Synthetic Audio Dataset Generator.

The Audio Classification Utility and Audio Visualization Utility depend on the Audio Feature Generator for converting audio signals into spectrograms.
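
As a rough illustration of that conversion, the sketch below computes a basic windowed-FFT, log-magnitude spectrogram in Python. This is not the Audio Feature Generator’s actual implementation (which includes additional processing such as the noise reduction module mentioned later in this document); the sample rate, window, and step sizes are illustrative only.

# Simplified, illustrative spectrogram computation -- NOT the Audio Feature
# Generator's actual implementation
import numpy as np

def simple_spectrogram(audio, sample_rate=16000, window_ms=30, step_ms=20):
    """Convert a 1-D audio signal into a 2-D log-magnitude spectrogram"""
    window = int(sample_rate * window_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    frames = [
        audio[i:i + window] * np.hanning(window)
        for i in range(0, len(audio) - window + 1, step)
    ]
    # One FFT magnitude row per windowed frame
    magnitudes = np.abs(np.fft.rfft(frames, axis=-1))
    return np.log(magnitudes + 1e-6)

# Example: 1 second of random "audio" -> a (49, 241) spectrogram
spectrogram = simple_spectrogram(np.random.randn(16000).astype(np.float32))
print(spectrogram.shape)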

A common use case for audio classification models is “keyword spotting”.
Refer to the Keyword Spotting Overview for more details on how spectrograms are used to detect keywords in streaming audio.

Refer to the Keyword Spotting Tutorial for a complete guide on how to use the MLTK to create an audio classification ML model.

Audio Classification Utility

The audio classification utility is a complete keyword spotting application.
It features:

  • Ability to classify real-time microphone audio and print results to terminal

  • Support for running on an embedded device or Windows/Linux

  • Ability to record the microphone audio

  • Ability to dump spectrograms generated by the Audio Feature Generator

This utility works by executing a pre-built audio_classifier application.
The basic flow of this application is:

Microphone -> AudioFeatureGenerator -> ML Model -> Command Recognizer -> Local Terminal
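
A structural sketch of that flow is shown below. The callables are hypothetical placeholders for each stage; the actual application is a pre-built binary, not a Python script.

# Hypothetical Python sketch of the audio_classifier flow; each callable is a
# placeholder for the corresponding pipeline stage
def classify_audio_loop(read_microphone_chunk, feature_generator, model, recognizer):
    """Continuously classify streaming audio, one chunk at a time"""
    while True:
        audio_chunk = read_microphone_chunk()          # Microphone
        spectrogram = feature_generator(audio_chunk)   # AudioFeatureGenerator
        probabilities = model(spectrogram)             # ML Model
        keyword = recognizer(probabilities)            # Command Recognizer
        if keyword is not None:
            print(f'Detected keyword: {keyword}')      # Local Terminal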

Refer to the classify_audio command’s --help output for more details:

mltk classify_audio --help

The following are examples of using the classify_audio command:

Classify using PC Microphone

  • Use the pre-trained ML model keyword_spotting_on_off_v3.py

  • Use your local PC’s microphone

  • Verbosely print the ML model’s classification results to the terminal

  • Set the detection threshold to 150 out of 255 (a lower threshold makes keywords easier to detect but increases false positives)

mltk classify_audio keyword_spotting_on_off_v3 --verbose --threshold 150

Say the keywords “on” or “off” into your PC’s microphone.

Classify using PC Microphone with simulated latency

Processing occurs much faster on a PC than on an embedded device.
We can add an artificial delay to simulate the latency that would occur on an embedded device.
Refer to the Keyword Spotting Overview for why this matters.

  • Use the pre-trained ML model keyword_spotting_mobilenetv2.py

  • Use your local PC’s microphone

  • Verbosely print the ML model’s classification results to the terminal

  • Simulate an audio loop latency of 200ms

mltk classify_audio keyword_spotting_mobilenetv2 --verbose --latency 200

Say the keywords “left”, “right”, “up”, “down”, “stop”, or “go” into your PC’s microphone.

Classify using PC Microphone and record audio

We can record the audio captured by the PC’s microphone.

mltk classify_audio keyword_spotting_mobilenetv2 --dump-audio

After running the command for a while, issue CTRL+C.
A .wav file will be generated at:

<USER HOME Directory>/.mltk/audio_classify_recordings/<windows/linux>/audio/dump_audio.wav
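
The recording can be inspected with standard tools. For example, the snippet below reads the .wav header using only the Python standard library; the path assumes a Windows host, so adjust the <windows/linux> portion for your platform.

import wave
from pathlib import Path

# Path of the recorded audio (shown above); use 'linux' instead of 'windows'
# on a Linux host
wav_path = Path.home() / '.mltk' / 'audio_classify_recordings' / 'windows' / 'audio' / 'dump_audio.wav'

with wave.open(str(wav_path), 'rb') as wav:
    n_samples = wav.getnframes()
    sample_rate = wav.getframerate()
    print(f'{wav_path}: {n_samples} samples @ {sample_rate} Hz '
          f'({n_samples / sample_rate:.1f} seconds)')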

Classify using PC Microphone and dump spectrograms

We can dump the spectrograms generated by the Audio Feature Generator.

mltk classify_audio keyword_spotting_mobilenetv2 --dump-spectrograms

After running the command for a while, issue CTRL+C.
A .avi video and corresponding .jpg images will be generated at:

<USER HOME Directory>/.mltk/audio_classify_recordings/<windows/linux>/spectrograms
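
The dumped .jpg images are ordinary image files and can be viewed with any image tool. For example, the snippet below opens the first few spectrogram images with the Pillow package; as before, the path assumes a Windows host and the directory layout shown above.

from pathlib import Path
from PIL import Image  # requires the Pillow package

# Directory of the dumped spectrograms (shown above); use 'linux' instead of
# 'windows' on a Linux host
spectrogram_dir = Path.home() / '.mltk' / 'audio_classify_recordings' / 'windows' / 'spectrograms'

for jpg_path in sorted(spectrogram_dir.glob('*.jpg'))[:3]:
    image = Image.open(jpg_path)
    print(f'{jpg_path.name}: {image.width}x{image.height} pixels')
    image.show()  # opens the spectrogram in the default image viewer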

Classify using development board’s microphone

Assuming you have a supported development board, you can also use the board’s microphone and classify audio directly on the board, i.e., the entire audio classification pipeline (audio processing + ML inference) runs on the embedded device.

  • Use the pre-trained ML model keyword_spotting_on_off_v3.py

  • Use the connected development board

  • Verbosely print the ML model’s classification results to the terminal

  • Set the detection threshold to 150 out of 255 (a lower threshold makes keywords easier to detect but increases false positives)

mltk classify_audio keyword_spotting_on_off_v3 --device --verbose --threshold 150

Say the keywords “on” or “off” into the development board’s microphone.
If a keyword is detected, the red LED should turn on; if audio activity is detected, the green LED should turn on.

Additionally, serial logs from the development board should print to the command terminal.

Record audio from development board’s microphone

We can record the audio captured by the development board’s microphone.

mltk classify_audio keyword_spotting_on_off_v3 --device --dump-audio

After running the command for a while, issue CTRL+C.
A .wav file will be generated at:

<USER HOME Directory>/.mltk/audio_classify_recordings/<platform>/audio/dump_audio.wav

NOTE: Audio classification is not supported while recording the audio.

Dump spectrograms generated by development board

We can dump the spectrograms generated by the Audio Feature Generator running on the embedded device.

mltk classify_audio keyword_spotting_on_off_v3 --device --dump-spectrograms

After running the command for a while, issue CTRL+C.
A .avi video and corresponding .jpg images will be generated at:

<USER HOME Directory>/.mltk/audio_classify_recordings/<platform>/spectrograms

NOTE: Audio classification is not supported while dumping the spectrograms.

Update AudioFeatureGenerator parameters

The Audio Feature Generator’s parameters are embedded into the .tflite model file and loaded by the audio_classifier application at runtime. See the Model Parameters documentation for more details.

We can update these model parameters using the update_params command and then re-run the classify_audio command to generate different spectrograms.

e.g. To disable the AudioFeatureGenerator’s noise reduction module:

# Update the fe.noise_reduction_enable parameter
mltk update_params keyword_spotting_on_off_v3 fe.noise_reduction_enable=0

# Dump the spectrograms generated by the embedded device
mltk classify_audio keyword_spotting_on_off_v3 --device --dump-spectrograms

NOTE: This process is only recommended for experimentation. The ML model should be re-trained after adjusting the Audio Feature Generator’s settings.

Audio Visualization Utility

The Audio Visualizer Utility provides a graphical interface to the Audio Feature Generator. It allows for adjusting the various spectrogram settings and seeing how the resulting spectrogram is affected in real-time.

To use the Audio Visualizer utility, issue the command:

mltk view_audio

NOTE: Internally, this will install the wxPython Python package.

Synthetic Audio Dataset Generator

The MLTK features the AudioDatasetGenerator Python package. This allows for generating custom keyword audio datasets using synthetically generated data.

The dataset samples are generated using Text-to-Speech (TTS) services.
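
As a rough, hypothetical sketch of the workflow, the snippet below synthesizes each keyword with several TTS voices and writes the samples into a per-keyword directory. The function and dataset layout are illustrative placeholders, not the actual AudioDatasetGenerator API; refer to the tutorial below for the real interface.

import io
import wave
from pathlib import Path

# Hypothetical sketch only -- NOT the actual AudioDatasetGenerator API
KEYWORDS = ['on', 'off']
VOICES = ['voice_a', 'voice_b']   # placeholders for distinct TTS voices

def synthesize_speech(text: str, voice: str) -> bytes:
    """Placeholder for a call into a real Text-to-Speech service.
    Returns 1 second of silent 16 kHz mono audio so the sketch runs end-to-end.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(16000)
        wav.writeframes(b'\x00\x00' * 16000)
    return buffer.getvalue()

# One sub-directory per keyword, one .wav file per (keyword, voice) pair
output_dir = Path('synthetic_keyword_dataset')
for keyword in KEYWORDS:
    keyword_dir = output_dir / keyword
    keyword_dir.mkdir(parents=True, exist_ok=True)
    for voice in VOICES:
        wav_bytes = synthesize_speech(keyword, voice)
        (keyword_dir / f'{keyword}_{voice}.wav').write_bytes(wav_bytes)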

Refer to the Synthetic Audio Dataset Generation tutorial for more information.