keyword_spotting_on_off

This model specification script is designed to work with the Keyword Spotting On/Off tutorial.

This model is a CNN classifier that detects the keywords:

  • on

  • off

Dataset

This uses the mltk.datasets.audio.speech_commands.speech_commands_v2 dataset provided by Google.

Preprocessing

Audio is preprocessed by the mltk.core.preprocess.audio.parallel_generator.ParallelAudioDataGenerator, using the following mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator settings:

  • sample_rate: 8 kHz

  • sample_length: 1.0 s

  • window size: 30 ms

  • window step: 20 ms

  • n_channels: 32
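
The settings above determine the model's input shape. The following is a minimal sketch of that arithmetic, not code from the MLTK itself: the helper names are hypothetical, and it assumes that only complete windows are emitted and that the FFT length is the next power of two at or above the window length in samples. Both assumptions agree with the 49x32x1 input and the fe.fft_length of 256 reported in the model summary.

```python
def num_windows(sample_length_ms: int, window_size_ms: int, window_step_ms: int) -> int:
    """Number of complete analysis windows that fit in one audio sample.

    Hypothetical helper: assumes partial trailing windows are dropped.
    """
    if sample_length_ms < window_size_ms:
        return 0
    return (sample_length_ms - window_size_ms) // window_step_ms + 1


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


# Settings listed in the Preprocessing section
sample_rate_hz = 8000
sample_length_ms = 1000
window_size_ms = 30
window_step_ms = 20
n_channels = 32

windows = num_windows(sample_length_ms, window_size_ms, window_step_ms)
input_shape = (windows, n_channels, 1)  # (time steps, filterbank channels, 1)

# Samples covered by one window, and the FFT length that would hold them
samples_per_window = sample_rate_hz * window_size_ms // 1000
fft_length = next_power_of_two(samples_per_window)

print("input shape:", input_shape)
print("samples per window:", samples_per_window)
print("fft length:", fft_length)
```

Each 1.0 s audio sample therefore becomes a 49x32 spectrogram, which the first conv_2d layer consumes as a single-channel image.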

Commands

# Do a "dry run" test training of the model
mltk train keyword_spotting_on_off-test

# Train the model
mltk train keyword_spotting_on_off

# Evaluate the trained .tflite model
mltk evaluate keyword_spotting_on_off --tflite

# Profile the model in the MVP hardware accelerator simulator
mltk profile keyword_spotting_on_off --accelerator MVP

# Profile the model on a physical development board
mltk profile keyword_spotting_on_off --accelerator MVP --device

# Run the model in the audio classifier on the local PC
mltk classify_audio keyword_spotting_on_off --verbose

# Run the model in the audio classifier on the physical device
mltk classify_audio keyword_spotting_on_off --device --verbose

Model Summary

mltk summarize keyword_spotting_on_off --tflite

+-------+-----------------+----------------+----------------+-----------------------------------------------------+
| Index | OpCode          | Input(s)       | Output(s)      | Config                                              |
+-------+-----------------+----------------+----------------+-----------------------------------------------------+
| 0     | conv_2d         | 49x32x1 (int8) | 25x16x8 (int8) | Padding:same stride:2x2 activation:relu             |
|       |                 | 3x3x1 (int8)   |                |                                                     |
|       |                 | 8 (int32)      |                |                                                     |
| 1     | conv_2d         | 25x16x8 (int8) | 13x8x16 (int8) | Padding:same stride:2x2 activation:relu             |
|       |                 | 3x3x8 (int8)   |                |                                                     |
|       |                 | 16 (int32)     |                |                                                     |
| 2     | conv_2d         | 13x8x16 (int8) | 7x4x32 (int8)  | Padding:same stride:2x2 activation:relu             |
|       |                 | 3x3x16 (int8)  |                |                                                     |
|       |                 | 32 (int32)     |                |                                                     |
| 3     | max_pool_2d     | 7x4x32 (int8)  | 1x4x32 (int8)  | Padding:valid stride:1x7 filter:1x7 activation:none |
| 4     | reshape         | 1x4x32 (int8)  | 128 (int8)     | BuiltinOptionsType=0                                |
|       |                 | 2 (int32)      |                |                                                     |
| 5     | fully_connected | 128 (int8)     | 4 (int8)       | Activation:none                                     |
|       |                 | 128 (int8)     |                |                                                     |
|       |                 | 4 (int32)      |                |                                                     |
| 6     | softmax         | 4 (int8)       | 4 (int8)       | BuiltinOptionsType=9                                |
+-------+-----------------+----------------+----------------+-----------------------------------------------------+
Total MACs: 278.144 k
Total OPs: 563.084 k
Name: keyword_spotting_on_off
Version: 1
Description: Keyword spotting classifier to detect: "on" and "off"
Classes: on, off, _unknown_, _silence_
hash: 782baa4c65acec0db85a71d2be78eb29
date: 2022-02-04T19:05:11.747Z
runtime_memory_size: 6712
average_window_duration_ms: 1000
detection_threshold: 160
suppression_ms: 750
minimum_count: 3
volume_db: 5.0
latency_ms: 0
log_level: info
samplewise_norm.rescale: 0.0
samplewise_norm.mean_and_std: False
fe.sample_rate_hz: 8000
fe.sample_length_ms: 1000
fe.window_size_ms: 30
fe.window_step_ms: 20
fe.filterbank_n_channels: 32
fe.filterbank_upper_band_limit: 3999.0
fe.filterbank_lower_band_limit: 100.0
fe.noise_reduction_enable: True
fe.noise_reduction_smoothing_bits: 5
fe.noise_reduction_even_smoothing: 0.004000000189989805
fe.noise_reduction_odd_smoothing: 0.004000000189989805
fe.noise_reduction_min_signal_remaining: 0.05000000074505806
fe.pcan_enable: False
fe.pcan_strength: 0.949999988079071
fe.pcan_offset: 80.0
fe.pcan_gain_bits: 21
fe.log_scale_enable: True
fe.log_scale_shift: 6
fe.fft_length: 256
.tflite file size: 15.3kB
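
The layer shapes and the "Total MACs" figure in the summary above can be reproduced by hand. The sketch below is illustrative only and is not part of the MLTK: the function name is hypothetical. It relies on two standard facts: with "same" padding a strided convolution produces ceil(input / stride) outputs per dimension, and each output element of a convolution costs kernel_h x kernel_w x input_channels multiply-accumulates.

```python
import math


def conv2d_same(in_shape, filters, kernel=3, stride=2):
    """Return (output shape, MAC count) for a conv_2d layer with 'same' padding.

    Hypothetical helper: in_shape is (height, width, channels).
    """
    in_h, in_w, in_c = in_shape
    out_h = math.ceil(in_h / stride)
    out_w = math.ceil(in_w / stride)
    macs = out_h * out_w * filters * kernel * kernel * in_c
    return (out_h, out_w, filters), macs


shape = (49, 32, 1)  # model input from the summary table
total_macs = 0
conv_shapes = []

# Three conv_2d layers with 8, 16 and 32 filters (layers 0-2)
for filters in (8, 16, 32):
    shape, macs = conv2d_same(shape, filters)
    conv_shapes.append(shape)
    total_macs += macs

# Layer 3 (max_pool_2d) collapses the 7 time steps to 1 and has no MACs;
# layer 4 (reshape) flattens the remaining 1x4x32 tensor.
flattened = 1 * shape[1] * shape[2]

# Layer 5 (fully_connected) maps the flattened vector to the 4 classes
n_classes = 4
total_macs += flattened * n_classes

print("conv output shapes:", conv_shapes)
print("flattened size:", flattened)
print("total MACs:", total_macs)
```

The result, 278144, matches the "Total MACs: 278.144 k" line, and almost all of it comes from the second and third conv_2d layers.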

Model Profiling Report

# Profile on physical EFR32xG24 using MVP accelerator
mltk profile keyword_spotting_on_off --device --accelerator MVP

 Profiling Summary
 Name: keyword_spotting_on_off
 Accelerator: MVP
 Input Shape: 1x49x32x1
 Input Data Type: int8
 Output Shape: 1x4
 Output Data Type: int8
 Flash, Model File Size (bytes): 15.3k
 RAM, Runtime Memory Size (bytes): 13.4k
 Operation Count: 574.5k
 Multiply-Accumulate Count: 278.1k
 Layer Count: 7
 Unsupported Layer Count: 0
 Accelerator Cycle Count: 224.3k
 CPU Cycle Count: 98.9k
 CPU Utilization (%): 34.6
 Clock Rate (hz): 78.0M
 Time (s): 3.7m
 Ops/s: 157.0M
 MACs/s: 76.0M
 Inference/s: 273.2

 Model Layers
 +-------+-----------------+--------+--------+------------+------------+----------+------------------------+--------------+-----------------------------------------------------+
 | Index | OpCode          | # Ops  | # MACs | Acc Cycles | CPU Cycles | Time (s) | Input Shape            | Output Shape | Options                                             |
 +-------+-----------------+--------+--------+------------+------------+----------+------------------------+--------------+-----------------------------------------------------+
 | 0     | conv_2d         | 67.2k  | 28.8k  | 46.7k      | 20.6k      | 720.0u   | 1x49x32x1,8x3x3x1,8    | 1x25x16x8    | Padding:same stride:2x2 activation:relu             |
 | 1     | conv_2d         | 244.6k | 119.8k | 90.8k      | 20.7k      | 1.3m     | 1x25x16x8,16x3x3x8,16  | 1x13x8x16    | Padding:same stride:2x2 activation:relu             |
 | 2     | conv_2d         | 260.7k | 129.0k | 85.2k      | 20.2k      | 1.2m     | 1x13x8x16,32x3x3x16,32 | 1x7x4x32     | Padding:same stride:2x2 activation:relu             |
 | 3     | max_pool_2d     | 896.0  | 0      | 800.0      | 30.0k      | 390.0u   | 1x7x4x32               | 1x1x4x32     | Padding:valid stride:1x7 filter:1x7 activation:none |
 | 4     | reshape         | 0      | 0      | 0          | 1.1k       | 30.0u    | 1x1x4x32,2             | 1x128        | Type=none                                           |
 | 5     | fully_connected | 1.0k   | 512.0  | 809.0      | 2.2k       | 30.0u    | 1x128,4x128,4          | 1x4          | Activation:none                                     |
 | 6     | softmax         | 20.0   | 0      | 0          | 4.1k       | 60.0u    | 1x4                    | 1x4          | Type=softmaxoptions                                 |
 +-------+-----------------+--------+--------+------------+------------+----------+------------------------+--------------+-----------------------------------------------------+
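
As a cross-check, the per-layer columns in the table above can be summed and compared with the Profiling Summary. The sketch below copies the rounded values from the table, so totals agree only to within rounding. The CPU utilization formula used here (CPU cycles divided by the clock cycles available per inference) is an assumption, not the profiler's documented definition, although it reproduces the reported 34.6%.

```python
# (opcode, ops, macs, accelerator cycles, CPU cycles, time in seconds),
# copied from the Model Layers table
layers = [
    ("conv_2d",         67.2e3,  28.8e3,  46.7e3, 20.6e3, 720.0e-6),
    ("conv_2d",         244.6e3, 119.8e3, 90.8e3, 20.7e3, 1.3e-3),
    ("conv_2d",         260.7e3, 129.0e3, 85.2e3, 20.2e3, 1.2e-3),
    ("max_pool_2d",     896.0,   0.0,     800.0,  30.0e3, 390.0e-6),
    ("reshape",         0.0,     0.0,     0.0,    1.1e3,  30.0e-6),
    ("fully_connected", 1.0e3,   512.0,   809.0,  2.2e3,  30.0e-6),
    ("softmax",         20.0,    0.0,     0.0,    4.1e3,  60.0e-6),
]

total_ops = sum(layer[1] for layer in layers)
total_macs = sum(layer[2] for layer in layers)
total_acc_cycles = sum(layer[3] for layer in layers)
total_cpu_cycles = sum(layer[4] for layer in layers)
total_time_s = sum(layer[5] for layer in layers)

# Values reported in the Profiling Summary
clock_hz = 78.0e6
inferences_per_s = 273.2

# Assumption: utilization = CPU cycles / clock cycles elapsed per inference
cycles_per_inference = clock_hz / inferences_per_s
cpu_utilization_pct = 100.0 * total_cpu_cycles / cycles_per_inference

print(f"ops: {total_ops / 1e3:.1f}k, MACs: {total_macs / 1e3:.1f}k")
print(f"accelerator cycles: {total_acc_cycles / 1e3:.1f}k")
print(f"CPU cycles: {total_cpu_cycles / 1e3:.1f}k")
print(f"summed layer time: {total_time_s * 1e3:.2f} ms")
print(f"CPU utilization: {cpu_utilization_pct:.1f}%")
```

The three conv_2d layers account for roughly 3.2 ms of the 3.7 ms inference time, while max_pool_2d has the highest CPU cycle count of any single layer despite performing very few operations.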

Model Diagram

mltk view keyword_spotting_on_off --tflite

License

This model was developed by Silicon Labs and is covered by a standard Silicon Labs MSLA.