visual_wake_words

Visual Wakeword - Person detection (MobileNetv1 with COCO14)

Taken from:

Dataset

Model Topology

Performance (floating point model)

  • Accuracy - 85.4%

  • AUC - .931

Performance (quantized tflite model)

  • Accuracy - 85.0%

  • AUC - .928

Commands

# Do a "dry run" test training of the model
mltk train vww_model-test

# Train the model
mltk train vww_model

# Evaluate the trained model .tflite model
mltk evaluate vww_model --tflite

# Profile the model in the MVP hardware accelerator simulator
mltk profile vww_model --accelerator MVP

# Profile the model on a physical development board
mltk profile vww_model --accelerator MVP --device

Model Summary

mltk summarize visual_wake_words --tflite

+-------+-------------------+-------------------+-----------------+-----------------------------------------------------+
| Index | OpCode            | Input(s)          | Output(s)       | Config                                              |
+-------+-------------------+-------------------+-----------------+-----------------------------------------------------+
| 0     | quantize          | 96x96x3 (float32) | 96x96x3 (int8)  | BuiltinOptionsType=0                                |
| 1     | conv_2d           | 96x96x3 (int8)    | 48x48x8 (int8)  | Padding:same stride:2x2 activation:relu             |
|       |                   | 3x3x3 (int8)      |                 |                                                     |
|       |                   | 8 (int32)         |                 |                                                     |
| 2     | depthwise_conv_2d | 48x48x8 (int8)    | 48x48x8 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x8 (int8)      |                 |                                                     |
|       |                   | 8 (int32)         |                 |                                                     |
| 3     | conv_2d           | 48x48x8 (int8)    | 48x48x16 (int8) | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x8 (int8)      |                 |                                                     |
|       |                   | 16 (int32)        |                 |                                                     |
| 4     | depthwise_conv_2d | 48x48x16 (int8)   | 24x24x16 (int8) | Multipler:1 padding:same stride:2x2 activation:relu |
|       |                   | 3x3x16 (int8)     |                 |                                                     |
|       |                   | 16 (int32)        |                 |                                                     |
| 5     | conv_2d           | 24x24x16 (int8)   | 24x24x32 (int8) | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x16 (int8)     |                 |                                                     |
|       |                   | 32 (int32)        |                 |                                                     |
| 6     | depthwise_conv_2d | 24x24x32 (int8)   | 24x24x32 (int8) | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x32 (int8)     |                 |                                                     |
|       |                   | 32 (int32)        |                 |                                                     |
| 7     | conv_2d           | 24x24x32 (int8)   | 24x24x32 (int8) | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x32 (int8)     |                 |                                                     |
|       |                   | 32 (int32)        |                 |                                                     |
| 8     | depthwise_conv_2d | 24x24x32 (int8)   | 12x12x32 (int8) | Multipler:1 padding:same stride:2x2 activation:relu |
|       |                   | 3x3x32 (int8)     |                 |                                                     |
|       |                   | 32 (int32)        |                 |                                                     |
| 9     | conv_2d           | 12x12x32 (int8)   | 12x12x64 (int8) | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x32 (int8)     |                 |                                                     |
|       |                   | 64 (int32)        |                 |                                                     |
| 10    | depthwise_conv_2d | 12x12x64 (int8)   | 12x12x64 (int8) | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x64 (int8)     |                 |                                                     |
|       |                   | 64 (int32)        |                 |                                                     |
| 11    | conv_2d           | 12x12x64 (int8)   | 12x12x64 (int8) | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x64 (int8)     |                 |                                                     |
|       |                   | 64 (int32)        |                 |                                                     |
| 12    | depthwise_conv_2d | 12x12x64 (int8)   | 6x6x64 (int8)   | Multipler:1 padding:same stride:2x2 activation:relu |
|       |                   | 3x3x64 (int8)     |                 |                                                     |
|       |                   | 64 (int32)        |                 |                                                     |
| 13    | conv_2d           | 6x6x64 (int8)     | 6x6x128 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x64 (int8)     |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 14    | depthwise_conv_2d | 6x6x128 (int8)    | 6x6x128 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 15    | conv_2d           | 6x6x128 (int8)    | 6x6x128 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 16    | depthwise_conv_2d | 6x6x128 (int8)    | 6x6x128 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 17    | conv_2d           | 6x6x128 (int8)    | 6x6x128 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 18    | depthwise_conv_2d | 6x6x128 (int8)    | 6x6x128 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 19    | conv_2d           | 6x6x128 (int8)    | 6x6x128 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 20    | depthwise_conv_2d | 6x6x128 (int8)    | 6x6x128 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 21    | conv_2d           | 6x6x128 (int8)    | 6x6x128 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 22    | depthwise_conv_2d | 6x6x128 (int8)    | 6x6x128 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 23    | conv_2d           | 6x6x128 (int8)    | 6x6x128 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 24    | depthwise_conv_2d | 6x6x128 (int8)    | 3x3x128 (int8)  | Multipler:1 padding:same stride:2x2 activation:relu |
|       |                   | 3x3x128 (int8)    |                 |                                                     |
|       |                   | 128 (int32)       |                 |                                                     |
| 25    | conv_2d           | 3x3x128 (int8)    | 3x3x256 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x128 (int8)    |                 |                                                     |
|       |                   | 256 (int32)       |                 |                                                     |
| 26    | depthwise_conv_2d | 3x3x256 (int8)    | 3x3x256 (int8)  | Multipler:1 padding:same stride:1x1 activation:relu |
|       |                   | 3x3x256 (int8)    |                 |                                                     |
|       |                   | 256 (int32)       |                 |                                                     |
| 27    | conv_2d           | 3x3x256 (int8)    | 3x3x256 (int8)  | Padding:same stride:1x1 activation:relu             |
|       |                   | 1x1x256 (int8)    |                 |                                                     |
|       |                   | 256 (int32)       |                 |                                                     |
| 28    | average_pool_2d   | 3x3x256 (int8)    | 1x1x256 (int8)  | Padding:valid stride:3x3 filter:3x3 activation:none |
| 29    | reshape           | 1x1x256 (int8)    | 256 (int8)      | BuiltinOptionsType=0                                |
|       |                   | 2 (int32)         |                 |                                                     |
| 30    | fully_connected   | 256 (int8)        | 2 (int8)        | Activation:none                                     |
|       |                   | 256 (int8)        |                 |                                                     |
|       |                   | 2 (int32)         |                 |                                                     |
| 31    | softmax           | 2 (int8)          | 2 (int8)        | BuiltinOptionsType=9                                |
| 32    | dequantize        | 2 (int8)          | 2 (float32)     | BuiltinOptionsType=0                                |
+-------+-------------------+-------------------+-----------------+-----------------------------------------------------+
Total MACs: 7.490 M
Total OPs: 15.324 M
Name: visual_wake_words
Version: 1
Description: TinyML: Visual Wake Words - MobileNetv1 with COCO14
Classes: person, non_person
hash: 0fdc40de5812cfa530f6ec120c55171a
date: 2022-02-04T19:23:33.736Z
runtime_memory_size: 156424
samplewise_norm.rescale: 0.0
samplewise_norm.mean_and_std: False
.tflite file size: 334.2kB

Model Profiling Report

# Profile on physical EFR32xG24 using MVP accelerator
mltk profile visual_wake_words --device --accelerator MVP

 Profiling Summary
 Name: visual_wake_words
 Accelerator: MVP
 Input Shape: 1x96x96x3
 Input Data Type: float32
 Output Shape: 1x2
 Output Data Type: float32
 Flash, Model File Size (bytes): 334.1k
 RAM, Runtime Memory Size (bytes): 156.5k
 Operation Count: 15.8M
 Multiply-Accumulate Count: 7.5M
 Layer Count: 33
 Unsupported Layer Count: 0
 Accelerator Cycle Count: 7.5M
 CPU Cycle Count: 3.1M
 CPU Utilization (%): 32.4
 Clock Rate (hz): 78.0M
 Time (s): 123.2m
 Ops/s: 127.9M
 MACs/s: 60.8M
 Inference/s: 8.1

 Model Layers
 +-------+-------------------+--------+--------+------------+------------+----------+---------------------------+--------------+------------------------------------------------------+
 | Index | OpCode            | # Ops  | # MACs | Acc Cycles | CPU Cycles | Time (s) | Input Shape               | Output Shape | Options                                              |
 +-------+-------------------+--------+--------+------------+------------+----------+---------------------------+--------------+------------------------------------------------------+
 | 0     | quantize          | 110.6k | 0      | 0          | 941.1k     | 11.8m    | 1x96x96x3                 | 1x96x96x3    | Type=none                                            |
 | 1     | conv_2d           | 1.1M   | 497.7k | 695.3k     | 15.8k      | 8.8m     | 1x96x96x3,8x3x3x3,8       | 1x48x48x8    | Padding:same stride:2x2 activation:relu              |
 | 2     | depthwise_conv_2d | 387.1k | 165.9k | 583.3k     | 249.1k     | 9.4m     | 1x48x48x8,1x3x3x8,8       | 1x48x48x8    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 3     | conv_2d           | 700.4k | 294.9k | 341.1k     | 5.9k       | 4.3m     | 1x48x48x8,16x1x1x8,16     | 1x48x48x16   | Padding:same stride:1x1 activation:relu              |
 | 4     | depthwise_conv_2d | 193.5k | 82.9k  | 297.9k     | 257.2k     | 5.9m     | 1x48x48x16,1x3x3x16,16    | 1x24x24x16   | Multiplier:1 padding:same stride:2x2 activation:relu |
 | 5     | conv_2d           | 645.1k | 294.9k | 281.2k     | 5.9k       | 3.6m     | 1x24x24x16,32x1x1x16,32   | 1x24x24x32   | Padding:same stride:1x1 activation:relu              |
 | 6     | depthwise_conv_2d | 387.1k | 165.9k | 299.2k     | 806.6k     | 10.2m    | 1x24x24x32,1x3x3x32,32    | 1x24x24x32   | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 7     | conv_2d           | 1.2M   | 589.8k | 507.0k     | 5.4k       | 6.4m     | 1x24x24x32,32x1x1x32,32   | 1x24x24x32   | Padding:same stride:1x1 activation:relu              |
 | 8     | depthwise_conv_2d | 96.8k  | 41.5k  | 74.8k      | 202.8k     | 2.6m     | 1x24x24x32,1x3x3x32,32    | 1x12x12x32   | Multiplier:1 padding:same stride:2x2 activation:relu |
 | 9     | conv_2d           | 617.5k | 294.9k | 251.2k     | 5.4k       | 3.2m     | 1x12x12x32,64x1x1x32,64   | 1x12x12x64   | Padding:same stride:1x1 activation:relu              |
 | 10    | depthwise_conv_2d | 193.5k | 82.9k  | 139.9k     | 207.9k     | 2.9m     | 1x12x12x64,1x3x3x64,64    | 1x12x12x64   | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 11    | conv_2d           | 1.2M   | 589.8k | 474.7k     | 5.4k       | 6.0m     | 1x12x12x64,64x1x1x64,64   | 1x12x12x64   | Padding:same stride:1x1 activation:relu              |
 | 12    | depthwise_conv_2d | 48.4k  | 20.7k  | 35.0k      | 52.9k      | 750.0u   | 1x12x12x64,1x3x3x64,64    | 1x6x6x64     | Multiplier:1 padding:same stride:2x2 activation:relu |
 | 13    | conv_2d           | 603.6k | 294.9k | 236.3k     | 5.4k       | 3.0m     | 1x6x6x64,128x1x1x64,128   | 1x6x6x128    | Padding:same stride:1x1 activation:relu              |
 | 14    | depthwise_conv_2d | 96.8k  | 41.5k  | 62.2k      | 53.3k      | 1.1m     | 1x6x6x128,1x3x3x128,128   | 1x6x6x128    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 15    | conv_2d           | 1.2M   | 589.8k | 458.7k     | 5.4k       | 5.8m     | 1x6x6x128,128x1x1x128,128 | 1x6x6x128    | Padding:same stride:1x1 activation:relu              |
 | 16    | depthwise_conv_2d | 96.8k  | 41.5k  | 62.2k      | 53.3k      | 1.1m     | 1x6x6x128,1x3x3x128,128   | 1x6x6x128    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 17    | conv_2d           | 1.2M   | 589.8k | 458.7k     | 5.4k       | 5.8m     | 1x6x6x128,128x1x1x128,128 | 1x6x6x128    | Padding:same stride:1x1 activation:relu              |
 | 18    | depthwise_conv_2d | 96.8k  | 41.5k  | 62.2k      | 53.3k      | 1.1m     | 1x6x6x128,1x3x3x128,128   | 1x6x6x128    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 19    | conv_2d           | 1.2M   | 589.8k | 458.7k     | 5.4k       | 5.8m     | 1x6x6x128,128x1x1x128,128 | 1x6x6x128    | Padding:same stride:1x1 activation:relu              |
 | 20    | depthwise_conv_2d | 96.8k  | 41.5k  | 62.2k      | 53.3k      | 1.1m     | 1x6x6x128,1x3x3x128,128   | 1x6x6x128    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 21    | conv_2d           | 1.2M   | 589.8k | 458.7k     | 5.4k       | 5.8m     | 1x6x6x128,128x1x1x128,128 | 1x6x6x128    | Padding:same stride:1x1 activation:relu              |
 | 22    | depthwise_conv_2d | 96.8k  | 41.5k  | 62.2k      | 53.3k      | 1.1m     | 1x6x6x128,1x3x3x128,128   | 1x6x6x128    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 23    | conv_2d           | 1.2M   | 589.8k | 458.7k     | 5.4k       | 5.8m     | 1x6x6x128,128x1x1x128,128 | 1x6x6x128    | Padding:same stride:1x1 activation:relu              |
 | 24    | depthwise_conv_2d | 24.2k  | 10.4k  | 15.5k      | 14.2k      | 300.0u   | 1x6x6x128,1x3x3x128,128   | 1x3x3x128    | Multiplier:1 padding:same stride:2x2 activation:relu |
 | 25    | conv_2d           | 596.7k | 294.9k | 229.3k     | 5.4k       | 2.9m     | 1x3x3x128,256x1x1x128,256 | 1x3x3x256    | Padding:same stride:1x1 activation:relu              |
 | 26    | depthwise_conv_2d | 48.4k  | 20.7k  | 24.9k      | 14.2k      | 420.0u   | 1x3x3x256,1x3x3x256,256   | 1x3x3x256    | Multiplier:1 padding:same stride:1x1 activation:relu |
 | 27    | conv_2d           | 1.2M   | 589.8k | 451.3k     | 5.4k       | 5.7m     | 1x3x3x256,256x1x1x256,256 | 1x3x3x256    | Padding:same stride:1x1 activation:relu              |
 | 28    | average_pool_2d   | 2.6k   | 0      | 1.6k       | 3.9k       | 60.0u    | 1x3x3x256                 | 1x1x1x256    | Padding:valid stride:3x3 filter:3x3 activation:none  |
 | 29    | reshape           | 0      | 0      | 0          | 1.7k       | 0        | 1x1x1x256,2               | 1x256        | Type=none                                            |
 | 30    | fully_connected   | 1.0k   | 512.0  | 797.0      | 2.1k       | 60.0u    | 1x256,2x256,2             | 1x2          | Activation:none                                      |
 | 31    | softmax           | 10.0   | 0      | 0          | 2.8k       | 30.0u    | 1x2                       | 1x2          | Type=softmaxoptions                                  |
 | 32    | dequantize        | 4.0    | 0      | 0          | 938.0      | 0        | 1x2                       | 1x2          | Type=none                                            |
 +-------+-------------------+--------+--------+------------+------------+----------+---------------------------+--------------+------------------------------------------------------+

Model Diagram

mltk view visual_wake_words --tflite

Model Specification

from mltk.core.preprocess.image.parallel_generator import ParallelImageDataGenerator
from mltk.core.model import (
    MltkModel,
    TrainMixin,
    ImageDatasetMixin,
    EvaluateClassifierMixin
)
from mltk.models.shared import MobileNetV1
from mltk.utils.archive_downloader import download_verify_extract


# Instantiate the MltkModel object with the following 'mixins':
# - TrainMixin            - Provides classifier model training operations and settings
# - ImageDatasetMixin     - Provides image data generation operations and settings
# - EvaluateClassifierMixin         - Provides classifier evaluation operations and settings
# @mltk_model   # NOTE: This tag is required for this model be discoverable
class MyModel(
    MltkModel,
    TrainMixin,
    ImageDatasetMixin,
    EvaluateClassifierMixin
):
    pass
my_model = MyModel()

#################################################
# General parameters
my_model.version = 1
my_model.description = 'TinyML: Visual Wake Words - MobileNetv1 with COCO14'

#################################################
# Training parameters
my_model.epochs = 50
my_model.batch_size = 50
my_model.optimizer = 'adam'
my_model.metrics = ['accuracy']
my_model.loss = 'categorical_crossentropy'


#################################################
# Image Dataset Settings

# The directory of the training data

def download_data():
    return download_verify_extract(
        url='https://www.silabs.com/public/files/github/machine_learning/benchmarks/datasets/vw_coco2014_96.tar.gz',
        dest_subdir='datasets/mscoco14/preprocessed/v1',
        file_hash='A5A465082D3F396407F8B5ABAF824DD5B28439C4',
        show_progress=True,
        remove_root_dir=True
    )


my_model.dataset = download_data
# The classification type
my_model.class_mode = 'categorical'
# The class labels found in your training dataset directory
my_model.classes =  ('person', 'non_person')
# The input shape to the model. The dataset samples will be resized if necessary
my_model.input_shape = (96, 96, 3)

validation_split = 0.1


##############################################################
# Training callbacks
#

# Learning rate schedule
def lr_schedule(epoch):
    lrate = 0.001
    if epoch > 20:
        lrate = 0.0005
    if epoch > 30:
        lrate = 0.00025
    return lrate

my_model.lr_schedule = dict(
    schedule = lr_schedule,
    verbose = 1
)

my_model.datagen = ParallelImageDataGenerator(
    cores=.35,
    max_batches_pending=32,
    rotation_range=10,
    width_shift_range=0.05,
    height_shift_range=0.05,
    zoom_range=.1,
    horizontal_flip=True,
    validation_split=validation_split
)



##############################################################
# Model Layout
def my_model_builder(model: MyModel):
    keras_model = MobileNetV1(
        input_shape=model.input_shape
    )
    keras_model.compile(
        loss=model.loss,
        optimizer=model.optimizer,
        metrics=model.metrics
    )
    return keras_model

my_model.build_model_function = my_model_builder




##########################################################################################
# The following allows for running this model training script directly, e.g.:
# python visual_wake_words.py
#
# Note that this has the same functionality as:
# mltk train visual_wake_words
#
if __name__ == '__main__':
    import mltk.core as mltk_core
    from mltk import cli

    # Setup the CLI logger
    cli.get_logger(verbose=False)

    # If this is true then this will do a "dry run" of the model testing
    # If this is false, then the model will be fully trained
    test_mode_enabled = True

    # Train the model
    # This does the same as issuing the command: mltk train visual_wake_words-test --clean
    train_results = mltk_core.train_model(my_model, clean=True, test=test_mode_enabled)
    print(train_results)

    # Evaluate the model against the quantized .h5 (i.e. float32) model
    # This does the same as issuing the command: mltk evaluate visual_wake_words-test
    tflite_eval_results = mltk_core.evaluate_model(my_model, verbose=True, test=test_mode_enabled)
    print(tflite_eval_results)

    # Profile the model in the simulator
    # This does the same as issuing the command: mltk profile visual_wake_words-test
    profiling_results = mltk_core.profile_model(my_model, test=test_mode_enabled)
    print(profiling_results)