autoencoder_example

This demonstrates how to build an autoencoder model. It is based on the TensorFlow tutorial: Anomaly detection.

In this example, you will train an autoencoder to detect anomalies in the ECG5000 dataset. This dataset contains 5,000 electrocardiograms, each with 140 data points. You will use a simplified version of the dataset, where each example has been labeled either 0 (corresponding to an abnormal rhythm) or 1 (corresponding to a normal rhythm). You are interested in identifying the abnormal rhythms.
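At a high level, the autoencoder flags anomalies via reconstruction error: because it is trained only on normal rhythms, abnormal rhythms reconstruct poorly. Below is a minimal sketch of the detection step, assuming a trained Keras model `autoencoder` and a `threshold` derived from the training-error distribution (both names are hypothetical):

import numpy as np
import tensorflow as tf

def detect_anomalies(autoencoder: tf.keras.Model, data: np.ndarray, threshold: float) -> np.ndarray:
    """Return True for each sample whose reconstruction error exceeds the threshold"""
    reconstructions = autoencoder.predict(data)
    # Mean absolute error between each input trace and its reconstruction
    mae = np.mean(np.abs(reconstructions - data), axis=1)
    return mae > threshold

# One common threshold choice: one standard deviation above the mean training error
# train_mae = np.mean(np.abs(autoencoder.predict(train_data) - train_data), axis=1)
# threshold = np.mean(train_mae) + np.std(train_mae)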

Commands

# Do a "dry run" test training of the model
mltk train autoencoder_example-test

# Train the model
mltk train autoencoder_example

# Evaluate the trained .tflite model
# Also dump a comparison of the original input vs the corresponding autoencoder-generated output
mltk evaluate autoencoder_example --tflite --dump --count 20

# Profile the model in the MVP hardware accelerator simulator
mltk profile autoencoder_example --accelerator MVP

# Profile the model on a physical development board
mltk profile autoencoder_example --accelerator MVP --device

# Directly invoke the model script
python autoencoder_example.py

Model Summary

mltk summarize autoencoder_example --tflite

+-------+-----------------+---------------+---------------+----------------------+
| Index | OpCode          | Input(s)      | Output(s)     | Config               |
+-------+-----------------+---------------+---------------+----------------------+
| 0     | quantize        | 140 (float32) | 140 (int8)    | BuiltinOptionsType=0 |
| 1     | fully_connected | 140 (int8)    | 32 (int8)     | Activation:relu      |
|       |                 | 140 (int8)    |               |                      |
|       |                 | 32 (int32)    |               |                      |
| 2     | fully_connected | 32 (int8)     | 16 (int8)     | Activation:relu      |
|       |                 | 32 (int8)     |               |                      |
|       |                 | 16 (int32)    |               |                      |
| 3     | fully_connected | 16 (int8)     | 8 (int8)      | Activation:relu      |
|       |                 | 16 (int8)     |               |                      |
|       |                 | 8 (int32)     |               |                      |
| 4     | fully_connected | 8 (int8)      | 16 (int8)     | Activation:relu      |
|       |                 | 8 (int8)      |               |                      |
|       |                 | 16 (int32)    |               |                      |
| 5     | fully_connected | 16 (int8)     | 32 (int8)     | Activation:relu      |
|       |                 | 16 (int8)     |               |                      |
|       |                 | 32 (int32)    |               |                      |
| 6     | fully_connected | 32 (int8)     | 140 (int8)    | Activation:none      |
|       |                 | 32 (int8)     |               |                      |
|       |                 | 140 (int32)   |               |                      |
| 7     | logistic        | 140 (int8)    | 140 (int8)    | BuiltinOptionsType=0 |
| 8     | dequantize      | 140 (int8)    | 140 (float32) | BuiltinOptionsType=0 |
+-------+-----------------+---------------+---------------+----------------------+
Total MACs: 10.240 k
Total OPs: 21.564 k
Name: autoencoder_example
Version: 1
Description: Autoencoder example to detect anomalies in ECG dataset
classes: []
hash: 66c8e81181a47dfcc2f0ff53a55aef49
date: 2022-04-28T19:08:38.662Z
runtime_memory_size: 2028
.tflite file size: 15.8kB
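For a fully connected layer, the MAC count is simply input_features × output_features, so the 10.240k total reported above can be sanity-checked by hand:

# (input_features, output_features) of each fully_connected layer in the summary
dense_layers = [(140, 32), (32, 16), (16, 8), (8, 16), (16, 32), (32, 140)]
print(sum(i * o for i, o in dense_layers))  # 10240 -> "10.240 k"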

Model Profiling Report

# Profile on physical EFR32xG24 using MVP accelerator
mltk profile autoencoder_example --device --accelerator MVP

 Profiling Summary
 Name: autoencoder_example
 Accelerator: MVP
 Input Shape: 1x140
 Input Data Type: float32
 Output Shape: 1x140
 Output Data Type: float32
 Flash, Model File Size (bytes): 15.7k
 RAM, Runtime Memory Size (bytes): 3.4k
 Operation Count: 21.8k
 Multiply-Accumulate Count: 10.2k
 Layer Count: 9
 Unsupported Layer Count: 0
 Accelerator Cycle Count: 16.9k
 CPU Cycle Count: 131.5k
 CPU Utilization (%): 89.2
 Clock Rate (hz): 78.0M
 Time (s): 1.9m
 Ops/s: 11.5M
 MACs/s: 5.4M
 Inference/s: 529.1

 Model Layers
 +-------+-----------------+-------+--------+------------+------------+----------+-----------------+--------------+-----------------+
 | Index | OpCode          | # Ops | # MACs | Acc Cycles | CPU Cycles | Time (s) | Input Shape     | Output Shape | Options         |
 +-------+-----------------+-------+--------+------------+------------+----------+-----------------+--------------+-----------------+
 | 0     | quantize        | 560.0 | 0      | 0          | 5.5k       | 90.0u    | 1x140           | 1x140        | Type=none       |
 | 1     | fully_connected | 9.1k  | 4.5k   | 6.9k       | 2.3k       | 120.0u   | 1x140,32x140,32 | 1x32         | Activation:relu |
 | 2     | fully_connected | 1.1k  | 512.0  | 878.0      | 1.9k       | 30.0u    | 1x32,16x32,16   | 1x16         | Activation:relu |
 | 3     | fully_connected | 280.0 | 128.0  | 254.0      | 1.9k       | 30.0u    | 1x16,8x16,8     | 1x8          | Activation:relu |
 | 4     | fully_connected | 304.0 | 128.0  | 302.0      | 1.9k       | 30.0u    | 1x8,16x8,16     | 1x16         | Activation:relu |
 | 5     | fully_connected | 1.1k  | 512.0  | 974.0      | 1.9k       | 30.0u    | 1x16,32x16,32   | 1x32         | Activation:relu |
 | 6     | fully_connected | 9.1k  | 4.5k   | 7.6k       | 1.9k       | 120.0u   | 1x32,140x32,140 | 1x140        | Activation:none |
 | 7     | logistic        | 0     | 0      | 0          | 96.0k      | 1.2m     | 1x140           | 1x140        | Type=none       |
 | 8     | dequantize      | 280.0 | 0      | 0          | 18.0k      | 210.0u   | 1x140           | 1x140        | Type=none       |
 +-------+-----------------+-------+--------+------------+------------+----------+-----------------+--------------+-----------------+
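One point worth noting from the table: the MVP accelerator handles the fully_connected layers (non-zero Acc Cycles), while quantize, logistic, and dequantize fall back to the CPU, and the sigmoid (logistic) layer alone dominates the latency. A quick breakdown from the per-layer times above:

# Per-layer times (seconds) copied from the table above
times = {'quantize': 90.0e-6, 'fully_connected (x6)': 360.0e-6,
         'logistic': 1.2e-3, 'dequantize': 210.0e-6}
total = sum(times.values())  # ~1.9 ms, matching the summary
for name, t in times.items():
    print(f'{name}: {t / total:.0%}')  # logistic is roughly 65% of inference time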

Model Diagram

mltk view autoencoder_example --tflite

Model Specification

from typing import List
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf


import mltk.core as mltk_core
from mltk.utils.path import create_user_dir
from mltk.utils.archive_downloader import download_url



# Instantiate the MltkModel object with the following 'mixins':
# - TrainMixin               - Provides model training operations and settings
# - DatasetMixin             - Provides general dataset operations and settings
# - EvaluateAutoEncoderMixin - Provides auto-encoder evaluation operations and settings
# @mltk_model # NOTE: This tag is required for this model to be discoverable
class MyModel(
    mltk_core.MltkModel,
    mltk_core.TrainMixin,
    mltk_core.DatasetMixin,
    mltk_core.EvaluateAutoEncoderMixin
):
    def load_dataset(
        self,
        subset: str,
        classes:List[str]=None,
        max_samples_per_class=None,
        test:bool=False,
        **kwargs
    ):
        super().load_dataset(subset)

        if test:
            max_samples_per_class = 3

        # Download the dataset (if necessary)
        dataset_path = f'{create_user_dir()}/datasets/ecg500.csv'
        download_url(
            'http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv',
            dataset_path
        )

        # Load the dataset into numpy array
        dataset = np.genfromtxt(dataset_path, delimiter=',', dtype=np.float32)

        # The last column contains the labels
        labels = dataset[:, -1]
        data = dataset[:, :-1]

        # Split the data into training and test data
        self.validation_split = 0.2
        train_data, test_data, train_labels, test_labels = train_test_split(
            data, labels, test_size=self.validation_split, random_state=21
        )

        min_val = tf.reduce_min(train_data)
        max_val = tf.reduce_max(train_data)

        train_data = (train_data - min_val) / (max_val - min_val)
        test_data = (test_data - min_val) / (max_val - min_val)

        train_labels_bool = train_labels.astype(bool)
        test_labels_bool = test_labels.astype(bool)

        normal_train_data = train_data[train_labels_bool]
        normal_test_data = test_data[test_labels_bool]

        anomalous_train_data = train_data[~train_labels_bool]
        anomalous_test_data = test_data[~test_labels_bool]

        self._normal_train_count = len(normal_train_data)
        self._normal_test_count = len(normal_test_data)
        self._abnormal_train_count = len(anomalous_train_data)
        self._abnormal_test_count = len(anomalous_test_data)

        # If we're evaluating,
        # then just return the "normal" or "abnormal" samples
        # NOTE: The y value is not required in this case
        if subset == 'evaluation':

            if classes[0] == 'normal':
                x = normal_test_data
            else:
                x = anomalous_test_data

            if max_samples_per_class:
                sample_count = min(len(x), max_samples_per_class)
                x = x[0:sample_count]
            self.x = x
        else:
            # For training, we just use the "normal" data
            # Note that x and y use the same data as the whole point
            #  of an autoencoder is to reconstruct the input data
            self.x = normal_train_data
            self.y = normal_train_data
            self.validation_data = (test_data, test_data)
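            # NOTE: The validation set is the full test split (normal and
            #  abnormal samples), so val_loss reflects reconstruction error on both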


    def summarize_dataset(self) -> str:
        s = f'Train dataset: Found {self._normal_train_count} "normal", {self._abnormal_train_count} "abnormal" samples\n'
        s += f'Validation dataset: Found {self._normal_test_count} "normal", {self._abnormal_test_count} "abnormal" samples'
        return s




my_model = MyModel()


#################################################
# General Settings
#
my_model.version = 1
my_model.description = 'Autoencoder example to detect anomalies in ECG dataset'

my_model.input_shape = (140,)

#################################################
# Training Settings
my_model.epochs = 20
my_model.batch_size = 512
my_model.optimizer = 'adam'
my_model.metrics = ['mae']
my_model.loss = 'mae'

#################################################
# Training callback Settings

# Generate a training weights .h5 file whenever the
# val_loss improves
my_model.checkpoint['monitor'] = 'val_loss'
my_model.checkpoint['mode'] = 'auto'


#################################################
# TF-Lite converter settings
my_model.tflite_converter['optimizations'] = ['DEFAULT']
my_model.tflite_converter['supported_ops'] = ['TFLITE_BUILTINS_INT8']
my_model.tflite_converter['inference_input_type'] = tf.float32
my_model.tflite_converter['inference_output_type'] = tf.float32
# Generate a representative dataset from the validation data
my_model.tflite_converter['representative_dataset'] = 'generate'
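# The MLTK interprets 'generate' as: build the representative dataset
# automatically from the validation data. With the raw tf.lite.TFLiteConverter
# this would roughly correspond to supplying a generator yourself, e.g.
# (hypothetical sketch, names are illustrative only):
#
#   def representative_dataset():
#       for sample in validation_samples[:100]:
#           yield [sample.reshape(1, 140).astype(np.float32)]
#   converter.representative_dataset = representative_dataset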




#################################################
# Build the ML Model
def my_model_builder(model: MyModel):
    model_input = tf.keras.layers.Input(shape=model.input_shape)
    encoder = tf.keras.Sequential([
        model_input,
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu")]
    )

    decoder = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(140, activation="sigmoid")
    ])

    autoencoder = tf.keras.models.Model(model_input, decoder(encoder(model_input)))
    autoencoder.compile(
        loss=model.loss,
        optimizer=model.optimizer,
        metrics=model.metrics
    )

    return autoencoder

my_model.build_model_function = my_model_builder




##########################################################################################
# The following allows for running this model training script directly, e.g.:
# python autoencoder_example.py
#
# Note that this has the same functionality as:
# mltk train autoencoder_example
#
if __name__ == '__main__':
    from mltk import cli

    # Setup the CLI logger
    cli.get_logger(verbose=False)

    # If this is true then this will do a "dry run" of the model testing
    # If this is false, then the model will be fully trained
    test_mode_enabled = True

    # Train the model
    # This does the same as issuing the command: mltk train autoencoder_example-test --clean
    train_results = mltk_core.train_model(my_model, clean=True, test=test_mode_enabled)
    print(train_results)

    # Evaluate the model against the trained .h5 (i.e. float32) model
    # This does the same as issuing the command: mltk evaluate autoencoder_example-test
    eval_results = mltk_core.evaluate_model(my_model, verbose=True, test=test_mode_enabled)
    print(eval_results)

    # Profile the model in the simulator
    # This does the same as issuing the command: mltk profile autoencoder_example-test
    profiling_results = mltk_core.profile_model(my_model, test=test_mode_enabled)
    print(profiling_results)
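
For reference, the quantized .tflite generated by training can also be run directly with the standard TensorFlow Lite interpreter. A minimal sketch (the model path is hypothetical; point it at the .tflite extracted from the trained model archive):

import numpy as np
import tensorflow as tf

# Load the quantized model (float32 in/out, int8 internally per the converter settings)
interpreter = tf.lite.Interpreter(model_path='autoencoder_example.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Stand-in for a min-max normalized ECG trace (1x140, float32)
sample = np.random.rand(1, 140).astype(np.float32)
interpreter.set_tensor(input_details['index'], sample)
interpreter.invoke()
reconstruction = interpreter.get_tensor(output_details['index'])
print('reconstruction MAE:', np.mean(np.abs(reconstruction - sample)))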