autoencoder_example¶
Source code: autoencoder_example.py
Pre-trained model: autoencoder_example.mltk.zip
This demonstrates how to build an autoencoder model. This is based on Tensorflow: Anomaly detection
In this example, you will train an autoencoder to detect anomalies on the ECG5000 dataset. This dataset contains 5,000 Electrocardiograms, each with 140 data points. You will use a simplified version of the dataset, where each example has been labeled either 0 (corresponding to an abnormal rhythm), or 1 (corresponding to a normal rhythm). You are interested in identifying the abnormal rhythms.
Commands¶
# Do a "dry run" test training of the model
mltk train autoencoder_example-test
# Train the model
mltk train autoencoder_example
# Evaluate the trained model .tflite model
# Also dump a comparsion of the original image vs the generated autoencoder image
mltk evaluate autoencoder_example --tflite --dump -- count 20
# Profile the model in the MVP hardware accelerator simulator
mltk profile autoencoder_example --accelerator MVP
# Profile the model on a physical development board
mltk profile autoencoder_example --accelerator MVP --device
# Directly invoke the model script
python autoencoder_example.py
Model Summary¶
mltk summarize autoencoder_example --tflite
+-------+-----------------+---------------+---------------+----------------------+
| Index | OpCode | Input(s) | Output(s) | Config |
+-------+-----------------+---------------+---------------+----------------------+
| 0 | quantize | 140 (float32) | 140 (int8) | BuiltinOptionsType=0 |
| 1 | fully_connected | 140 (int8) | 32 (int8) | Activation:relu |
| | | 140 (int8) | | |
| | | 32 (int32) | | |
| 2 | fully_connected | 32 (int8) | 16 (int8) | Activation:relu |
| | | 32 (int8) | | |
| | | 16 (int32) | | |
| 3 | fully_connected | 16 (int8) | 8 (int8) | Activation:relu |
| | | 16 (int8) | | |
| | | 8 (int32) | | |
| 4 | fully_connected | 8 (int8) | 16 (int8) | Activation:relu |
| | | 8 (int8) | | |
| | | 16 (int32) | | |
| 5 | fully_connected | 16 (int8) | 32 (int8) | Activation:relu |
| | | 16 (int8) | | |
| | | 32 (int32) | | |
| 6 | fully_connected | 32 (int8) | 140 (int8) | Activation:none |
| | | 32 (int8) | | |
| | | 140 (int32) | | |
| 7 | logistic | 140 (int8) | 140 (int8) | BuiltinOptionsType=0 |
| 8 | dequantize | 140 (int8) | 140 (float32) | BuiltinOptionsType=0 |
+-------+-----------------+---------------+---------------+----------------------+
Total MACs: 10.240 k
Total OPs: 21.564 k
Name: autoencoder_example
Version: 1
Description: Autoencoder example to detect anomalies in ECG dataset
classes: []
hash: 66c8e81181a47dfcc2f0ff53a55aef49
date: 2022-04-28T19:08:38.662Z
runtime_memory_size: 2028
.tflite file size: 15.8kB
Model Profiling Report¶
# Profile on physical EFR32xG24 using MVP accelerator
mltk profile autoencoder_example --device --accelerator MVP
Profiling Summary
Name: autoencoder_example
Accelerator: MVP
Input Shape: 1x140
Input Data Type: float32
Output Shape: 1x140
Output Data Type: float32
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 3.4k
Operation Count: 21.8k
Multiply-Accumulate Count: 10.2k
Layer Count: 9
Unsupported Layer Count: 0
Accelerator Cycle Count: 16.9k
CPU Cycle Count: 131.5k
CPU Utilization (%): 89.2
Clock Rate (hz): 78.0M
Time (s): 1.9m
Ops/s: 11.5M
MACs/s: 5.4M
Inference/s: 529.1
Model Layers
+-------+-----------------+-------+--------+------------+------------+----------+-----------------+--------------+-----------------+
| Index | OpCode | # Ops | # MACs | Acc Cycles | CPU Cycles | Time (s) | Input Shape | Output Shape | Options |
+-------+-----------------+-------+--------+------------+------------+----------+-----------------+--------------+-----------------+
| 0 | quantize | 560.0 | 0 | 0 | 5.5k | 90.0u | 1x140 | 1x140 | Type=none |
| 1 | fully_connected | 9.1k | 4.5k | 6.9k | 2.3k | 120.0u | 1x140,32x140,32 | 1x32 | Activation:relu |
| 2 | fully_connected | 1.1k | 512.0 | 878.0 | 1.9k | 30.0u | 1x32,16x32,16 | 1x16 | Activation:relu |
| 3 | fully_connected | 280.0 | 128.0 | 254.0 | 1.9k | 30.0u | 1x16,8x16,8 | 1x8 | Activation:relu |
| 4 | fully_connected | 304.0 | 128.0 | 302.0 | 1.9k | 30.0u | 1x8,16x8,16 | 1x16 | Activation:relu |
| 5 | fully_connected | 1.1k | 512.0 | 974.0 | 1.9k | 30.0u | 1x16,32x16,32 | 1x32 | Activation:relu |
| 6 | fully_connected | 9.1k | 4.5k | 7.6k | 1.9k | 120.0u | 1x32,140x32,140 | 1x140 | Activation:none |
| 7 | logistic | 0 | 0 | 0 | 96.0k | 1.2m | 1x140 | 1x140 | Type=none |
| 8 | dequantize | 280.0 | 0 | 0 | 18.0k | 210.0u | 1x140 | 1x140 | Type=none |
+-------+-----------------+-------+--------+------------+------------+----------+-----------------+--------------+-----------------+
Model Diagram¶
mltk view autoencoder_example --tflite
Model Specification¶
from typing import List
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
import mltk.core as mltk_core
from mltk.utils.path import create_user_dir
from mltk.utils.archive_downloader import download_url
# Instantiate the MltkModel object with the following 'mixins':
# - TrainMixin - Provides classifier model training operations and settings
# - DatasetMixin - Provides general dataset operations and settings
# - EvaluateClassifierMixin - Provides classifier evaluation operations and settings
# @mltk_model # NOTE: This tag is required for this model be discoverable
class MyModel(
mltk_core.MltkModel,
mltk_core.TrainMixin,
mltk_core.DatasetMixin,
mltk_core.EvaluateAutoEncoderMixin
):
def load_dataset(
self,
subset: str,
classes:List[str]=None,
max_samples_per_class=None,
test:bool=False,
**kwargs
):
super().load_dataset(subset)
if test:
max_samples_per_class = 3
# Download the dataset (if necessary)
dataset_path = f'{create_user_dir()}/datasets/ecg500.csv'
download_url(
'http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv',
dataset_path
)
# Load the dataset into numpy array
dataset = np.genfromtxt(dataset_path, delimiter=',', dtype=np.float32)
# The last column contains the labels
labels = dataset[:, -1]
data = dataset[:,:-1]
# Split the data into training and test data
self.validation_split = 0.2
train_data, test_data, train_labels, test_labels = train_test_split(
data, labels, test_size=self.validation_split, random_state=21
)
min_val = tf.reduce_min(train_data)
max_val = tf.reduce_max(train_data)
train_data = (train_data - min_val) / (max_val - min_val)
test_data = (test_data - min_val) / (max_val - min_val)
train_labels_bool = train_labels.astype(bool)
test_labels_bool = test_labels.astype(bool)
normal_train_data = train_data[train_labels_bool]
normal_test_data = test_data[test_labels_bool]
anomalous_train_data = train_data[~train_labels_bool]
anomalous_test_data = test_data[~test_labels_bool]
self._normal_train_count = len(normal_train_data)
self._normal_test_count = len(normal_test_data)
self._abnormal_train_count = len(anomalous_train_data)
self._abnormal_test_count = len(anomalous_test_data)
# If we're evaluating,
# then just return the "normal" or "abnormal" samples
# NOTE: The y value is not required in this case
if subset == 'evaluation':
if classes[0] =='normal':
x = normal_test_data
else:
x = anomalous_test_data
if max_samples_per_class:
sample_count = min(len(x), max_samples_per_class)
x = x[0:sample_count]
self.x = x
else:
# For training, we just use the "normal" data
# Note that x and y use the same data as the whole point
# of an autoencoder is to reconstruct the input data
self.x = normal_train_data
self.y = normal_train_data
self.validation_data = (test_data, test_data)
def summarize_dataset(self) -> str:
s = f'Train dataset: Found {self._normal_train_count} "normal", {self._abnormal_train_count} "abnormal" samples\n'
s += f'Validation dataset: Found {self._normal_test_count} "normal", {self._abnormal_test_count} "abnormal" samples'
return s
my_model = MyModel()
#################################################
# General Settings
#
my_model.version = 1
my_model.description = 'Autoencoder example to detect anomalies in ECG dataset'
my_model.input_shape = (140,)
#################################################
# Training Settings
my_model.epochs = 20
my_model.batch_size = 512
my_model.optimizer = 'adam'
my_model.metrics = ['mae']
my_model.loss = 'mae'
#################################################
# Training callback Settings
# Generate a training weights .h5 whenever the
# val_accuracy improves
my_model.checkpoint['monitor'] = 'val_loss'
my_model.checkpoint['mode'] = 'auto'
#################################################
# TF-Lite converter settings
my_model.tflite_converter['optimizations'] = ['DEFAULT']
my_model.tflite_converter['supported_ops'] = ['TFLITE_BUILTINS_INT8']
my_model.tflite_converter['inference_input_type'] = tf.float32
my_model.tflite_converter['inference_output_type'] = tf.float32
# generate a representative dataset from the validation data
my_model.tflite_converter['representative_dataset'] = 'generate'
#################################################
# Build the ML Model
def my_model_builder(model: MyModel):
model_input = tf.keras.layers.Input(shape=model.input_shape)
encoder = tf.keras.Sequential([
model_input,
tf.keras.layers.Dense(32, activation="relu"),
tf.keras.layers.Dense(16, activation="relu"),
tf.keras.layers.Dense(8, activation="relu")]
)
decoder = tf.keras.Sequential([
tf.keras.layers.Dense(16, activation="relu"),
tf.keras.layers.Dense(32, activation="relu"),
tf.keras.layers.Dense(140, activation="sigmoid")
])
autoencoder = tf.keras.models.Model(model_input, decoder(encoder(model_input)))
autoencoder.compile(
loss=model.loss,
optimizer=model.optimizer,
metrics=model.metrics
)
return autoencoder
my_model.build_model_function = my_model_builder
##########################################################################################
# The following allows for running this model training script directly, e.g.:
# python autoencoder_example.py
#
# Note that this has the same functionality as:
# mltk train autoencoder_example
#
if __name__ == '__main__':
from mltk import cli
# Setup the CLI logger
cli.get_logger(verbose=False)
# If this is true then this will do a "dry run" of the model testing
# If this is false, then the model will be fully trained
test_mode_enabled = True
# Train the model
# This does the same as issuing the command: mltk train autoencoder_example-test --clean
train_results = mltk_core.train_model(my_model, clean=True, test=test_mode_enabled)
print(train_results)
# Evaluate the model against the quantized .h5 (i.e. float32) model
# This does the same as issuing the command: mltk evaluate autoencoder_example-test
tflite_eval_results = mltk_core.evaluate_model(my_model, verbose=True, test=test_mode_enabled)
print(tflite_eval_results)
# Profile the model in the simulator
# This does the same as issuing the command: mltk profile autoencoder_example-test
profiling_results = mltk_core.profile_model(my_model, test=test_mode_enabled)
print(profiling_results)