evaluate_model¶
- evaluate_model(model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, dump=False, show=False, verbose=None, callbacks=None, update_archive=True, test=False, post_process=False)[source]¶
Evaluate a trained model
This internally calls evaluate_classifier or evaluate_autoencoder based on the given mltk.core.MltkModel instance.
- Parameters:
model (Union[MltkModel, str]) – mltk.core.MltkModel instance, name of MLTK model, path to model archive (.mltk.zip), or model specification script (.py)
tflite (bool) – If True, evaluate the .tflite (i.e. quantized) model file. If False, evaluate the Keras .h5 model (i.e. float)
weights (str) – Optional, load weights from a previous training session. May be one of the following:
- If omitted, evaluate using the output .h5 or .tflite from training
- Absolute path to a weights .h5 file generated by Keras during training
- The keyword best; finds the best weights in <model log dir>/train/weights
- Filename of an .h5 file in <model log dir>/train/weights
Note: This option may only be used if the --tflite option is not used
max_samples_per_class (int) – By default, all validation samples are used. This option places an upper limit on the number of samples per class that are used for evaluation
classes (List[str]) – If evaluating a model with the mltk.core.EvaluateAutoEncoderMixin, then this should be a comma-separated list of classes in the dataset. The first element is considered the “normal” class; every other class is considered abnormal and compared independently. If not provided, the classes default to: [normal, abnormal]
dump (bool) – If evaluating a model with the mltk.core.EvaluateAutoEncoderMixin, then, for each sample, an image will be generated comparing the sample to the decoded sample
show (bool) – Display the generated performance diagrams
verbose (bool) – Enable verbose console logs
callbacks (List) – List of Keras callbacks to use for evaluation
update_archive (bool) – Update the model archive with the evaluation results
test (bool) – Optional, load the model in “test mode” if true
post_process (bool) – This allows for post-processing the evaluation results (e.g. uploading to a cloud) if supported by the given MltkModel
- Return type:
EvaluationResults
- Returns:
Dictionary of evaluation results
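A minimal usage sketch follows; the import path and the model name “image_example1” are illustrative assumptions, while the function and its arguments are those documented above.

```python
# Sketch of calling evaluate_model() from Python.
# The import path and the model name "image_example1" are illustrative assumptions.
from mltk.core import evaluate_model

results = evaluate_model(
    'image_example1',            # name of an MLTK model (or an MltkModel instance / archive path)
    tflite=True,                 # evaluate the quantized .tflite model
    max_samples_per_class=1000,  # cap the number of validation samples per class
    show=False,                  # do not display the performance diagrams
    update_archive=True,         # store the results in the model archive
)

print(results.generate_summary())
```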
EvaluationResults¶
- class EvaluationResults[source]¶
Holds model evaluation results
Note
The implementation details are specific to the model type
- property name: str¶
The name of the evaluated model
- Return type:
str
- property model_type: str¶
The type of the evaluated model (e.g. classification, autoencoder, etc.)
- Return type:
str
- generate_summary(include_all=True)[source]¶
Generate and return a summary of the results as a string
- Return type:
str
- generate_plots(show=True, output_dir=None, logger=None)[source]¶
Generate plots of the evaluation results
- Parameters:
show – Display the generated plots
output_dir (str) – Generate the plots in the specified directory. If omitted, they are generated in the model’s logging directory
logger (Logger) – Optional logger
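The snippet below sketches how the returned results object can be inspected; it assumes results was obtained from evaluate_model() as shown earlier and that ./eval_plots is just an example output directory.

```python
# Sketch: inspect an EvaluationResults object returned by evaluate_model().
print(results.name)          # name of the evaluated model
print(results.model_type)    # e.g. 'classification' or 'autoencoder'
print(results.generate_summary())

# Write the plots to a chosen directory instead of displaying them interactively
results.generate_plots(show=False, output_dir='./eval_plots')
```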
evaluate_classifier¶
- evaluate_classifier(mltk_model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, verbose=False, show=False, update_archive=True, **kwargs)[source]¶
Evaluate a trained classification model
- Parameters:
mltk_model (MltkModel) – MltkModel instance
tflite (bool) – If true then evaluate the .tflite (i.e. quantized) model, otherwise evaluate the Keras model
weights (str) – Optional weights to load before evaluating (only valid for a Keras model)
max_samples_per_class (int) – Maximum number of samples per class to evaluate. This is useful for large datasets
classes (List[str]) – Specific classes to evaluate
verbose (bool) – Enable progress bar
show (bool) – Show the evaluation results diagrams
update_archive (bool) – Update the model archive with the eval results
- Return type:
ClassifierEvaluationResults
- Returns:
Dictionary containing evaluation results
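A hedged usage sketch follows; load_mltk_model and the model name are assumptions used only to obtain an MltkModel instance, and the remaining arguments mirror the parameters documented above.

```python
# Sketch: evaluate the float Keras model of a loaded MltkModel instance.
# load_mltk_model and the model name "image_example1" are assumptions.
from mltk.core import load_mltk_model, evaluate_classifier

my_model = load_mltk_model('image_example1')

results = evaluate_classifier(
    my_model,
    tflite=False,               # evaluate the Keras .h5 model
    max_samples_per_class=500,  # useful for large datasets
    verbose=True,               # enable the progress bar
    show=False,                 # skip the interactive diagrams
)

print(f'Overall accuracy: {results.overall_accuracy:.3f}')
```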
ClassifierEvaluationResults¶
- class ClassifierEvaluationResults[source]¶
Classifier evaluation results
- property classes: List[str]¶
List of class labels used by evaluated model
- Return type:
List
[str
]
- property overall_accuracy: float¶
The overall model accuracy
- Return type:
float
- property class_accuracies: List[float]¶
List of each class’s accuracy
- Return type:
List
[float
]
- property false_positive_rate: float¶
The false positive rate
- Return type:
float
- property fpr: float¶
The false positive rate
- Return type:
float
- property tpr: float¶
The true positive rate
- Return type:
float
- property roc_auc: List[float]¶
The area under the curve of the Receiver operating characteristic for each class
- Return type:
List
[float
]
- property roc_thresholds: List[float]¶
The list of thresholds used to calculate the Receiver operating characteristic
- Return type:
List
[float
]
- property roc_auc_avg: List[float]¶
The average of each class’s area under the curve of the receiver operating characteristic
- Return type:
List
[float
]
- property precision: List[List[float]]¶
List of each class’s precision at various thresholds
- Return type:
List
[List
[float
]]
- property recall: List[List[float]]¶
List of each class’s recall at various thresholds
- Return type:
List
[List
[float
]]
- property confusion_matrix: List[List[float]]¶
Calculated confusion matrix
- Return type:
List
[List
[float
]]
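As a small sketch, the per-class properties above can be combined like this, assuming results is an existing ClassifierEvaluationResults instance:

```python
# Sketch: pair each class label with its accuracy and ROC AUC.
for label, acc, auc in zip(results.classes, results.class_accuracies, results.roc_auc):
    print(f'{label}: accuracy={acc:.3f}, ROC AUC={auc:.3f}')
```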
- calculate(y, y_pred)[source]¶
Calculate the evaluation results
Given the expected y values and corresponding predictions, calculate the various evaluation results
- Parameters:
y (Union[ndarray, list]) – 1D array with shape [n_samples] where each entry is the expected class label (aka id) for the corresponding sample, e.g. 0 = cat, 1 = dog, 2 = goat, 3 = other
y_pred (Union[ndarray, list]) – 2D array with shape [n_samples, n_classes] for categorical, or 1D array with shape [n_samples] for binary, where each entry contains the model output for the given sample. For binary, the values must be between 0 and 1, where < 0.5 maps to class 0 and >= 0.5 maps to class 1
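The following sketch illustrates the expected y / y_pred formats with synthetic NumPy arrays; the label names are arbitrary and the commented-out call assumes results is an existing ClassifierEvaluationResults instance.

```python
# Sketch of the y / y_pred formats expected by calculate(), using synthetic data.
import numpy as np

# Expected labels: one class id per sample (e.g. 0 = cat, 1 = dog, 2 = goat)
y = np.array([0, 1, 2, 1, 0])

# Categorical model output: shape [n_samples, n_classes]
y_pred_categorical = np.array([
    [0.80, 0.10, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
    [0.30, 0.60, 0.10],
    [0.90, 0.05, 0.05],
])

# Binary model output: shape [n_samples], values in [0, 1]
# (< 0.5 maps to class 0, >= 0.5 maps to class 1)
y_binary = np.array([0, 1, 1, 0, 1])
y_pred_binary = np.array([0.10, 0.90, 0.65, 0.40, 0.80])

# results.calculate(y, y_pred_categorical)   # assuming `results` is a
#                                            # ClassifierEvaluationResults instance
```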
- generate_summary()[source]¶
Generate and return a summary of the results as a string
- Return type:
str
- generate_plots(show=True, output_dir=None, logger=None)[source]¶
Generate plots of the evaluation results
- Parameters:
show – Display the generated plots
output_dir (str) – Generate the plots in the specified directory. If omitted, they are generated in the model’s logging directory
logger (Logger) – Optional logger
evaluate_autoencoder¶
- evaluate_autoencoder(mltk_model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, dump=False, verbose=None, show=False, callbacks=None, update_archive=True)[source]¶
Evaluate a trained auto-encoder model
- Parameters:
mltk_model (MltkModel) – MltkModel instance
tflite (bool) – If true then evaluate the .tflite (i.e. quantized) model, otherwise evaluate the Keras model
weights (str) – Optional weights to load before evaluating (only valid for a Keras model)
max_samples_per_class (int) – Maximum number of samples per class to evaluate. This is useful for large datasets
classes (List[str]) – Specific classes to evaluate. If omitted, use the ones defined in the given MltkModel, i.e. the model specification
dump (bool) – If true, dump the model output of each sample with a side-by-side comparison to the input sample
verbose (bool) – Enable verbose log messages
show (bool) – Show the evaluation results diagrams
callbacks (list) – Optional callbacks to invoke while evaluating
update_archive (bool) – Update the model archive with the eval results
- Return type:
AutoEncoderEvaluationResults
- Returns:
Dictionary containing evaluation results
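A hedged usage sketch; load_mltk_model and the model name “anomaly_detector” are assumptions, and the classes list follows the [normal, abnormal] convention noted for evaluate_model above.

```python
# Sketch: evaluate a quantized auto-encoder model.
# load_mltk_model and the model name "anomaly_detector" are assumptions.
from mltk.core import load_mltk_model, evaluate_autoencoder

my_model = load_mltk_model('anomaly_detector')

results = evaluate_autoencoder(
    my_model,
    tflite=True,                     # evaluate the .tflite (quantized) model
    classes=['normal', 'abnormal'],  # first entry is the "normal" class
    dump=True,                       # save input vs. decoded comparison images
    show=False,
)

print(results.generate_summary())
```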
AutoEncoderEvaluationResults¶
- class AutoEncoderEvaluationResults[source]¶
Auto-encoder evaluation results
- property classes: List[str]¶
List of class labels used by evaluated model
- Return type:
List
[str
]
- property overall_accuracy: float¶
The overall model accuracy
- Return type:
float
- property overall_precision: List[float]¶
The overall model precision at various thresholds
- Return type:
List
[float
]
- property overall_recall: List[float]¶
The overall model recall at various thresholds
- Return type:
List
[float
]
- property overall_pr_accuracy: float¶
The overall precision vs recall
- Return type:
float
- property overall_tpr: List[float]¶
The overall true positive rate at various thresholds
- Return type:
List
[float
]
- property overall_fpr: List[float]¶
The overall false positive rate at various thresholds
- Return type:
List
[float
]
- property overall_roc_auc: List[float]¶
The overall area under the curve of the receiver operating characteristic
- Return type:
List
[float
]
- property overall_thresholds: List[float]¶
List of thresholds used to calculate the overall stats
- Return type:
List
[float
]
- property class_stats: dict¶
Dictionary of per class statistics
- Return type:
dict
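A short sketch of reading the overall and per-class statistics; the exact structure of the class_stats values is not documented here, so the loop below simply prints whatever the dictionary contains (assuming results is an AutoEncoderEvaluationResults instance).

```python
# Sketch: print overall accuracy and the per-class statistics dictionary.
print(f'Overall accuracy: {results.overall_accuracy:.3f}')
for class_label, stats in results.class_stats.items():
    print(class_label, stats)
```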
- calculate(y, y_pred, all_scores, thresholds=None)[source]¶
Calculate the evaluation results
Given the list of expected values and corresponding predicted values with scores, calculate the evaluation metrics.
- Parameters:
y (ndarray) – 1D array of expected class ids
y_pred (ndarray) – 1D array of scoring results, e.g. y_pred[i] = scoring_function(x[i], y[i])
all_scores (ndarray) – 2D array with shape [n_samples, n_classes] of scores comparing the input vs the auto-encoder generated output for each class type (normal, and all abnormal cases)
thresholds (List[float]) – Optional, list of thresholds to use for calculating the TPR, FPR and AUC
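The sketch below shows the array shapes calculate() expects, using synthetic values; scoring_function is only a placeholder for whatever reconstruction-error metric the model specification uses.

```python
# Sketch of the array shapes expected by calculate(), with synthetic data.
import numpy as np

# classes = [normal, abnormal] -> n_classes = 2
y = np.array([0, 0, 1, 1])                    # expected class ids
y_pred = np.array([0.05, 0.10, 0.80, 0.65])   # per-sample scores, e.g. scoring_function(x[i], y[i])

# One score per (sample, class): input vs. auto-encoder output compared
# against the "normal" class and each abnormal class
all_scores = np.array([
    [0.05, 0.90],
    [0.10, 0.85],
    [0.80, 0.20],
    [0.65, 0.30],
])

thresholds = list(np.linspace(0.0, 1.0, 11))  # optional thresholds for TPR/FPR/AUC

# results.calculate(y, y_pred, all_scores, thresholds=thresholds)
# (assuming `results` is an AutoEncoderEvaluationResults instance)
```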
- generate_summary()[source]¶
Generate and return a summary of the results as a string
- Return type:
str
- generate_plots(show=True, output_dir=None, logger=None)[source]¶
Generate plots of the evaluation results
- Parameters:
show – Display the generated plots
output_dir (str) – Generate the plots in the specified directory. If omitted, they are generated in the model’s logging directory
logger (Logger) – Optional logger