evaluate_model

evaluate_model(model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, dump=False, show=False, verbose=None, callbacks=None, update_archive=True, test=False, post_process=False)[source]

Evaluate a trained model

This internally calls either evaluate_classifier or evaluate_autoencoder, based on the given mltk.core.MltkModel instance.

Parameters:
  • model (Union[MltkModel, str]) – mltk.core.MltkModel instance, name of MLTK model, path to model archive .mltk.zip or model specification script .py

  • tflite (bool) – If True, evaluate the .tflite (i.e. quantized) model file. If False, evaluate the Keras .h5 model (i.e. float)

  • weights (str) –

    Optional, load weights from previous training session. May be one of the following:

    • If this option is omitted, then evaluate using the output .h5 or .tflite from training

    • Absolute path to a weights .h5 file generated by Keras during training

    • The keyword best; find the best weights in <model log dir>/train/weights

    • Filename of .h5 in <model log dir>/train/weights

    Note: This option may only be used if the “--tflite” option is not used

  • max_samples_per_class (int) – By default, all validation samples are used. This option places an upper limit on the number of samples per class that are used for evaluation

  • classes (List[str]) – If evaluating a model with the mltk.core.EvaluateAutoEncoderMixin, then this should be a comma-separated list of classes in the dataset. The first element should be considered the “normal” class; every other class is considered abnormal and compared independently. If not provided, then the classes default to: [normal, abnormal]

  • dump (bool) – If evaluating a model with the mltk.core.EvaluateAutoEncoderMixin, then, for each sample, an image will be generated comparing the sample to the decoded sample

  • show (bool) – Display the generated performance diagrams

  • verbose (bool) – Enable verbose console logs

  • callbacks (List) – List of Keras callbacks to use for evaluation

  • update_archive (bool) – Update the model archive with the evaluation results

  • test (bool) – Optional, load the model in “test mode” if true.

  • post_process (bool) – This allows for post-processing the evaluation results (e.g. uploading to a cloud) if supported by the given MltkModel

Return type:

EvaluationResults

Returns:

Dictionary of evaluation results
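
For example, a minimal usage sketch (assuming evaluate_model is importable from mltk.core, consistent with the other mltk.core names referenced above; the model name below is a hypothetical placeholder):

    from mltk.core import evaluate_model  # assumed import path

    # Evaluate the quantized .tflite produced by a previous training session
    results = evaluate_model(
        'image_example1',  # hypothetical model name; a .mltk.zip archive path or MltkModel instance also works
        tflite=True,
        show=False,        # don't display the performance diagrams interactively
    )
    print(results.generate_summary())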

EvaluationResults

class EvaluationResults[source]

Holds model evaluation results

Note

The implementation details are specific to the model type

__init__(name, model_type='generic', **kwargs)[source]
Parameters:
  • name (str) –

  • model_type (str) –

property name: str

The name of the evaluated model

Return type:

str

property model_type: str

The type of the evaluated model (e.g. classification, autoencoder, etc.)

Return type:

str

generate_summary(include_all=True)[source]

Generate and return a summary of the results as a string

Return type:

str

generate_plots(show=True, output_dir=None, logger=None)[source]

Generate plots of the evaluation results

Parameters:
  • show – Display the generated plots

  • output_dir (str) – Generate the plots at the specified directory. If omitted, generated in the model’s logging directory

  • logger (Logger) – Optional logger
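
As a hedged sketch of working with the returned results object (the model name and output directory are placeholders, and the mltk.core import path is assumed):

    from mltk.core import evaluate_model  # assumed import path

    results = evaluate_model('my_model', show=False)    # 'my_model' is a placeholder name
    print(results.generate_summary(include_all=False))  # shorter summary
    # Write the plots to a specific directory instead of the model's logging directory
    results.generate_plots(show=False, output_dir='./eval_plots')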

evaluate_classifier

evaluate_classifier(mltk_model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, verbose=False, show=False, update_archive=True, **kwargs)[source]

Evaluate a trained classification model

Parameters:
  • mltk_model (MltkModel) – MltkModel instance

  • tflite (bool) – If true, then evaluate the .tflite (i.e. quantized) model, otherwise evaluate the Keras model

  • weights (str) – Optional weights to load before evaluating (only valid for a keras model)

  • max_samples_per_class (int) – Maximum number of samples per class to evaluate. This is useful for large datasets

  • classes (List[str]) – Specific classes to evaluate

  • verbose (bool) – Enable progress bar

  • show (bool) – Show the evaluation results diagrams

  • update_archive (bool) – Update the model archive with the eval results

Return type:

ClassifierEvaluationResults

Returns:

Dictionary containing evaluation results
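
A possible invocation sketch; it assumes an MltkModel instance is already available, here obtained via load_mltk_model (an assumed mltk.core helper), since evaluate_classifier takes the instance rather than a model name:

    from mltk.core import load_mltk_model, evaluate_classifier  # assumed import paths

    my_model = load_mltk_model('image_example1')  # hypothetical model name
    results = evaluate_classifier(
        my_model,
        tflite=True,                # evaluate the quantized .tflite model
        max_samples_per_class=100,  # cap the per-class sample count for a quick pass
        show=False,
    )
    print(f'Overall accuracy: {results.overall_accuracy:.3f}')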

ClassifierEvaluationResults

class ClassifierEvaluationResults[source]

Classifier evaluation results

__init__(*args, **kwargs)[source]
property classes: List[str]

List of class labels used by evaluated model

Return type:

List[str]

property overall_accuracy: float

The overall model accuracy

Return type:

float

property class_accuracies: List[float]

List of each class’s accuracy

Return type:

List[float]

property false_positive_rate: float

The false positive rate

Return type:

float

property fpr: float

The false positive rate

Return type:

float

property tpr: float

The true positive rate

Return type:

float

property roc_auc: List[float]

The area under the curve of the Receiver operating characteristic for each class

Return type:

List[float]

property roc_thresholds: List[float]

The list of thresholds used to calculate the Receiver operating characteristic

Return type:

List[float]

property roc_auc_avg: List[float]

The average of each class’s area under the curve of the Receiver operating characteristic

Return type:

List[float]

property precision: List[List[float]]

List of each class’s precision at various thresholds

Return type:

List[List[float]]

property recall: List[List[float]]

List of each class’s recall at various thresholds

Return type:

List[List[float]]

property confusion_matrix: List[List[float]]

Calculated confusion matrix

Return type:

List[List[float]]

calculate(y, y_pred)[source]

Calculate the evaluation results

Given the expected y values and corresponding predictions, calculate the various evaluation results

Parameters:
  • y (Union[ndarray, list]) – 1D array with shape [n_samples] where each entry is the expected class label (aka id) for the corresponding sample e.g. 0 = cat, 1 = dog, 2 = goat, 3 = other

  • y_pred (Union[ndarray, list]) – 2D array as shape [n_samples, n_classes] for categorical or 1D array as [n_samples] for binary, where each entry contains the model output for the given sample. For binary, the values must be between 0 and 1 where < 0.5 maps to class 0 and >= 0.5 maps to class 1
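
To illustrate the expected array shapes, a hypothetical standalone sketch with random placeholder data (the classes= keyword is an assumption implied by the classes property above; in practice these objects are normally produced by evaluate_classifier):

    import numpy as np
    from mltk.core import ClassifierEvaluationResults  # assumed import path

    classes = ['cat', 'dog', 'goat', 'other']
    results = ClassifierEvaluationResults(
        name='my_model',
        classes=classes,  # assumption: class labels supplied as a keyword argument
    )

    n_samples = 100
    y = np.random.randint(0, len(classes), size=n_samples)  # expected class ids, shape [n_samples]
    y_pred = np.random.rand(n_samples, len(classes))        # model outputs, shape [n_samples, n_classes]
    y_pred /= y_pred.sum(axis=1, keepdims=True)             # normalize each row into a probability distribution

    results.calculate(y, y_pred)
    print(results.generate_summary())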

generate_summary()[source]

Generate and return a summary of the results as a string

Return type:

str

generate_plots(show=True, output_dir=None, logger=None)[source]

Generate plots of the evaluation results

Parameters:
  • show – Display the generated plots

  • output_dir (str) – Generate the plots at the specified directory. If omitted, generated in the model’s logging directory

  • logger (Logger) – Optional logger

evaluate_autoencoder

evaluate_autoencoder(mltk_model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, dump=False, verbose=None, show=False, callbacks=None, update_archive=True)[source]

Evaluate a trained auto-encoder model

Parameters:
  • mltk_model (MltkModel) – MltkModel instance

  • tflite (bool) – If true, then evaluate the .tflite (i.e. quantized) model, otherwise evaluate the Keras model

  • weights (str) – Optional weights to load before evaluating (only valid for a keras model)

  • max_samples_per_class (int) – Maximum number of samples per class to evaluate. This is useful for large datasets

  • classes (List[str]) – Specific classes to evaluate; if omitted, use the ones defined in the given MltkModel, i.e. the model specification

  • dump (bool) – If true, dump the model output of each sample with a side-by-side comparison to the input sample

  • verbose (bool) – Enable verbose log messages

  • show (bool) – Show the evaluation results diagrams

  • callbacks (list) – Optional callbacks to invoke while evaluating

  • update_archive (bool) – Update the model archive with the eval results

Return type:

AutoEncoderEvaluationResults

Returns:

Dictionary containing evaluation results
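
A possible invocation sketch, under the same assumptions as the classifier example above (the model name is a placeholder and load_mltk_model is an assumed helper):

    from mltk.core import load_mltk_model, evaluate_autoencoder  # assumed import paths

    my_model = load_mltk_model('anomaly_detection_example')  # hypothetical model name
    results = evaluate_autoencoder(
        my_model,
        classes=['normal', 'abnormal'],  # the first entry is treated as the "normal" class
        dump=True,                       # dump side-by-side input vs. decoded-output comparison images
        show=False,
    )
    print(results.generate_summary())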

AutoEncoderEvaluationResults

class AutoEncoderEvaluationResults[source]

Auto-encoder evaluation results

__init__(*args, **kwargs)[source]
property classes: List[str]

List of class labels used by evaluated model

Return type:

List[str]

property overall_accuracy: float

The overall model accuracy

Return type:

float

property overall_precision: List[float]

The overall model precision at various thresholds

Return type:

List[float]

property overall_recall: List[float]

The overall model recall at various thresholds

Return type:

List[float]

property overall_pr_accuracy: float

The overall precision vs. recall accuracy

Return type:

float

property overall_tpr: List[float]

The overall true positive rate at various thresholds

Return type:

List[float]

property overall_fpr: List[float]

The overall false positive rate at various thresholds

Return type:

List[float]

property overall_roc_auc: List[float]

The overall area under the curve of the receiver operating characteristic

Return type:

List[float]

property overall_thresholds: List[float]

List of thresholds used to calculate the overall stats

Return type:

List[float]

property class_stats: dict

Dictionary of per class statistics

Return type:

dict

calculate(y, y_pred, all_scores, thresholds=None)[source]

Calculate the evaluation results

Given the list of expected values and corresponding predicted values with scores, calculate the evaluation metrics.

Parameters:
  • y (ndarray) – 1D array of expected class ids

  • y_pred (ndarray) – 1D array of scoring results, e.g. y_pred[i] = scoring_function(x[i], y[i])

  • all_scores (ndarray) – 2D array with shape [n_samples, n_classes] of scores comparing the input vs. the auto-encoder generated output for each class type (normal, and all abnormal cases)

  • thresholds (List[float]) – Optional, list of thresholds to use for calculating the TPR, FPR and AUC
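
To illustrate the expected shapes, a hypothetical standalone sketch with random placeholder data (the classes= keyword and the scoring values are assumptions; in practice these objects are normally produced by evaluate_autoencoder):

    import numpy as np
    from mltk.core import AutoEncoderEvaluationResults  # assumed import path

    classes = ['normal', 'abnormal']
    n_samples = 100

    results = AutoEncoderEvaluationResults(
        name='my_autoencoder',
        classes=classes,  # assumption: class labels supplied as a keyword argument
    )

    y = np.random.randint(0, len(classes), size=n_samples)  # expected class ids, shape [n_samples]
    all_scores = np.random.rand(n_samples, len(classes))    # per-class input vs. reconstruction scores
    y_pred = all_scores[:, 0]                               # e.g. each sample's score against the "normal" class

    results.calculate(y, y_pred, all_scores)
    print(results.generate_summary())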

generate_summary()[source]

Generate and return a summary of the results as a string

Return type:

str

generate_plots(show=True, output_dir=None, logger=None)[source]

Generate plots of the evaluation results

Parameters:
  • show – Display the generated plots

  • output_dir (str) – Generate the plots at the specified directory. If omitted, generated in the model’s logging directory

  • logger (Logger) – Optional logger