evaluate_model¶
- evaluate_model(model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, dump=False, show=False, verbose=None, callbacks=None, update_archive=True, test=False, post_process=False)[source]¶
Evaluate a trained model
This internally calls evaluate_classifier or evaluate_autoencoder based on the given mltk.core.MltkModel instance.
- Parameters:
model (Union[MltkModel, str]) – mltk.core.MltkModel instance, name of MLTK model, path to model archive (.mltk.zip), or model specification script (.py)
tflite (bool) – If True, evaluate the .tflite (i.e. quantized) model file. If False, evaluate the Keras .h5 model (i.e. float)
weights (str) – Optional, load weights from a previous training session. May be one of the following:
- If omitted, evaluate using the output .h5 or .tflite from training
- Absolute path to a weights .h5 file generated by Keras during training
- The keyword best; finds the best weights in <model log dir>/train/weights
- Filename of an .h5 file in <model log dir>/train/weights
Note: This option may only be used if the --tflite option is not used
max_samples_per_class (int) – By default, all validation samples are used. This option places an upper limit on the number of samples per class that are used for evaluation
classes (List[str]) – If evaluating a model with the mltk.core.EvaluateAutoEncoderMixin, then this should be a comma-separated list of classes in the dataset. The first element is considered the “normal” class; every other class is considered abnormal and compared independently. If not provided, the classes default to: [normal, abnormal]
dump (bool) – If evaluating a model with the mltk.core.EvaluateAutoEncoderMixin, then, for each sample, an image will be generated comparing the sample to the decoded sample
show (bool) – Display the generated performance diagrams
verbose (bool) – Enable verbose console logs
callbacks (List) – List of Keras callbacks to use for evaluation
update_archive (bool) – Update the model archive with the evaluation results
test (bool) – Optional, load the model in “test mode” if true
post_process (bool) – This allows for post-processing the evaluation results (e.g. uploading to a cloud) if supported by the given MltkModel
- Return type:
EvaluationResults
- Returns:
Dictionary of evaluation results
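A minimal usage sketch follows; the import path and the model name “image_example1” are illustrative assumptions, while the function and its arguments are those documented above.

```python
# Sketch of calling evaluate_model() from Python.
# The import path and the model name "image_example1" are illustrative assumptions.
from mltk.core import evaluate_model

results = evaluate_model(
    'image_example1',            # name of an MLTK model (or an MltkModel instance / archive path)
    tflite=True,                 # evaluate the quantized .tflite model
    max_samples_per_class=1000,  # cap the number of validation samples per class
    show=False,                  # do not display the performance diagrams
    update_archive=True,         # store the results in the model archive
)

print(results.generate_summary())
```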
EvaluationResults¶
- class EvaluationResults[source]¶
Holds model evaluation results
Note
The implementation details are specific to the model type
- property name: str¶
The name of the evaluated model
- Return type:
str
- property model_type: str¶
The type of the evaluated model (e.g. classification, autoencoder, etc.)
- Return type:
str
- generate_summary(include_all=True)[source]¶
Generate and return a summary of the results as a string
- Return type:
str
- generate_plots(show=True, output_dir=None, logger=None)[source]¶
Generate plots of the evaluation results
- Parameters:
show – Display the generated plots
output_dir (str) – Generate the plots in the specified directory. If omitted, they are generated in the model’s logging directory
logger (Logger) – Optional logger
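The snippet below sketches how the returned results object can be inspected; it assumes results was obtained from evaluate_model() as shown earlier and that ./eval_plots is just an example output directory.

```python
# Sketch: inspect an EvaluationResults object returned by evaluate_model().
print(results.name)          # name of the evaluated model
print(results.model_type)    # e.g. 'classification' or 'autoencoder'
print(results.generate_summary())

# Write the plots to a chosen directory instead of displaying them interactively
results.generate_plots(show=False, output_dir='./eval_plots')
```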
evaluate_classifier¶
- evaluate_classifier(mltk_model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, verbose=False, show=False, update_archive=True, **kwargs)[source]¶
Evaluate a trained classification model
- Parameters:
mltk_model (MltkModel) – MltkModel instance
tflite (bool) – If true then evaluate the .tflite (i.e. quantized) model, otherwise evaluate the Keras model
weights (str) – Optional weights to load before evaluating (only valid for a Keras model)
max_samples_per_class (int) – Maximum number of samples per class to evaluate. This is useful for large datasets
classes (List[str]) – Specific classes to evaluate
verbose (bool) – Enable progress bar
show (bool) – Show the evaluation results diagrams
update_archive (bool) – Update the model archive with the eval results
- Return type:
ClassifierEvaluationResults
- Returns:
Dictionary containing evaluation results
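A hedged usage sketch follows; load_mltk_model and the model name are assumptions used only to obtain an MltkModel instance, and the remaining arguments mirror the parameters documented above.

```python
# Sketch: evaluate the float Keras model of a loaded MltkModel instance.
# load_mltk_model and the model name "image_example1" are assumptions.
from mltk.core import load_mltk_model, evaluate_classifier

my_model = load_mltk_model('image_example1')

results = evaluate_classifier(
    my_model,
    tflite=False,               # evaluate the Keras .h5 model
    max_samples_per_class=500,  # useful for large datasets
    verbose=True,               # enable the progress bar
    show=False,                 # skip the interactive diagrams
)

print(f'Overall accuracy: {results.overall_accuracy:.3f}')
```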
ClassifierEvaluationResults¶
- class ClassifierEvaluationResults[source]¶
Classifier evaluation results
- property classes: List[str]¶
List of class labels used by evaluated model
- Return type:
List
[str
]
- property overall_accuracy: float¶
The overall model accuracy
- Return type:
float
- property class_accuracies: List[float]¶
List of each class’s accuracy
- Return type:
List
[float
]
- property false_positive_rate: float¶
The false positive rate
- Return type:
float
- property fpr: float¶
The false positive rate
- Return type:
float
- property tpr: float¶
The true positive rate
- Return type:
float
- property roc_auc: List[float]¶
The area under the curve of the Receiver operating characteristic for each class
- Return type:
List
[float
]
- property roc_thresholds: List[float]¶
The list of thresholds used to calculate the Receiver operating characteristic
- Return type:
List
[float
]
- property roc_auc_avg: List[float]¶
The average of each class’s area under the curve of the receiver operating characteristic
- Return type:
List
[float
]
- property precision: List[List[float]]¶
List of each class’s precision at various thresholds
- Return type:
List
[List
[float
]]
- property recall: List[List[float]]¶
List of each class’s recall at various thresholds
- Return type:
List
[List
[float
]]
- property confusion_matrix: List[List[float]]¶
Calculated confusion matrix
- Return type:
List
[List
[float
]]
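As a small sketch, the per-class properties above can be combined like this, assuming results is an existing ClassifierEvaluationResults instance:

```python
# Sketch: pair each class label with its accuracy and ROC AUC.
for label, acc, auc in zip(results.classes, results.class_accuracies, results.roc_auc):
    print(f'{label}: accuracy={acc:.3f}, ROC AUC={auc:.3f}')
```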
- calculate(y, y_pred)[source]¶
Calculate the evaluation results
Given the expected y values and corresponding predictions, calculate the various evaluation results
- Parameters:
y (Union[ndarray, list]) – 1D array with shape [n_samples] where each entry is the expected class label (aka id) for the corresponding sample, e.g. 0 = cat, 1 = dog, 2 = goat, 3 = other
y_pred (Union[ndarray, list]) – 2D array with shape [n_samples, n_classes] for categorical, or 1D array with shape [n_samples] for binary, where each entry contains the model output for the given sample. For binary, the values must be between 0 and 1, where < 0.5 maps to class 0 and >= 0.5 maps to class 1
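The following sketch illustrates the expected y / y_pred formats with synthetic NumPy arrays; the label names are arbitrary and the commented-out call assumes results is an existing ClassifierEvaluationResults instance.

```python
# Sketch of the y / y_pred formats expected by calculate(), using synthetic data.
import numpy as np

# Expected labels: one class id per sample (e.g. 0 = cat, 1 = dog, 2 = goat)
y = np.array([0, 1, 2, 1, 0])

# Categorical model output: shape [n_samples, n_classes]
y_pred_categorical = np.array([
    [0.80, 0.10, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
    [0.30, 0.60, 0.10],
    [0.90, 0.05, 0.05],
])

# Binary model output: shape [n_samples], values in [0, 1]
# (< 0.5 maps to class 0, >= 0.5 maps to class 1)
y_binary = np.array([0, 1, 1, 0, 1])
y_pred_binary = np.array([0.10, 0.90, 0.65, 0.40, 0.80])

# results.calculate(y, y_pred_categorical)   # assuming `results` is a
#                                            # ClassifierEvaluationResults instance
```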
- generate_summary()[source]¶
Generate and return a summary of the results as a string
- Return type:
str
- generate_plots(show=True, output_dir=None, logger=None)[source]¶
Generate plots of the evaluation results
- Parameters:
show – Display the generated plots
output_dir (str) – Generate the plots in the specified directory. If omitted, they are generated in the model’s logging directory
logger (Logger) – Optional logger
evaluate_autoencoder¶
- evaluate_autoencoder(mltk_model, tflite=False, weights=None, max_samples_per_class=-1, classes=None, dump=False, verbose=None, show=False, callbacks=None, update_archive=True)[source]¶
Evaluate a trained auto-encoder model
- Parameters:
mltk_model (MltkModel) – MltkModel instance
tflite (bool) – If true then evaluate the .tflite (i.e. quantized) model, otherwise evaluate the Keras model
weights (str) – Optional weights to load before evaluating (only valid for a Keras model)
max_samples_per_class (int) – Maximum number of samples per class to evaluate. This is useful for large datasets
classes (List[str]) – Specific classes to evaluate. If omitted, use the ones defined in the given MltkModel, i.e. the model specification
dump (bool) – If true, dump the model output of each sample with a side-by-side comparison to the input sample
verbose (bool) – Enable verbose log messages
show (bool) – Show the evaluation results diagrams
callbacks (list) – Optional callbacks to invoke while evaluating
update_archive (bool) – Update the model archive with the eval results
- Return type:
AutoEncoderEvaluationResults
- Returns:
Dictionary containing evaluation results
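A hedged usage sketch; load_mltk_model and the model name “anomaly_detector” are assumptions, and the classes list follows the [normal, abnormal] convention noted for evaluate_model above.

```python
# Sketch: evaluate a quantized auto-encoder model.
# load_mltk_model and the model name "anomaly_detector" are assumptions.
from mltk.core import load_mltk_model, evaluate_autoencoder

my_model = load_mltk_model('anomaly_detector')

results = evaluate_autoencoder(
    my_model,
    tflite=True,                     # evaluate the .tflite (quantized) model
    classes=['normal', 'abnormal'],  # first entry is the "normal" class
    dump=True,                       # save input vs. decoded comparison images
    show=False,
)

print(results.generate_summary())
```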
AutoEncoderEvaluationResults¶
- class AutoEncoderEvaluationResults[source]¶
Auto-encoder evaluation results
- property classes: List[str]¶
List of class labels used by evaluated model
- Return type:
List
[str
]
- property overall_accuracy: float¶
The overall model accuracy
- Return type:
float
- property overall_precision: List[float]¶
The overall model precision at various thresholds
- Return type:
List
[float
]
- property overall_recall: List[float]¶
The overall model recall at various thresholds
- Return type:
List
[float
]
- property overall_pr_accuracy: float¶
The overall precision vs recall
- Return type:
float
- property overall_tpr: List[float]¶
The overall true positive rate at various thresholds
- Return type:
List
[float
]
- property overall_fpr: List[float]¶
The overall false positive rate at various thresholds
- Return type:
List
[float
]
- property overall_roc_auc: List[float]¶
The overall area under the curve of the receiver operating characteristic
- Return type:
List
[float
]
- property overall_thresholds: List[float]¶
List of thresholds used to calculate the overall stats
- Return type:
List
[float
]
- property class_stats: dict¶
Dictionary of per class statistics
- Return type:
dict
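A short sketch of reading the overall and per-class statistics; the exact structure of the class_stats values is not documented here, so the loop below simply prints whatever the dictionary contains (assuming results is an AutoEncoderEvaluationResults instance).

```python
# Sketch: print overall accuracy and the per-class statistics dictionary.
print(f'Overall accuracy: {results.overall_accuracy:.3f}')
for class_label, stats in results.class_stats.items():
    print(class_label, stats)
```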
- calculate(y, y_pred, all_scores, thresholds=None)[source]¶
Calculate the evaluation results
Given the list of expected values and corresponding predicted values with scores, calculate the evaluation metrics.
- Parameters:
y (ndarray) – 1D array of expected class ids
y_pred (ndarray) – 1D array of scoring results, e.g. y_pred[i] = scoring_function(x[i], y[i])
all_scores (ndarray) – 2D array with shape [n_samples, n_classes] of scores comparing the input vs the auto-encoder generated output for each class type (normal, and all abnormal cases)
thresholds (List[float]) – Optional, list of thresholds to use for calculating the TPR, FPR and AUC
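The sketch below shows the array shapes calculate() expects, using synthetic values; scoring_function is only a placeholder for whatever reconstruction-error metric the model specification uses.

```python
# Sketch of the array shapes expected by calculate(), with synthetic data.
import numpy as np

# classes = [normal, abnormal] -> n_classes = 2
y = np.array([0, 0, 1, 1])                    # expected class ids
y_pred = np.array([0.05, 0.10, 0.80, 0.65])   # per-sample scores, e.g. scoring_function(x[i], y[i])

# One score per (sample, class): input vs. auto-encoder output compared
# against the "normal" class and each abnormal class
all_scores = np.array([
    [0.05, 0.90],
    [0.10, 0.85],
    [0.80, 0.20],
    [0.65, 0.30],
])

thresholds = list(np.linspace(0.0, 1.0, 11))  # optional thresholds for TPR/FPR/AUC

# results.calculate(y, y_pred, all_scores, thresholds=thresholds)
# (assuming `results` is an AutoEncoderEvaluationResults instance)
```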
- generate_summary()[source]¶
Generate and return a summary of the results as a string
- Return type:
str
- generate_plots(show=True, output_dir=None, logger=None)[source]¶
Generate plots of the evaluation results
- Parameters:
show – Display the generated plots
output_dir (str) – Generate the plots in the specified directory. If omitted, they are generated in the model’s logging directory
logger (Logger) – Optional logger