Classification Metrics

This package provides a variety of classification metrics for analysing the performance of classification models based on the provided y_true and y_pred. The metrics you choose to evaluate your machine learning model are very important, because the choice of metric influences how the performance of machine learning algorithms is measured and compared. For most of these functions, it is expected that the provided


binary_accuracy(y_pred, y_true; threshold=0.5)

Calculates the averaged binary accuracy based on y_pred and y_true. The threshold argument specifies the minimum predicted probability in y_pred required for a prediction to be labelled as 1. Its default value is 0.5.
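The thresholding step can be illustrated with a small standalone Python sketch. It assumes that y_pred holds probabilities, that y_true holds 0/1 labels, and that a probability exactly equal to threshold is labelled 1 (read from "minimum predicted probability"). It mirrors the documented name but is not the package's implementation.

```python
def binary_accuracy(y_pred, y_true, threshold=0.5):
    """Share of samples whose thresholded prediction equals the 0/1 label.

    Sketch only: assumes a probability >= threshold is labelled 1.
    """
    if len(y_pred) != len(y_true) or not y_true:
        raise ValueError("y_pred and y_true must be non-empty and equally long")
    # Turn each predicted probability into a hard 0/1 label.
    labels = [1 if p >= threshold else 0 for p in y_pred]
    correct = sum(1 for p, t in zip(labels, y_true) if p == t)
    return correct / len(y_true)


print(binary_accuracy([0.9, 0.4, 0.6, 0.2], [1, 0, 0, 0]))                 # 0.75
print(binary_accuracy([0.9, 0.4, 0.6, 0.2], [1, 0, 0, 0], threshold=0.7))  # 1.0
```

Raising the threshold to 0.7 turns the 0.6 prediction into a 0, which here removes the only error.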

categorical_accuracy(y_pred, y_true)

Calculates the averaged categorical accuracy based on y_pred and y_true.
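A minimal Python sketch of categorical accuracy follows, assuming one-hot y_true and one list of class scores per sample. The package may store data transposed (bin_to_cat describes a (2, N) shape, i.e. classes along the first dimension), so treat the layout here as an assumption.

```python
def argmax(values):
    # Index of the largest entry (the first one wins on ties).
    return max(range(len(values)), key=lambda i: values[i])


def categorical_accuracy(y_pred, y_true):
    """Share of samples whose most probable class matches the one-hot label.

    Sketch only: assumes one list of class scores per sample.
    """
    hits = sum(1 for p, t in zip(y_pred, y_true) if argmax(p) == argmax(t))
    return hits / len(y_true)


y_pred = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
y_true = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
print(categorical_accuracy(y_pred, y_true))  # 2 of 3 samples correct
```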

cohen_kappa(y_pred, y_true)

Measures the agreement between two raters (here, the predictions and the ground truth) who each classify N items into C mutually exclusive categories, using the observed data to calculate the probabilities of each rater randomly choosing each category. If the raters are in complete agreement, κ = 1. If there is no agreement among the raters other than what would be expected by chance, κ = 0.

Ref: Cohen's Kappa
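Cohen's kappa is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's class frequencies. The Python sketch below computes it from two sequences of integer labels; the input format and the NaN returned for the degenerate case p_e = 1 are assumptions of this sketch, not documented package behaviour.

```python
from collections import Counter


def cohen_kappa(pred_labels, true_labels):
    """Cohen's kappa for two equally long sequences of class labels."""
    n = len(true_labels)
    # Observed agreement p_o: fraction of items both raters label identically.
    p_o = sum(1 for p, t in zip(pred_labels, true_labels) if p == t) / n
    # Chance agreement p_e: sum over classes of the product of the two
    # raters' marginal frequencies for that class.
    pred_counts = Counter(pred_labels)
    true_counts = Counter(true_labels)
    p_e = sum(pred_counts[c] * true_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        return float("nan")  # both raters used one identical class: undefined
    return (p_o - p_e) / (1 - p_e)


print(cohen_kappa([0, 1, 2], [0, 1, 2]))              # perfect agreement: 1.0
print(cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1]))        # chance level: 0.0
print(cohen_kappa([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # about 0.615
```

In the last call p_o = 0.8 and p_e = (2·3 + 3·2) / 25 = 0.48, so κ = 0.32 / 0.52.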

confusion_matrix(y_pred, y_true)

Creates a confusion matrix for classification problems based on the provided y_pred and y_true. Expects y_true to be one-hot encoded already.
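As an illustration, this Python sketch builds a confusion matrix from one-hot y_true and per-sample score lists. The orientation (rows are true classes, columns are predicted classes) and the row-per-sample layout are assumptions; the package may use the transposed convention.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def confusion_matrix(y_pred, y_true):
    """Count matrix: cm[i][j] = samples of true class i predicted as class j.

    Sketch only: assumes one-hot y_true and one score list per sample; the
    row/column orientation is an assumption.
    """
    n_classes = len(y_true[0])
    cm = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(y_pred, y_true):
        cm[argmax(t)][argmax(p)] += 1
    return cm


y_pred = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
y_true = [[1, 0], [0, 1], [0, 1]]
print(confusion_matrix(y_pred, y_true))  # [[1, 0], [1, 1]]
```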

f_beta_score(y_pred, y_true; β=1, avg_type="macro", sample_weights=nothing)

Computes the F-beta score. The F-beta score is the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0.


  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • β=1: the weight of recall in the combined score. If β<1, more weight is given to precision, while β>1 favors recall.
  • avg_type="macro": Type of average to use when calculating the F-beta score of multiclass models. Can take the values macro, micro or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.
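The role of β is clearest from the formula F_β = (1 + β²) · P · R / (β² · P + R), with P the precision and R the recall. The Python sketch below evaluates it for a single class; it does not reproduce the package's multiclass averaging, and returning 0 when P = R = 0 is an assumed convention.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + b2) * precision * recall / denom


p, r = 0.5, 1.0
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 4))
# 0.5 0.5556
# 1.0 0.6667
# 2.0 0.8333
```

With perfect recall and mediocre precision, the score rises as β grows, because larger β shifts the weight towards recall.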
false_alarm_rate(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the false alarm rate of the predictions with respect to the labels as 1 - specificity(y_pred, y_true, avg_type, sample_weights).


  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to use when calculating the false alarm rate of multiclass models. Can take the values macro, micro or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.

See also: specificity
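For the binary case, the identity false alarm rate = 1 - specificity = FP / (FP + TN) can be checked with a short Python sketch. It treats label 1 as the positive class and uses hypothetical helper names; it does not cover the multiclass averaging options.

```python
def binary_counts(y_pred, y_true):
    """Return (tn, fp) for 0/1 label sequences, treating 1 as positive."""
    tn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 0)
    return tn, fp


def specificity(tn, fp):
    return tn / (tn + fp)  # true negative rate


def false_alarm_rate(tn, fp):
    return 1 - specificity(tn, fp)  # equivalently fp / (fp + tn)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0]
tn, fp = binary_counts(y_pred, y_true)
print(tn, fp, false_alarm_rate(tn, fp))  # 3 1 0.25
```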

precision(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the precision of the predictions with respect to the labels.


  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to use when calculating the precision of multiclass models. Can take the values macro, micro or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.
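To make the three avg_type options concrete, here is a Python sketch of multiclass precision from 0-based integer labels. Macro averages the per-class precisions, micro pools the counts, and weighted averages the per-class values with user-supplied class weights. Normalising the weights by their sum, and assigning precision 0 to a class that is never predicted, are assumptions of this sketch, not documented package behaviour.

```python
def precision(pred, true, n_classes, avg_type="macro", class_weights=None):
    """Multiclass precision from integer labels 0..n_classes-1 (sketch)."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    for p, t in zip(pred, true):
        if p == t:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample belongs elsewhere
    # Assumed convention: a class that is never predicted has precision 0.
    per_class = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
                 for c in range(n_classes)]
    if avg_type == "macro":
        return sum(per_class) / n_classes
    if avg_type == "micro":
        return sum(tp) / (sum(tp) + sum(fp))
    if avg_type == "weighted":
        if class_weights is None:
            raise ValueError("weighted averaging needs class_weights")
        # Assumed: weights are normalised so that they sum to 1.
        total = sum(class_weights)
        return sum(w * v for w, v in zip(class_weights, per_class)) / total
    raise ValueError(f"unknown avg_type: {avg_type}")


true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(precision(pred, true, 3, "macro"))                # (1/2 + 2/3 + 1) / 3
print(precision(pred, true, 3, "micro"))                # 4 / 6
print(precision(pred, true, 3, "weighted", [1, 1, 2]))  # (1/2 + 2/3 + 2) / 4
```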
recall(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the recall of the predictions with respect to the labels.


  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to use when calculating the recall of multiclass models. Can take the values macro, micro or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.

Aliases: sensitivity and detection_rate
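Recall (sensitivity) is TP / (TP + FN) per class. The Python sketch below, using 0-based integer labels and covering only macro and micro averaging, shows a useful property: in single-label multiclass problems every misclassified sample is exactly one false negative, so micro-averaged recall equals plain accuracy. The function is an illustration, not the package's code.

```python
def recall(pred, true, n_classes, avg_type="macro"):
    """Multiclass recall (sensitivity) from integer labels (sketch)."""
    tp = [0] * n_classes
    fn = [0] * n_classes
    for p, t in zip(pred, true):
        if p == t:
            tp[t] += 1
        else:
            fn[t] += 1  # a sample of class t that the model missed
    if avg_type == "micro":
        return sum(tp) / (sum(tp) + sum(fn))
    if avg_type == "macro":
        per_class = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
                     for c in range(n_classes)]
        return sum(per_class) / n_classes
    raise ValueError(f"unsupported avg_type in this sketch: {avg_type}")


true = [0, 0, 0, 1]
pred = [0, 0, 1, 1]
accuracy = sum(p == t for p, t in zip(pred, true)) / len(true)
print(recall(pred, true, 2, "macro"))            # (2/3 + 1) / 2
print(recall(pred, true, 2, "micro"), accuracy)  # both 0.75
```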

sparse_categorical(y_pred, y_true)

Calculates the sparse categorical accuracy based on y_pred and y_true. It evaluates whether the true value is equal to the index of the maximal predicted value. Here, y_true is expected to provide only an integer label (starting from index 0) for each data element (i.e. not one-hot encoded).
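The sparse variant differs from categorical accuracy only in how the truth is stored: an integer label per sample instead of a one-hot row. The Python sketch below assumes one score list per sample. Because labels start at 0, a 1-based language such as Julia would need to compare each label against argmax - 1; whether the package does exactly this is not stated above.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def sparse_categorical_accuracy(y_pred, y_true):
    """Share of samples whose 0-based integer label equals the argmax score.

    With 1-based indexing the equivalent check would be label == argmax - 1.
    """
    hits = sum(1 for p, t in zip(y_pred, y_true) if argmax(p) == t)
    return hits / len(y_true)


y_pred = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
y_true = [1, 0, 1]  # integer labels, not one-hot
print(sparse_categorical_accuracy(y_pred, y_true))  # 2 of 3 correct
```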

specificity(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the specificity of the predictions with respect to the labels.


  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to use when calculating the specificity of multiclass models. Can take the values macro, micro or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.
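For multiclass data, specificity is usually computed one-vs-rest: for class c, every sample whose true class is not c is a negative, and it counts as a false positive if it was predicted as c. The Python sketch below assumes 0-based integer labels and supports macro and micro averaging only; 1 minus its result gives the matching false alarm rate.

```python
def specificity(pred, true, n_classes, avg_type="macro"):
    """One-vs-rest specificity from integer labels (sketch)."""
    tn = [0] * n_classes
    fp = [0] * n_classes
    for p, t in zip(pred, true):
        for c in range(n_classes):
            if t == c:
                continue  # only samples outside class c are negatives for c
            if p == c:
                fp[c] += 1
            else:
                tn[c] += 1
    if avg_type == "micro":
        return sum(tn) / (sum(tn) + sum(fp))
    if avg_type == "macro":
        per_class = [tn[c] / (tn[c] + fp[c]) if tn[c] + fp[c] else 0.0
                     for c in range(n_classes)]
        return sum(per_class) / n_classes
    raise ValueError(f"unsupported avg_type in this sketch: {avg_type}")


true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
spec = specificity(pred, true, 3)
print(spec, 1 - spec)  # macro specificity and the matching false alarm rate
```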
top_k_categorical(y_pred, y_true; k=3)

Evaluates whether the index of the true value is equal to any of the indices of the top k predicted values. The default value of k is 3.
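A Python sketch of top-k accuracy with one-hot y_true follows, assuming one score list per sample. Tie-breaking among equal scores is not specified above; this sketch keeps the earlier index first.

```python
def top_k_indices(scores, k):
    # Indices of the k largest scores, highest first (stable on ties).
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top_k_categorical_accuracy(y_pred, y_true, k=3):
    """Share of samples whose one-hot true class is among the top-k scores."""
    hits = 0
    for scores, one_hot in zip(y_pred, y_true):
        true_index = one_hot.index(1)  # position of the 1 in the one-hot row
        if true_index in top_k_indices(scores, k):
            hits += 1
    return hits / len(y_true)


y_pred = [[0.05, 0.5, 0.3, 0.15], [0.4, 0.1, 0.2, 0.3]]
y_true = [[0, 0, 1, 0], [0, 1, 0, 0]]
for k in (1, 2, 4):
    print(k, top_k_categorical_accuracy(y_pred, y_true, k))
```

With k = 1 neither sample is correct, with k = 2 the first sample's true class (score 0.3) enters the top k, and with k equal to the number of classes every sample counts as correct.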

top_k_sparse_categorical(y_pred, y_true; k=3)

Evaluates whether the true value is equal to any of the indices of the top k predicted values. The default value of k is 3. Like sparse_categorical, it expects y_true to provide only an integer label (starting from index 0) for each data element (i.e. not one-hot encoded).
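The sparse form can be sketched in Python with heapq.nlargest, which returns the k indices with the highest scores without sorting the whole row. As before, the one-score-list-per-sample layout and 0-based labels are assumptions of the sketch.

```python
import heapq


def top_k_sparse_categorical_accuracy(y_pred, y_true, k=3):
    """Share of samples whose 0-based integer label is among the top-k scores."""
    hits = 0
    for scores, label in zip(y_pred, y_true):
        top_k = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
        if label in top_k:
            hits += 1
    return hits / len(y_true)


y_pred = [[0.05, 0.5, 0.3, 0.15], [0.4, 0.1, 0.2, 0.3]]
y_true = [2, 1]  # integer labels instead of one-hot rows
print(top_k_sparse_categorical_accuracy(y_pred, y_true, k=2))  # 0.5
print(top_k_sparse_categorical_accuracy(y_pred, y_true))       # k=3: 0.5
```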

Combined Stats

Some functions return an overall analysis of the model performance within a single function call. They are:

statsfromTFPN(TP, TN, FP, FN)

Computes statistics for binary classification, or one-vs-all statistics for multiclass classification.


  • TP: true positive values
  • TN: true negative values
  • FP: false positive values
  • FN: false negative values

Returns the resulting statistics as a dictionary.
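The idea of deriving a dictionary of statistics from the four counts can be sketched in Python as follows. The key names, the choice of statistics and the rule of returning 0 for an empty denominator are all hypothetical; the package's actual dictionary may differ. For multiclass one-vs-all use, the same function would be applied to each class's counts.

```python
def stats_from_tfpn(tp, tn, fp, fn):
    """Binary (or one-vs-all) statistics from confusion counts, as a dict.

    Key names and the zero-denominator convention are assumptions.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "false_alarm_rate": ratio(fp, fp + tn),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


stats = stats_from_tfpn(tp=40, tn=45, fp=5, fn=10)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```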

classwise_stats(y_pred, y_true)

Computes statistics for each class in multiclass classification based on the provided y_pred and y_true.

Returns the resulting statistics as a dictionary.

global_stats(y_pred, y_true; avg_type="macro")

Computes the overall statistics based on the provided y_pred and y_true. avg_type specifies the type of average to use when evaluating the stats. Currently, it can take the values "macro" or "micro".

Returns the resulting statistics as a dictionary.
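The difference between the two avg_type settings shows up clearly on imbalanced data. The Python sketch below takes per-class TP, FP and FN lists (not y_pred and y_true, unlike global_stats) and averages precision and recall both ways; the function name is hypothetical. Micro averaging pools the counts, so the large class dominates, while macro averaging gives each class equal weight and exposes the poor recall on the small class.

```python
def averaged_precision_recall(tp, fp, fn, avg_type="macro"):
    """Average per-class counts into global precision and recall (sketch)."""
    def ratio(num, den):
        return num / den if den else 0.0

    if avg_type == "micro":
        # Pool the counts of all classes first, then compute the ratios once.
        return {
            "precision": ratio(sum(tp), sum(tp) + sum(fp)),
            "recall": ratio(sum(tp), sum(tp) + sum(fn)),
        }
    if avg_type == "macro":
        # Compute the ratio per class, then take the unweighted mean.
        n = len(tp)
        return {
            "precision": sum(ratio(tp[c], tp[c] + fp[c]) for c in range(n)) / n,
            "recall": sum(ratio(tp[c], tp[c] + fn[c]) for c in range(n)) / n,
        }
    raise ValueError(f"avg_type must be 'macro' or 'micro', got {avg_type!r}")


# Imbalanced example: 90 samples of class 0 (all correct) and 11 of class 1,
# of which only 1 is correct and 10 are predicted as class 0.
tp, fp, fn = [90, 1], [10, 0], [0, 10]
print(averaged_precision_recall(tp, fp, fn, "macro"))
print(averaged_precision_recall(tp, fp, fn, "micro"))
```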


Utility Functions

These are some utility functions to aid the overall performance analysis.

bin_to_cat(y_pred, y_true)

Converts binary data to categorical form with two categories. Returns y_pred and y_true, each of shape (2, length(y_pred)), as a tuple. This utility supports performance metrics such as precision and recall, for which binary data must first be converted to categorical form before the metric is applied.
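A Python sketch of the conversion to the (2, N) layout follows. It assumes the inputs are already hard 0/1 labels (whether the package thresholds probabilities first is not stated) and that row 0 marks class 0 while row 1 marks class 1; both points are assumptions.

```python
def bin_to_cat(y_pred, y_true):
    """Turn 0/1 label sequences into two-row one-hot form (2 x N).

    Assumptions: inputs are already hard 0/1 labels (no thresholding), and
    row 0 marks class 0 while row 1 marks class 1.
    """
    def to_rows(labels):
        if any(v not in (0, 1) for v in labels):
            raise ValueError("expected only 0/1 labels")
        return [[1 if v == 0 else 0 for v in labels],
                [1 if v == 1 else 0 for v in labels]]

    return to_rows(y_pred), to_rows(y_true)


cat_pred, cat_true = bin_to_cat([1, 0, 1], [1, 1, 0])
print(cat_pred)  # [[0, 1, 0], [1, 0, 1]]
print(cat_true)  # [[0, 0, 1], [1, 1, 0]]
```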

TFPN(y_pred, y_true)

Returns the confusion matrix and the true positive, true negative, false positive and false negative counts for each class based on y_pred and y_true. Expects y_true to be one-hot encoded already.
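Per-class counts follow directly from the confusion matrix: TP is the diagonal entry, FP is the rest of that class's predicted column, FN is the rest of its true row, and TN is everything else. The Python sketch below assumes one-hot y_true, one score list per sample and the cm[true][predicted] orientation; the package's orientation and return types may differ.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def tfpn(y_pred, y_true):
    """Confusion matrix plus per-class TP, TN, FP, FN lists (sketch).

    Assumes one-hot y_true, one score list per sample, and cm[true][pred].
    """
    n = len(y_true[0])
    cm = [[0] * n for _ in range(n)]
    for p, t in zip(y_pred, y_true):
        cm[argmax(t)][argmax(p)] += 1
    total = len(y_true)
    tp = [cm[c][c] for c in range(n)]
    fp = [sum(cm[r][c] for r in range(n)) - tp[c] for c in range(n)]  # column minus diagonal
    fn = [sum(cm[c]) - tp[c] for c in range(n)]  # row minus diagonal
    tn = [total - tp[c] - fp[c] - fn[c] for c in range(n)]
    return cm, tp, tn, fp, fn


y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
cm, tp, tn, fp, fn = tfpn(y_pred, y_true)
print(cm)              # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(tp, tn, fp, fn)  # [1, 1, 1] [2, 2, 3] [1, 0, 0] [0, 1, 0]
```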