classification
Source: google-ml-course
Classification
- Logistic regression returns a probability
- This probability can be converted to a binary value by making use of a classification threshold
- Classification/decision threshold: the value above which the probability is mapped to the positive class and below which to the negative class
- See the confusion matrix, which tabulates the four outcome counts (TP, FP, TN, FN) used by the performance metrics
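A minimal sketch of thresholding in code (the probabilities, labels, and the 0.5 threshold are made-up illustrations; scikit-learn is assumed available):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical logistic-regression outputs and ground-truth labels.
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.05])
y_true = np.array([0, 0, 1, 1, 1, 0])

threshold = 0.5                        # classification/decision threshold
y_pred = (probs >= threshold).astype(int)

# Rows = actual class, columns = predicted class: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
```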
Evaluation metrics
- One possibility: accuracy (fraction of correct predictions over total predictions)
- $\dfrac{\text{TP} + \text{TN}}{\text{all predictions}}$
- Poor metric when the dataset is class-imbalanced (a large gap between the number of positive and negative labels): if positives are rare, a model that almost always predicts false scores close to 100% accuracy, yet that says nothing about the model! (Toy demonstration after this list.)
- Better alternatives: precision $\dfrac{\text{TP}}{\text{TP} + \text{FP}}$ and recall $\dfrac{\text{TP}}{\text{TP} + \text{FN}}$
- Bias
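A toy demonstration of the imbalance problem: a degenerate "always negative" model on a 1%-positive dataset (both are assumptions for illustration) looks excellent by accuracy but useless by precision and recall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([1] * 10 + [0] * 990)   # 1% positives
y_pred = np.zeros_like(y_true)            # model that always predicts "false"

print(accuracy_score(y_true, y_pred))     # 0.99 -- looks great
print(recall_score(y_true, y_pred))       # 0.0  -- misses every positive
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, no true positives
```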
Model performance for all possible classification thresholds
Receiver Operating Characteristic (ROC) curve
- Each point on the curve is the true positive rate (TPR, i.e. recall) and false positive rate (FPR) at one decision threshold
$$
\begin{align}
\text{TPR} &= \dfrac{\text{TP}}{\text{TP} + \text{FN}} = \dfrac{\text{TP}}{\text{actual positives}} \\
\text{FPR} &= \dfrac{\text{FP}}{\text{FP} + \text{TN}} = \dfrac{\text{FP}}{\text{actual negatives}}
\end{align}
$$
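A sketch of how the curve is traced by sweeping the threshold: scikit-learn's `roc_curve` returns the FPR/TPR pairs defined above at every distinct threshold (labels and scores below are made up).

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.55])

# One (FPR, TPR) point per threshold; together they form the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```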
Area under the ROC Curve (AUC)
- Probability of getting a pairwise comparison (pick a random positive and a random negative) correct, i.e. assigning the higher score to the positive: the probability that the model ranks a random positive example above a random negative example
- Aggregate measure of performance across all possible decision thresholds (as opposed to calculating TPR and FPR for each threshold in the ROC)
- e.g. with predictions laid out along a score axis: the probability that a random green point (actual positive) lies to the right of a random red point (actual negative)
- a perfect model separates the classes completely, giving AUC = 1.0
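A sketch checking the pairwise-ranking interpretation numerically: the fraction of correctly ordered (positive, negative) pairs matches `roc_auc_score` (the synthetic data and seed are arbitrary assumptions).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = y_true * 0.3 + rng.normal(0.5, 0.3, size=200)  # noisy but informative

pos, neg = scores[y_true == 1], scores[y_true == 0]
# P(random positive scores higher than random negative), ties counted half.
pairwise = ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

print(roc_auc_score(y_true, scores))  # AUC
print(pairwise)                       # same value, by the ranking definition
```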
Advantages:
- AUC is scale-invariant: it measures how well predictions are ranked, independent of their absolute values
- AUC is classification-threshold-invariant
Disadvantages:
- Scale invariance is not always desirable: when we need well-calibrated probability outputs, AUC cannot tell us whether we have them
- When classification-threshold-invariance isn’t desired, e.g. when there is a big cost difference between false negatives and false positives
- e.g. in email spam detection, a false positive (legit email marked as spam) is much worse than a false negative (spam comes through)
- we would therefore want to minimise false positives, i.e. raise the classification threshold (at the cost of letting more spam through)
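A sketch of picking a threshold for such a cost-sensitive case: cap the FPR at a budget and take the highest recall that fits (the toy spam scores and the 1% budget are illustrative assumptions).

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(7)
y_true = np.array([0] * 90 + [1] * 10)                 # 90 legit emails, 10 spam
scores = np.concatenate([rng.uniform(0.0, 0.6, 90),    # legit: lower spam scores
                         rng.uniform(0.3, 1.0, 10)])   # spam: higher spam scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
within_budget = fpr <= 0.01           # flag at most 1% of legit mail as spam
best = np.argmax(tpr * within_budget)  # highest recall within the FPR budget
print(f"threshold={thresholds[best]:.2f}  FPR={fpr[best]:.3f}  TPR={tpr[best]:.2f}")
```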