ROC curve comparison of models

Since the models achieve similar F1 scores, we plot Receiver Operating Characteristic (ROC) curves to compare their performance at low false-positive rates.

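ROC curves are explained in ¹. A ROC curve plots the true-positive rate against the false-positive rate as the classification threshold varies, which makes it useful for understanding classification performance across a range of false-positive rates.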
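To make the construction concrete, here is a minimal pure-Python sketch of how ROC points can be derived from labels and predicted probabilities. The function name `roc_points` and the toy data are illustrative assumptions, not the code used for this post; in practice scikit-learn's `sklearn.metrics.roc_curve` does the same job.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists with one point per distinct score threshold.

    Assumes labels are 0 (legitimate) or 1 (fraudulent).
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort examples by decreasing score: lowering the threshold step by step
    # turns more and more examples into predicted positives.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])

    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing flagged
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all examples tied at this score are counted.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


# Toy data (made up for illustration): 1 = fraudulent, 0 = legitimate.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_score)
print(list(zip(fpr, tpr)))
```

Each point on the curve corresponds to one choice of threshold, so the left-hand end of the curve (small false-positive rates) reflects the strictest thresholds.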

The ROC curve creation step ² takes the test-set predicted probabilities of a job description being fraudulent and derives a ROC curve from them. Four models are plotted: BOW + FCNN (³), LSTM with inline embedding (⁴), Transformer with position embedding (⁵), and BOW + Logistic Regression (⁶). The results are in the figure below:

[Figure: ROC curves for the four models]

We observe that:

Even at low false-positive rates, the performance of the four models remains close.
The LSTM model does slightly better than the other three at most low false-positive-rate thresholds.
The logistic regression model performs well despite its simplicity.
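One way to put a number on "performance at a low false-positive rate" is to fix a false-positive budget and report the best true-positive rate each model reaches within it. The sketch below assumes this approach; the helper `tpr_at_fpr`, the model names, and all scores are hypothetical and are not the results shown in the figure.

```python
def tpr_at_fpr(y_true, y_score, max_fpr):
    """Highest true-positive rate achievable with false-positive rate <= max_fpr.

    An example is predicted positive when its score is >= the threshold.
    """
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    best = 0.0
    # Walk thresholds from strictest to loosest; both rates can only grow.
    for threshold in sorted(set(y_score), reverse=True):
        fpr = sum(s >= threshold for s in neg) / len(neg)
        if fpr > max_fpr:
            break  # looser thresholds would only add more false positives
        best = sum(s >= threshold for s in pos) / len(pos)
    return best


# Made-up test set: 4 fraudulent postings followed by 10 legitimate ones.
y_true = [1] * 4 + [0] * 10

# Made-up scores for two hypothetical models, in the same order as y_true.
scores_a = [0.9, 0.8, 0.6, 0.3] + [0.7, 0.5, 0.4, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05]
scores_b = [0.95, 0.7, 0.65, 0.2] + [0.8, 0.75, 0.3, 0.3, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05]

for name, scores in [("model_a", scores_a), ("model_b", scores_b)]:
    print(name, tpr_at_fpr(y_true, scores, max_fpr=0.1))
```

Comparing this single value across models at a few budgets (for example 1%, 5%, and 10%) gives a compact summary of the left-hand side of the ROC curves.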

References

1. Receiver Operating Characteristic
2. ROC curves creation
3. Bag-of-words with a fully-connected neural network model
4. LSTM model with a word-embedding layer
5. Transformer model
6. Logistic regression plus bag-of-words model