A composite of char- and token-based models
We continue our experimentation by creating a neural network that combines a character-based model and a token (word) based model.
Inspiration for the model
Previously, we experimented with a character-based model that used a CNN to extract features from the data (1, 2).
We also experimented with a token-based LSTM model with a learned embedding (1). (3) suggests an alternative way to extract features from tokens and to pick up signals from the resulting feature sequence: a CNN block followed by an LSTM block.
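As a minimal sketch of that idea in Keras, the token branch might look like the following; the layer widths, kernel size, and sequence length here are illustrative assumptions, not the values used in our experiments.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of the CNN-block-then-LSTM-block idea from (3).
# All hyperparameters below are illustrative assumptions.
def token_cnn_lstm_branch(vocab_size=50_000, seq_len=200, embed_dim=128):
    inputs = layers.Input(shape=(seq_len,), dtype="int64")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)         # learned token embedding
    x = layers.Conv1D(64, kernel_size=5, activation="relu")(x)  # local n-gram features
    x = layers.MaxPooling1D(pool_size=2)(x)                     # downsample the feature sequence
    x = layers.LSTM(64)(x)                                      # sequential signal over CNN features
    return tf.keras.Model(inputs, x)
```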
(4) documents a composite model that combines the character-based model with a token-based CNN-LSTM model.
Composite model implementation
We implement the composite model (5) described in (4). The diagram below shows the network.
The token part of the model makes a few adjustments to the model in (6): in particular, we cap the maximum number of tokens at 50,000 to keep memory consumption in check.
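Assuming the token pipeline uses Keras's TextVectorization layer (the post does not show the preprocessing code), the cap might be applied like this; the output sequence length is an illustrative assumption.

```python
from tensorflow.keras.layers import TextVectorization

# Cap the vocabulary at 50,000 tokens so the embedding table's memory
# footprint stays bounded; sequence length of 200 is an assumption.
token_vectorizer = TextVectorization(
    max_tokens=50_000,
    output_mode="int",
    output_sequence_length=200,
)
# token_vectorizer.adapt(train_texts)  # build the vocabulary from training text
```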
The character part of the model makes a few adjustments to the one in (1): to simplify the implementation, we derive the character encodings from the text already prepared for the token portion of the model, rather than from the raw text.
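A hedged sketch of how the two branches might be wired together: a token CNN-LSTM branch and a character CNN branch, concatenated into a single classifier head. The widths, input lengths, and the binary sigmoid head are assumptions for illustration; the actual configuration is the one described in (4) and (5).

```python
import tensorflow as tf
from tensorflow.keras import layers

# Token branch: embedding -> CNN block -> LSTM block.
token_in = layers.Input(shape=(200,), dtype="int64", name="tokens")
t = layers.Embedding(50_000, 128)(token_in)
t = layers.Conv1D(64, 5, activation="relu")(t)
t = layers.MaxPooling1D(2)(t)
t = layers.LSTM(64)(t)

# Character branch: small-vocabulary embedding -> CNN -> global pooling.
char_in = layers.Input(shape=(1000,), dtype="int64", name="chars")
c = layers.Embedding(128, 32)(char_in)           # small character vocabulary
c = layers.Conv1D(64, 7, activation="relu")(c)   # character n-gram features
c = layers.GlobalMaxPooling1D()(c)

# Concatenate the two branches and classify.
merged = layers.Concatenate()([t, c])
out = layers.Dense(1, activation="sigmoid")(merged)
model = tf.keras.Model([token_in, char_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```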
Model performance
We observe an accuracy score of 98.43%, an F1 score of 83.85%, and an AUC score of 96.72%.
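For reference, scores of this kind can be computed from held-out predictions with scikit-learn; the arrays below are placeholders, not our data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# y_true: ground-truth labels; y_prob: model probabilities on a held-out set.
# These placeholder arrays are illustrative assumptions only.
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.9, 0.7, 0.4, 0.8])

y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities at 0.5
print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"F1:       {f1_score(y_true, y_pred):.4f}")
print(f"AUC:      {roc_auc_score(y_true, y_prob):.4f}")
```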