spamBERT
Spam Classification of Email and SMS Texts Using a Fine-Tuned BERT Model
spamBERT (code available here) is a simple, user-friendly webpage that classifies email and SMS texts as spam or not spam, using a fine-tuned BERT base model (cased). To leverage BERT's contextual understanding of language, the pre-trained model has been fine-tuned on the combination of two spam datasets, and the resulting model classifies the text that the user enters.
The datasets
The two datasets, SMS Spam Collection and Spam-Ham Dataset, are collections of spam and not-spam ("ham") SMS messages and emails. Every sample has two features: the target feature ("ham" or "spam") and the main feature, which contains the text.
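Combining the two datasets amounts to normalizing their labels and concatenating the samples. The sketch below assumes each dataset has already been loaded as a list of (label, text) pairs; the name combine_datasets and the encoding ham = 0, spam = 1 are hypothetical choices, not taken from the spamBERT code.

```python
from typing import Iterable

# Hypothetical label encoding: the README only states the targets are "ham" or "spam".
LABEL_TO_ID = {"ham": 0, "spam": 1}


def combine_datasets(*datasets: Iterable[tuple[str, str]]) -> list[tuple[str, int]]:
    """Merge several (label, text) collections into one list of (text, label_id) pairs."""
    combined = []
    for dataset in datasets:
        for label, text in dataset:
            # Normalize case and whitespace so "Spam" and "spam " map to the same id.
            key = label.strip().lower()
            if key not in LABEL_TO_ID:
                raise ValueError(f"unexpected label: {label!r}")
            combined.append((text, LABEL_TO_ID[key]))
    return combined


# Toy samples standing in for the SMS and email datasets (illustrative only).
sms = [("ham", "Are we still meeting at 5?"), ("spam", "WINNER! Claim your prize now")]
emails = [("Spam", "Cheap meds, click here"), ("ham", "Minutes from today's meeting attached")]

data = combine_datasets(sms, emails)
print(len(data), sum(label for _, label in data))  # 4 2
```

Raising on unknown labels, rather than silently dropping rows, makes inconsistencies between the two sources visible before fine-tuning starts.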

spamBERT architecture
The architecture consists of the pre-trained BERT base cased model (12 encoder layers) followed by a fully connected layer, fine-tuned for the spam classification task. The fine-tuned model achieves an accuracy of 99%.
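A BERT encoder with one fully connected layer on top can be sketched in PyTorch with Hugging Face Transformers as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the class name SpamClassifier, the dropout of 0.1, and the use of BERT's pooled [CLS] output (pooler_output) as the classifier input are hypothetical choices. In practice the encoder would be loaded with BertModel.from_pretrained("bert-base-cased"); the demo instead builds a tiny randomly initialized BertModel so that it runs offline.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class SpamClassifier(nn.Module):
    """BERT encoder followed by one fully connected layer producing 2 logits (ham, spam)."""

    def __init__(self, bert: BertModel, num_labels: int = 2, dropout: float = 0.1):
        super().__init__()
        self.bert = bert
        self.dropout = nn.Dropout(dropout)  # assumed regularization, not from the README
        self.fc = nn.Linear(bert.config.hidden_size, num_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output  # pooled [CLS] representation, shape (batch, hidden)
        return self.fc(self.dropout(pooled))  # logits, shape (batch, num_labels)


# Tiny config so the demo runs without downloading weights; the real model
# would use BertModel.from_pretrained("bert-base-cased") (12 layers, hidden size 768).
torch.manual_seed(0)
tiny = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                  num_attention_heads=2, intermediate_size=64)
model = SpamClassifier(BertModel(tiny))
model.eval()

input_ids = torch.randint(0, 100, (3, 8))
attention_mask = torch.ones_like(input_ids)
with torch.no_grad():
    logits = model(input_ids, attention_mask)
    predictions = logits.argmax(dim=-1)  # 0 = ham, 1 = spam under the assumed encoding

print(logits.shape)  # torch.Size([3, 2])
```

During fine-tuning, these logits would typically be trained with nn.CrossEntropyLoss against the 0/1 labels, updating both the encoder and the fully connected layer.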
Results
The interface of the webpage is the following:
