Overview
This project fine-tunes a BERT (Bidirectional Encoder Representations from Transformers) model for fine-grained hate speech detection on social media. Focusing on identity-based targeted hate — including Islamophobic and antisemitic content — the model is trained on labeled datasets and classifies text into hate speech, offensive, or neutral categories.
Features
- Fine-tuned
bert-base-uncasedon hate speech dataset - Multi-class classification: hate speech / offensive / neutral
- Preprocessing pipeline for social media text
- Evaluation with accuracy, F1, precision, and recall
Dataset
This project uses two datasets:
Waseem & Hovy Dataset — cleaned text data from the original hate speech paper by Waseem & Hovy, a benchmark dataset widely used for hate speech detection research.
Annotated Dataset — tweets collected using the Twitter API from 07-01-2021 to 11-29-2021 using a list of hashtags adopted from Waseem & Hovy’s paper. Tweets were manually annotated for hate speech categories.
| File | Description |
|---|---|
data/waseem/ | Cleaned text from Waseem & Hovy |
data/annotated/FINAL_cleaned_annotated.parquet | Full dataframe with tweets and labels |
data/annotated/FINAL_X.txt | Cleaned text |
data/annotated/FINAL_Y.txt | Corresponding labels |
Results
| Model | F1 Score | Precision@F1 | Recall@F1 |
|---|---|---|---|
| BERT fine-tuned - Waseem test dataset | 0.87 | 0.76 | 0.87 |
| BERT fine-tuned - Annotated dataset | 0.92 | 0.91 | 0.93 |
Usage
git clone https://github.com/manitapote/llm-projects
cd llm-projects/bert_hatespeech
pip install -r requirements.txt
python train.py