Overview

This project fine-tunes a BERT (Bidirectional Encoder Representations from Transformers) model for fine-grained hate speech detection on social media. It focuses on identity-based targeted hate, including Islamophobic and antisemitic content. The model is trained on labeled datasets and classifies text as hate speech, offensive, or neutral.


Features
  • bert-base-uncased fine-tuned on hate speech datasets
  • Multi-class classification: hate speech / offensive / neutral
  • Preprocessing pipeline for social media text
  • Evaluation with accuracy, F1, precision, and recall
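
The preprocessing step for social media text could look roughly like the sketch below. This is an illustration only: the function name clean_tweet and every cleaning rule in it (dropping URLs, masking mentions, stripping a leading "RT", removing the "#" symbol, lowercasing) are assumptions, not the rules used in this repository.

```python
import re

# Hypothetical cleaning rules; the project's actual pipeline may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize one tweet before tokenization (illustrative only)."""
    text = URL_RE.sub(" ", text)                # drop links
    text = MENTION_RE.sub("@user", text)        # mask user handles
    text = re.sub(r"^RT\s+", "", text.strip())  # strip a leading retweet marker
    text = text.replace("#", "")                # keep hashtag words, drop the symbol
    text = WHITESPACE_RE.sub(" ", text)         # collapse runs of whitespace
    return text.strip().lower()                 # lowercase to match an uncased model
```

Lowercasing is redundant with an uncased tokenizer, but it keeps the cleaned text files consistent if they are inspected or reused elsewhere.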

Dataset

This project uses two datasets:

  • Waseem & Hovy Dataset — cleaned text data from the original hate speech paper by Waseem & Hovy, a benchmark dataset widely used for hate speech detection research.

  • Annotated Dataset — tweets collected using the Twitter API from 07-01-2021 to 11-29-2021 using a list of hashtags adopted from Waseem & Hovy’s paper. Tweets were manually annotated for hate speech categories.

| File | Description |
| --- | --- |
| data/waseem/ | Cleaned text from Waseem & Hovy |
| data/annotated/FINAL_cleaned_annotated.parquet | Full dataframe with tweets and labels |
| data/annotated/FINAL_X.txt | Cleaned text |
| data/annotated/FINAL_Y.txt | Corresponding labels |
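
A minimal sketch of pairing the cleaned text with its labels follows. It assumes that FINAL_X.txt and FINAL_Y.txt each hold one example per line in the same order; the file layout is not documented here, so check the files before relying on it. The helper name load_pairs is hypothetical.

```python
from pathlib import Path


def load_pairs(x_path, y_path):
    """Read texts and labels, ASSUMING one example per line in each file."""
    texts = Path(x_path).read_text(encoding="utf-8").splitlines()
    labels = Path(y_path).read_text(encoding="utf-8").splitlines()
    # A length mismatch means the one-line-per-example assumption is wrong.
    if len(texts) != len(labels):
        raise ValueError(f"{len(texts)} texts but {len(labels)} labels")
    return list(zip(texts, labels))
```

Under that assumption, load_pairs("data/annotated/FINAL_X.txt", "data/annotated/FINAL_Y.txt") would return a list of (text, label) tuples.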

Results
| Model | F1 Score | Precision@F1 | Recall@F1 |
| --- | --- | --- | --- |
| BERT fine-tuned - Waseem test dataset | 0.87 | 0.76 | 0.87 |
| BERT fine-tuned - Annotated dataset | 0.92 | 0.91 | 0.93 |
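
For readers who want to see how multi-class precision, recall, and F1 are computed, here is a small pure-Python sketch. The averaging method behind the numbers above is not stated, so the macro averaging used here is an assumption, and the function names are hypothetical. Note that with any per-class averaging, the averaged F1 is not necessarily the harmonic mean of the averaged precision and recall.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, lab) for lab in labels]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
```

Calling macro_average(y_true, y_pred) with two equal-length lists of labels returns a (precision, recall, F1) tuple.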

Usage
git clone https://github.com/manitapote/llm-projects
cd llm-projects/bert_hatespeech
pip install -r requirements.txt
python train.py

Code