Dataset

Coordinated Reply Attacks in Influence Operations: Characterization and Detection

This paper presents a machine learning framework to detect tweets that get coordinated replies as well as repliers involved in such attacks.

June 2025 · Manita Pote*, Tuğrulcan Elmas, Alessandro Flammini, Filippo Menczer

How to Handle Extremely Imbalanced Datasets

Undersampling

One way of handling an imbalanced dataset is to reduce the number of observations in every class except the minority class. The best-known algorithm in this group is random undersampling, where samples from the targeted classes are removed at random. Undersampling methods can be grouped by strategy into prototype generation methods and prototype selection methods.

Prototype Generation

Given an original dataset $S$, prototype generation algorithms generate a new set $S'$ where $|S'| < |S|$ and $S' \not\subset S$. These techniques reduce the number of samples in the targeted classes, but the remaining samples are generated, not selected, from the original set. ...
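As a rough illustration of random undersampling, here is a minimal sketch using only the Python standard library. The function name `random_undersample`, the `seed` parameter, and the toy data are assumptions made for this example and are not taken from the post. Because it only removes samples, every retained point comes from the original set, so this is a prototype selection method rather than a prototype generation one.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the minority class size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    # The minority class size is the target size for every class.
    n_min = min(len(idxs) for idxs in by_class.values())

    # Pick n_min indices per class without replacement.
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    kept.sort()  # preserve the original sample order

    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 9 samples of class 0 and 3 samples of class 1.
X = [[i] for i in range(12)]
y = [0] * 9 + [1] * 3

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

On the toy data, both classes end up with 3 samples each, and all minority-class samples are kept.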

January 2024 · Manita Pote