How to Handle Extremely Imbalanced Datasets
Undersampling

One way to handle an imbalanced dataset is to reduce the number of observations in every class except the minority class. The best-known algorithm in this group is random undersampling, which removes samples from the targeted classes at random. Undersampling methods fall into two groups according to their strategy: prototype generation methods and prototype selection methods.

Prototype Generation

Given an original dataset $S$, prototype generation algorithms generate a new set $S'$ where $|S'| < |S|$ and $S' \not\subset S$. These techniques reduce the number of samples in the targeted classes, but the remaining samples are generated from the original set rather than selected from it. ...
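
To make the difference between selecting and generating samples concrete, here is a minimal sketch in plain Python. The function names `random_undersample` and `mean_prototypes` are hypothetical and not taken from any library. The first function keeps a random subset of each non-minority class, so every returned sample comes from the original set. The second is a deliberately simplified stand-in for prototype generation: it replaces each non-minority class with the means of randomly formed groups, so the returned samples are new points that are generally not members of the original set. Practical prototype generation methods usually derive the new points from a clustering step (for example, imbalanced-learn's `ClusterCentroids` uses k-means centroids); plain group means are an assumption made here to keep the example dependency-free.

```python
import random
from collections import defaultdict


def _group_by_class(X, y):
    """Collect feature rows per class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Prototype selection: keep a random subset of every non-minority class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())  # minority class size
    X_res, y_res = [], []
    for label, rows in groups.items():
        # Classes already at the minority size are kept untouched.
        kept = rows if len(rows) == n_min else rng.sample(rows, n_min)
        X_res.extend(kept)
        y_res.extend([label] * n_min)
    return X_res, y_res


def mean_prototypes(X, y, seed=0):
    """Toy prototype generation: replace every non-minority class with the
    means of n_min randomly formed groups (an illustrative assumption, not
    a clustering-based method)."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        if len(rows) == n_min:
            prototypes = rows  # minority class is left as-is
        else:
            shuffled = rows[:]
            rng.shuffle(shuffled)
            # Split the shuffled rows into n_min non-empty groups.
            chunks = [shuffled[i::n_min] for i in range(n_min)]
            # Average each group feature by feature.
            prototypes = [
                [sum(col) / len(chunk) for col in zip(*chunk)]
                for chunk in chunks
            ]
        X_res.extend(prototypes)
        y_res.extend([label] * n_min)
    return X_res, y_res
```

Both functions take a list of feature rows `X` and a parallel list of labels `y`, and both return a dataset in which every class has as many samples as the minority class. Balancing down to the minority size is one choice among several; other target ratios are possible.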