An Improved Detection of Cyberbullying on Social Media Using Randomized Sampling

Bibliographic Details
Title: An Improved Detection of Cyberbullying on Social Media Using Randomized Sampling
Language: English
Authors: Nitasha Dhingra, Suhani Chawla, Oshin Saini, Rishabh Kaushal (ORCID 0000-0002-9200-7802)
Source: International Journal of Bullying Prevention. 2025 7(3):166-178.
Availability: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed: Y
Page Count: 13
Publication Date: 2025
Document Type: Journal Articles; Reports - Research
Descriptors: Bullying, Social Media, Computer Mediated Communication, Context Effect, Classification, Identification
DOI: 10.1007/s42380-023-00188-4
ISSN: 2523-3653; 2523-3661
Abstract: Due to the pandemic, the world's dependence shifted to online platforms. It has made all age groups vulnerable to cyberbullying. Now more than ever, there is a need for online behavior monitoring. Existing algorithms tend to classify friendly banter as cyberbullying. They make use of binary classification by identifying offensive keywords. The lack of analysis of the context of data posted and the unavailability of public training data make it challenging to train models accurately. Our models and research focus on the larger picture by making use of context as a significant parameter during the classification. The dataset chosen was such that its annotation was based on 5 parameters that considered the context of conversations happening online. This paper executes various machine learning algorithms (SVM, random forest, AdaBoost, and MLP) on a benchmark cyberbullying-representations dataset extracted from Twitter. We conducted randomized oversampling on the best-performing SVM model, which resulted in a significantly higher average F1 score, outperforming the baseline score.
Abstractor: As Provided
Entry Date: 2026
Accession Number: EJ1492569
Database: ERIC