A Test Dataset of Offensive Malay Language by a Cyberbullying Detection Model on Instagram Using Support Vector Machine

Abstract

Social media services have become a prevalent communication tool because they allow information to be shared instantly, and for free, with a large number of people. However, social media also facilitate cyberbullying, and studies have shown that cyberbullying on social media has a more severe impact than cyberbullying on other platforms. In some cases, cyberbullying leads to tragic outcomes, such as suicide. The information shared on social media services provides a massive amount of textual data, which can be used to explore patterns of human behavior, including cyberbullying. This paper aims to build a dataset of offensive language for research on cyberbullying in the Malay language, supported by a series of baseline experiments with SVM classifiers. These preliminary experiments help to understand the performance of automatic tools that mine for abusive language within a corpus of Malay texts. To this end, social media extraction methods and new crawling tools were developed to monitor the Instagram accounts of popular Malaysian celebrities. The resulting collection contains 165,239 real-world comments associated with 27 public Instagram accounts. A sample of this corpus was manually labelled in terms of cyberbullying categories. After cleaning, normalization, and vectorization, the labelled sample comprised 527 comments. Following a standard training (70%) and test (30%) split, the SVM classifier was developed and evaluated. These initial experiments produced a model accuracy of 75% and F1-scores of around 75%.
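The evaluation protocol described in the abstract (a 70%/30% split of 527 comments, scored by accuracy and F1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the split was shuffled or stratified, nor how the F1-scores were averaged, so the seeded shuffle, the macro averaging, and the label names `bully` and `neutral` are all hypothetical choices made for the example.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle a copy of items and split it 70% train / 30% test.

    Assumption: a seeded random shuffle; the abstract does not specify one.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1-score (harmonic mean of precision and recall) for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lb) for lb in labels) / len(labels)


# Split 527 placeholder comment ids, matching the dataset size in the abstract.
train_ids, test_ids = split_70_30(range(527))

# Toy labels (hypothetical) to exercise the metric functions.
toy_true = ["bully", "bully", "neutral", "neutral"]
toy_pred = ["bully", "neutral", "neutral", "neutral"]
toy_accuracy = accuracy(toy_true, toy_pred)
toy_macro_f1 = macro_f1(toy_true, toy_pred)
```

With this split rule, 527 comments yield 368 training and 159 test items. In practice the predictions would come from an SVM trained on the vectorized training comments; the toy labels above stand in for that step only to show how the two reported metrics are computed.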

Ismail, N., Losada, D. E., & Ahmad, R. (2024). A Test Dataset of Offensive Malay Language by a Cyberbullying Detection Model on Instagram Using Support Vector Machine. In Communications in Computer and Information Science (Vol. 2001 CCIS, pp. 182–192). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-9589-9_14
