Microblogs have become popular media platforms for reporting and propagating news. However, they also enable the proliferation of misleading information that can cause serious damage. Thus, many efforts have been made to defeat rumors automatically. While several innovative solutions for rumor detection and classification have been developed, the lack of comprehensive and labeled datasets remains a major limitation. Existing datasets are scarce, and none of them provides all of the features that have proven effective for rumor analysis. To mitigate this problem, we propose a big data-sized dataset called DAT@Z21, which provides news content with rich features, including textual content, social context, social engagement of users and spatiotemporal information. Furthermore, DAT@Z21 also provides visual content, i.e., images, which plays a crucial role in the news diffusion process. We conduct exploratory analyses to understand our dataset’s characteristics and identify useful patterns. We also experiment with various state-of-the-art rumor classification methods to illustrate DAT@Z21’s usefulness, especially that of its visual components. Finally, DAT@Z21 is available online at https://git.msh-lse.fr/eric/dataz21.
CITATION STYLE
Azri, A., Favre, C., Harbi, N., Darmont, J., & Noûs, C. (2023). DAT@Z21: A Comprehensive Multimodal Dataset for Rumor Classification in Microblogs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14148 LNCS, pp. 161–175). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-39831-5_16