Evaluating Mutual Information and Chi-Square Metrics in Text Features Selection Process: A Study Case Applied to the Text Classification in PubMed


Abstract

The aim of this work was to compare the behavior of mutual information and Chi-square as metrics for evaluating the relevance of terms extracted from documents related to “software design” retrieved from the PubMed database. The comparison was carried out in two contexts: using the set of terms obtained from the vectorization of the corpus of abstracts, and using only the terms found in the vocabulary defined by the ISO/IEC/IEEE 24765 standard. A search was conducted on the subject “software” over the last 6 years, and the Medical Subject Headings (MeSH) term “software design” assigned to the articles was used to label them. Mutual information and Chi-square were then computed to sort and select features. Features selected with Chi-square yielded the highest accuracy scores in document classification with a multinomial naive Bayes classifier. Although these results suggest that Chi-square is better than mutual information for estimating feature relevance in the context of this work, further research is necessary to give this conclusion a consistent foundation.
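As an illustration of the two metrics compared in the abstract, the following minimal sketch scores terms against a binary label using the common textbook definitions of Chi-square and mutual information over a 2x2 term/class contingency table. The abstract does not state the exact formulation or tooling the authors used, so the formulas, the function names (`contingency`, `chi_square`, `mutual_information`, `rank_terms`), and the four-document toy corpus are all assumptions made for this example.

```python
import math

def contingency(docs, labels, term):
    """Count documents by (term present?, label positive?)."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label:
            n11 += 1          # term present, positive class
        elif present:
            n10 += 1          # term present, negative class
        elif label:
            n01 += 1          # term absent, positive class
        else:
            n00 += 1          # term absent, negative class
    return n11, n10, n01, n00

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table (0.0 if a marginal is empty)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class."""
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, term marginal, class marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    total = 0.0
    for joint, term_marginal, class_marginal in cells:
        if joint > 0:  # empty cells contribute nothing
            total += (joint / n) * math.log2(
                n * joint / (term_marginal * class_marginal)
            )
    return total

def rank_terms(docs, labels, vocabulary, score):
    """Sort vocabulary terms by descending relevance score."""
    scored = [(t, score(*contingency(docs, labels, t))) for t in vocabulary]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpus: each document is a set of tokens, and the label
# says whether the document carries the "software design" tag.
docs = [
    {"software", "design", "pattern"},
    {"software", "architecture", "design"},
    {"software", "genome", "alignment"},
    {"software", "protein", "alignment"},
]
labels = [True, True, False, False]
vocabulary = sorted(set().union(*docs))

chi_ranking = rank_terms(docs, labels, vocabulary, chi_square)
mi_ranking = rank_terms(docs, labels, vocabulary, mutual_information)

for term, value in chi_ranking:
    print(f"chi2 {term:12s} {value:.3f}")
for term, value in mi_ranking:
    print(f"mi   {term:12s} {value:.3f}")
```

In this toy corpus a term such as "software", which appears in every document, receives a score of zero under both metrics, while terms that occur only in one class rank highest. Selecting the top-ranked terms and training a classifier on them would be the next step; that part is not sketched here.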



Citation

Párraga-Valle, J., García-Bermúdez, R., Rojas, F., Torres-Morán, C., & Simón-Cuevas, A. (2020). Evaluating Mutual Information and Chi-Square Metrics in Text Features Selection Process: A Study Case Applied to the Text Classification in PubMed. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12108 LNBI, pp. 636–646). Springer. https://doi.org/10.1007/978-3-030-45385-5_57
