Evaluating Mutual Information and Chi-Square Metrics in Text Features Selection Process: A Study Case Applied to the Text Classification in PubMed

José Párraga-Valle; Rodolfo García-Bermúdez; Fernando Rojas; Christian Torres-Morán; Alfredo Simón-Cuevas

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12108 LNBI, pp. 636–646 (2020)

DOI: 10.1007/978-3-030-45385-5_57

Abstract

The aim of this work was to compare mutual information and Chi-square as metrics for evaluating the relevance of terms extracted from documents related to “software design” retrieved from the PubMed database. The comparison was carried out in two contexts: using the set of terms obtained by vectorizing the corpus of abstracts, and using only the terms from the vocabulary defined by the ISO/IEC/IEEE 24765 standard. A search on the subject “software” was conducted covering the last 6 years, and the Medical Subject Headings (MeSH) term “software design” assigned to the articles was used to label them. Mutual information and Chi-square were then computed to sort and select features. Chi-square obtained the highest accuracy scores in document classification using a multinomial naive Bayes classifier. Although these results suggest that Chi-square is better than mutual information at estimating feature relevance in the context of this work, further research is needed to establish this conclusion on a consistent foundation.
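
The abstract does not give the exact formulas, tokenization, or number of selected features, so the following is only a minimal Python sketch under stated assumptions: each term is scored from a document-level 2x2 contingency table of term presence versus the positive label (a common formulation of both metrics), the top-k terms are kept, and a multinomial naive Bayes classifier with add-one smoothing is trained on them. The toy corpus, the labels (True standing in for articles tagged with the MeSH term “software design”), and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of lowercase tokens.
# True stands in for "tagged with the MeSH term 'software design'".
docs = [
    "software design pattern architecture design".split(),
    "object oriented design uml software".split(),
    "software architecture requirements design".split(),
    "protein sequence alignment software".split(),
    "gene expression analysis tool".split(),
    "sequence alignment database tool".split(),
]
labels = [True, True, True, False, False, False]


def contingency(docs, labels, term):
    """Return document counts (n11, n10, n01, n00) for term presence vs. the positive class."""
    n11 = n10 = n01 = n00 = 0
    for tokens, y in zip(docs, labels):
        present = term in tokens
        if present and y:
            n11 += 1  # term present, positive class
        elif present:
            n10 += 1  # term present, negative class
        elif y:
            n01 += 1  # term absent, positive class
        else:
            n00 += 1  # term absent, negative class
    return n11, n10, n01, n00


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class membership."""
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, term-marginal count, class-marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    # Empty cells contribute 0 (the limit of p*log p as p -> 0).
    return sum(c / n * math.log2(n * c / (t * k)) for c, t, k in cells if c > 0)


def select_features(docs, labels, score_fn, k):
    """Rank the vocabulary by score_fn (ties broken alphabetically) and keep the top k terms."""
    vocab = sorted({t for tokens in docs for t in tokens})
    scored = [(score_fn(*contingency(docs, labels, t)), t) for t in vocab]
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in scored[:k]]


def train_mnb(docs, labels, features, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing, restricted to the selected features."""
    feats = set(features)
    model = {}
    for cls in set(labels):
        cls_docs = [d for d, y in zip(docs, labels) if y == cls]
        counts = Counter(t for d in cls_docs for t in d if t in feats)
        total = sum(counts.values())
        log_prior = math.log(len(cls_docs) / len(docs))
        log_lik = {t: math.log((counts[t] + alpha) / (total + alpha * len(feats))) for t in feats}
        model[cls] = (log_prior, log_lik)
    return model


def predict_mnb(model, tokens):
    """Return the class with the highest log posterior; unselected tokens are ignored."""
    def log_posterior(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=log_posterior)


for name, fn in [("chi-square", chi_square), ("mutual information", mutual_information)]:
    selected = select_features(docs, labels, fn, k=3)
    model = train_mnb(docs, labels, selected)
    print(name, selected, predict_mnb(model, "uml design architecture".split()))
```

Swapping `chi_square` for `mutual_information` in `select_features` is the only difference between the two conditions in this sketch; on real data one would compare classification accuracy on held-out documents rather than inspect a single prediction.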

Citation (APA)

Párraga-Valle, J., García-Bermúdez, R., Rojas, F., Torres-Morán, C., & Simón-Cuevas, A. (2020). Evaluating Mutual Information and Chi-Square Metrics in Text Features Selection Process: A Study Case Applied to the Text Classification in PubMed. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12108 LNBI, pp. 636–646). Springer. https://doi.org/10.1007/978-3-030-45385-5_57