Experiments on the use of feature selection and negative evidence in automated text categorization

Luigi Galavotti; Fabrizio Sebastiani; Maria Simi

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000), Vol. 1923, pp. 59–68

DOI: 10.1007/3-540-45268-0_6


Abstract

We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant of the well-known k-NN method, based on the exploitation of negative evidence. We report the results of systematic experiments with these two methods on the standard Reuters-21578 benchmark.
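The abstract names both techniques but does not define them. The Python sketch below is illustrative only and rests on stated assumptions. It takes the simplified statistic to be the signed numerator of χ², P(t,c)·P(¬t,¬c) − P(t,¬c)·P(¬t,c), with the |Tr| factor, the square and the denominator dropped. This is the form the coefficient from this paper is commonly cited with. It also reads "negative evidence" as letting the k nearest neighbours that do not belong to category c subtract their similarity from the score, where standard k-NN has them contribute zero. Probabilities are maximum-likelihood estimates from document frequencies, and documents are binary word sets compared by cosine similarity. Every function name (term_category_probs, chi_square, simplified_chi_square, select_features, knn_score) and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: documents are sets of words, labels are sets of category names.
# Each cell is (term t occurs in the doc, doc belongs to category c).
CELLS = [(True, True), (True, False), (False, True), (False, False)]


def term_category_probs(docs, labels, term, category):
    """Maximum-likelihood estimates of P(t,c), P(t,~c), P(~t,c), P(~t,~c)."""
    counts = Counter((term in words, category in cats) for words, cats in zip(docs, labels))
    return {cell: counts[cell] / len(docs) for cell in CELLS}


def chi_square(p, n_docs):
    """Standard chi-square: |Tr| * (P(t,c)P(~t,~c) - P(t,~c)P(~t,c))^2 / (P(t)P(~t)P(c)P(~c))."""
    diff = p[(True, True)] * p[(False, False)] - p[(True, False)] * p[(False, True)]
    p_t = p[(True, True)] + p[(True, False)]
    p_c = p[(True, True)] + p[(False, True)]
    den = p_t * (1 - p_t) * p_c * (1 - p_c)
    return n_docs * diff ** 2 / den if den > 0 else 0.0


def simplified_chi_square(p):
    """Assumed simplification: keep only the signed numerator of chi-square."""
    return p[(True, True)] * p[(False, False)] - p[(True, False)] * p[(False, True)]


def select_features(docs, labels, category, r_prime):
    """Keep the r' highest-scoring terms for one category (ties broken alphabetically)."""
    vocab = set().union(*docs)
    score = {t: simplified_chi_square(term_category_probs(docs, labels, t, category)) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:r_prime]


def cosine(a, b):
    """Cosine similarity between two binary term sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def knn_score(doc, docs, labels, category, k, negative_evidence=True):
    """Sum similarities of the k nearest neighbours: +sim for positive examples of
    the category; with negative_evidence=True, -sim for negative examples
    (standard k-NN adds 0 for them)."""
    neighbours = sorted(zip(docs, labels), key=lambda dl: cosine(doc, dl[0]), reverse=True)[:k]
    total = 0.0
    for words, cats in neighbours:
        sim = cosine(doc, words)
        if category in cats:
            total += sim
        elif negative_evidence:
            total -= sim
    return total


# Toy training set (hypothetical data, not from Reuters-21578).
docs = [{"wheat", "crop", "export"}, {"wheat", "harvest"},
        {"oil", "petroleum", "export"}, {"oil", "barrel"}]
labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]

print(select_features(docs, labels, "grain", 3))  # ['wheat', 'crop', 'harvest']
query = {"wheat", "export"}
print(round(knn_score(query, docs, labels, "crude", 3), 3))  # -0.908
print(round(knn_score(query, docs, labels, "crude", 3, negative_evidence=False), 3))  # 0.408
```

In this toy example, plain k-NN gives the query a positive crude score (0.408) only because one crude document shares the word "export". With negative evidence, the two closer grain neighbours pull that score down to -0.908. For the grain category, χ² gives "wheat" and "oil" the same score (4.0) because it is unsigned. The simplified score is signed, so it ranks "wheat" first (0.25) and the negatively correlated "oil" last (-0.25).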

Citation (APA)

Galavotti, L., Sebastiani, F., & Simi, M. (2000). Experiments on the use of feature selection and negative evidence in automated text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1923, pp. 59–68). Springer Verlag. https://doi.org/10.1007/3-540-45268-0_6
