The utility of information extraction in the classification of books

Tom Betts; Maria Milosavljevic; Jon Oberlander

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), vol. 4425 LNCS, pp. 295-306

DOI: 10.1007/978-3-540-71496-5_28


Abstract

We describe work on automatically assigning classification labels to books using the Library of Congress Classification scheme. The task is non-trivial because of the volume and variety of books that exist. We explore the utility of Information Extraction (IE) techniques within this text categorisation (TC) task, automatically extracting structured information from the full text of books. Performance is evaluated experimentally on a corpus of books from Project Gutenberg. Results indicate that a classifier combining methods and tools from IE and TC significantly outperforms a state-of-the-art text classifier, achieving an F-measure (β = 1) of 0.8099. © Springer-Verlag Berlin Heidelberg 2007.
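
The abstract reports performance as an F-measure with β = 1, i.e. the harmonic mean of precision and recall with both weighted equally. The sketch below shows the standard definition, F_β = (1 + β²)·P·R / (β²·P + R), as a small Python function. It is illustrative only: the abstract does not state how scores were aggregated across classification labels (for example micro- or macro-averaging), and the helper `f1_from_counts` and all counts used with it are hypothetical, not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score, the weighted harmonic mean of precision and recall.

    With beta = 1 precision and recall are weighted equally, so the score
    reduces to F1 = 2PR / (P + R).
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        return 0.0  # convention: F is 0 when precision and recall are both 0
    return (1 + b2) * precision * recall / denom


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Hypothetical helper: F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta=1.0)


if __name__ == "__main__":
    # Made-up counts, purely to show the arithmetic: P = R = 0.8, so F1 = 0.8.
    print(round(f1_from_counts(tp=8, fp=2, fn=2), 4))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot reach a high F(β=1) by trading recall for precision or vice versa; for example, P = 0.5 and R = 1.0 give only F1 ≈ 0.667.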


Citation (APA style)

Betts, T., Milosavljevic, M., & Oberlander, J. (2007). The utility of information extraction in the classification of books. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4425 LNCS, pp. 295–306). Springer Verlag. https://doi.org/10.1007/978-3-540-71496-5_28
