We describe work on automatically assigning classification labels to books using the Library of Congress Classification scheme. This task is non-trivial due to the volume and variety of books that exist. We explore the utility of Information Extraction (IE) techniques within this text categorisation (TC) task, automatically extracting structured information from the full text of books. Experimental evaluation of performance involves a corpus of books from Project Gutenberg. Results indicate that a classifier which combines methods and tools from IE and TC significantly improves over a state-of-the-art text classifier, achieving a classification performance of F_{β=1} = 0.8099. © Springer-Verlag Berlin Heidelberg 2007.
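The reported score is an F-measure with β = 1. As a minimal sketch, assuming the standard definition (the weighted harmonic mean of precision and recall), it can be computed as follows. The function name `f_beta` and the precision and recall values in the usage line are illustrative assumptions; they are not taken from the paper, which reports only the final score here.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-measure).

    With beta = 1 precision and recall are weighted equally.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; the score is defined as 0 here.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Illustrative values only (assumed, not results from the paper).
example_score = f_beta(0.8, 0.6)
```

With β = 1 the formula reduces to 2PR / (P + R), so a high score requires both precision and recall to be high.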
Betts, T., Milosavljevic, M., & Oberlander, J. (2007). The utility of information extraction in the classification of books. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4425 LNCS, pp. 295–306). Springer Verlag. https://doi.org/10.1007/978-3-540-71496-5_28