Hierarchical classification of web content

Abstract

This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is first used to train different second-level classifiers. In the hierarchical case, a model is learned to distinguish a second-level category from the other categories within the same top-level category. In the flat, non-hierarchical case, a model distinguishes a second-level category from all other second-level categories. Scoring rules can take further advantage of the hierarchy by considering only second-level categories that exceed a threshold at the top level. We use support vector machine (SVM) classifiers, which have been shown to be efficient and effective for classification but have not previously been explored in the context of hierarchical classification. We found small advantages in accuracy for hierarchical models over flat models. For the hierarchical approach, we found the same accuracy using a sequential Boolean decision rule and a multiplicative decision rule. Since the sequential approach is much more efficient, requiring only 14%-16% of the comparisons used in the other approaches, we find it to be a good choice for classifying text into large hierarchical structures.
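The two decision rules named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the category names, the thresholds, and the keyword-based scorers that stand in for trained SVMs with probability-like outputs. It is not the authors' implementation; it only shows why a sequential rule can skip second-level evaluations when a top-level score falls below its threshold.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: maps a document to a probability-like score in [0, 1].
Scorer = Callable[[str], float]
Label = Tuple[str, str]  # (top-level category, second-level category)


def keyword_scorer(words: List[str]) -> Scorer:
    """Toy stand-in for a trained classifier (assumption, not an SVM)."""
    return lambda doc: 0.9 if any(w in doc.lower() for w in words) else 0.1


def sequential_boolean(doc: str, top: Dict[str, Scorer],
                       second: Dict[str, Dict[str, Scorer]],
                       t_top: float = 0.5,
                       t_second: float = 0.5) -> Tuple[List[Label], int]:
    """Accept a second-level label only if both levels exceed their thresholds.

    Second-level scorers are skipped when the top level fails its threshold.
    Returns the accepted labels and the number of scorer evaluations.
    """
    accepted, evals = [], 0
    for top_label, top_scorer in top.items():
        evals += 1
        if top_scorer(doc) <= t_top:
            continue  # prune the whole subtree
        for sub_label, sub_scorer in second[top_label].items():
            evals += 1
            if sub_scorer(doc) > t_second:
                accepted.append((top_label, sub_label))
    return accepted, evals


def multiplicative(doc: str, top: Dict[str, Scorer],
                   second: Dict[str, Dict[str, Scorer]],
                   t: float = 0.25) -> Tuple[List[Label], int]:
    """Accept a label if the product of the two scores exceeds one threshold.

    Every second-level scorer is evaluated, so no subtree is pruned.
    """
    accepted, evals = [], 0
    for top_label, top_scorer in top.items():
        evals += 1
        p_top = top_scorer(doc)
        for sub_label, sub_scorer in second[top_label].items():
            evals += 1
            if p_top * sub_scorer(doc) > t:
                accepted.append((top_label, sub_label))
    return accepted, evals


# Hypothetical two-level hierarchy with toy scorers.
top_level = {
    "Computers": keyword_scorer(["software", "linux"]),
    "Sports": keyword_scorer(["soccer", "tennis"]),
}
second_level = {
    "Computers": {"Software": keyword_scorer(["software"]),
                  "Hardware": keyword_scorer(["cpu"])},
    "Sports": {"Soccer": keyword_scorer(["soccer"]),
               "Tennis": keyword_scorer(["tennis"])},
}

doc = "Open source software for Linux"
seq_labels, seq_evals = sequential_boolean(doc, top_level, second_level)
mul_labels, mul_evals = multiplicative(doc, top_level, second_level)
print(seq_labels, seq_evals)
print(mul_labels, mul_evals)
```

In this toy hierarchy both rules accept the same label, but the sequential rule evaluates fewer scorers because the "Sports" subtree is never visited. The savings on a real hierarchy depend on how many top-level categories pass their threshold.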

Citation (APA)

Dumais, S., & Chen, H. (2000). Hierarchical classification of web content. In SIGIR Forum (ACM Special Interest Group on Information Retrieval) (pp. 256–263). ACM. https://doi.org/10.1145/345508.345593
