Cleaning out web spam by entropy-based cascade outlier detection


Abstract

Web spam refers to Web pages that use tricks to mislead search engines into ranking them higher than they deserve. It causes serious damage to e-commerce and Web users and threatens Web security, so combating Web spam is an urgent task. In this paper, Web quality and semantic measurements are integrated with content and link features to construct a more representative feature set. A cascade detection mechanism based on an entropy-based outlier mining (EOM) algorithm is proposed. The mechanism consists of three stages, each using a different feature group. Experiments on WEBSPAM-UK2007 show that the quality and semantic features effectively improve detection, and that the EOM algorithm outperforms many classic classification algorithms when the data are unbalanced. The cascade detection mechanism can clean out more spam.
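
The abstract does not give the details of the EOM algorithm, so the following is only a minimal sketch of the general idea of a staged, information-theoretic outlier filter, not the authors' method. It assumes features have already been discretized, scores each page by the summed surprisal (-log2 of the empirical frequency) of its feature values, and flags pages whose score exceeds a threshold before passing the rest to the next stage. The function names, the feature names, the example data, and the threshold are all hypothetical.

```python
import math
from collections import Counter

def surprisal_scores(rows, features):
    """Score each row by the summed surprisal (-log2 p) of its feature values.

    p is the empirical frequency of the value within `rows`, so rare
    values contribute a high score. This is a stand-in for an
    entropy-based outlier score, not the EOM algorithm from the paper.
    """
    n = len(rows)
    counts = {f: Counter(row[f] for row in rows) for f in features}
    return [
        sum(-math.log2(counts[f][row[f]] / n) for f in features)
        for row in rows
    ]

def cascade_detect(rows, stages, threshold):
    """Run the stages in order and return the indices flagged as outliers.

    Each stage scores only the rows that survived earlier stages, using
    that stage's feature group; rows scoring above `threshold` are flagged.
    """
    remaining = list(range(len(rows)))
    flagged = []
    for features in stages:
        if not remaining:
            break
        subset = [rows[i] for i in remaining]
        scores = surprisal_scores(subset, features)
        keep = []
        for idx, score in zip(remaining, scores):
            (flagged if score > threshold else keep).append(idx)
        remaining = keep
    return flagged

# Hypothetical, hand-made data: six ordinary pages and two unusual ones.
pages = [{"content": "normal", "link": "normal", "quality": "ok"} for _ in range(6)]
pages.append({"content": "stuffed", "link": "normal", "quality": "ok"})
pages.append({"content": "normal", "link": "farm", "quality": "ok"})

# Three stages with different (hypothetical) feature groups.
stages = [["content"], ["link"], ["quality"]]

print(cascade_detect(pages, stages, threshold=2.0))  # [6, 7]
```

In this toy run the first stage flags the page with the rare content value, the second stage flags the page with the rare link value, and the third stage finds nothing unusual among the remaining pages. A real detector would need a principled discretization and threshold, neither of which the abstract specifies.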




Cite

Wei, S., & Zhu, Y. (2017). Cleaning out web spam by entropy-based cascade outlier detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10439 LNCS, pp. 232–246). Springer Verlag. https://doi.org/10.1007/978-3-319-64471-4_19

