Mining the semantic web: Statistical learning for next generation knowledge bases


Abstract

In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available in machine-interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological representations to enable more advanced applications. However, purely logical methods have not yet proven to be very effective, for several reasons. First, scaling reasoning to Web scale remains an unsolved problem. Second, logical reasoning has problems with uncertain information, which is abundant in Semantic Web data due to its distributed and heterogeneous nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective, the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine learning techniques. If one accepts the idea that part of the knowledge already lies in the data, inductive methods appear promising, in particular since they can inherently handle noisy, inconsistent, uncertain and missing data. While there has been broad coverage of inducing concept structures from less structured sources (text, Web pages), as in ontology learning, given the problems mentioned above we focus on new methods for dealing with Semantic Web knowledge bases, relying on statistical inference on their standard representations.
We argue that machine learning research offers a wide variety of methods applicable to different expressivity levels of Semantic Web knowledge bases. Ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction models, relational graphical models and first-order probabilistic learning approaches, and discuss their applicability to Semantic Web representations. Finally, we present selected experiments which were conducted on Semantic Web mining tasks for some of the algorithms presented before. This is intended to show the breadth and general potential of this exciting new research and application area for data mining. © The Author(s) 2012.
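
As a rough illustration of the simplest family named in the abstract, similarity-based methods over RDF, the sketch below compares two entities by the Jaccard overlap of their outgoing (predicate, object) pairs. It is not taken from the paper: the toy triples, the `ex:` identifiers, and the function names `entity_features` and `jaccard` are all hypothetical, and Jaccard overlap is only one of many possible similarity measures.

```python
from collections import defaultdict

# Hypothetical toy RDF graph as (subject, predicate, object) triples.
# These identifiers are invented for illustration only.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:worksAt", "ex:globex"),
    ("ex:acme", "rdf:type", "ex:Company"),
]


def entity_features(triples):
    """Map each subject to the set of its outgoing (predicate, object) pairs."""
    features = defaultdict(set)
    for subject, predicate, obj in triples:
        features[subject].add((predicate, obj))
    return features


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


features = entity_features(TRIPLES)

# Two persons sharing a type and an acquaintance, but not an employer.
sim_ab = jaccard(features["ex:alice"], features["ex:bob"])

# A person and a company share no (predicate, object) pairs.
sim_a_acme = jaccard(features["ex:alice"], features["ex:acme"])

print(sim_ab, sim_a_acme)
```

A measure like this only uses explicitly asserted triples, so missing statements lower the score; the statistical approaches surveyed in the paper are motivated in part by exactly that kind of incomplete and noisy data.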



Rettinger, A., Lösch, U., Tresp, V., D’Amato, C., & Fanizzi, N. (2012). Mining the semantic web: Statistical learning for next generation knowledge bases. Data Mining and Knowledge Discovery, 24(3), 613–662. https://doi.org/10.1007/s10618-012-0253-2
