Mining the semantic web: Statistical learning for next generation knowledge bases

Achim Rettinger; Uta Lösch; Volker Tresp; Claudia D'Amato; Nicola Fanizzi

Journal Article

Data Mining and Knowledge Discovery (2012) 24(3) 613-662

DOI: 10.1007/s10618-012-0253-2

82 Citations

100 Readers

Abstract

In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available in machine-interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological representations to enable more advanced applications. However, purely logical methods have not yet proven to be very effective for several reasons: First, there still is the unsolved problem of scalability of reasoning to Web scale. Second, logical reasoning has problems with uncertain information, which is abundant in Semantic Web data due to its distributed and heterogeneous nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective, the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine learning techniques. If one moves towards the idea that part of the knowledge already lies in the data, inductive methods appear promising, in particular since inductive methods can inherently handle noisy, inconsistent, uncertain and missing data. While there has been broad coverage of inducing concept structures from less structured sources (text, Web pages), as in ontology learning, given the problems mentioned above, we focus on new methods for dealing with Semantic Web knowledge bases, relying on statistical inference on their standard representations.
We argue that machine learning research has to offer a wide variety of methods applicable to different expressivity levels of Semantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction models, relational graphical models and first-order probabilistic learning approaches and discuss their applicability to Semantic Web representations. Finally, we present selected experiments which were conducted on Semantic Web mining tasks for some of the algorithms presented before. This is intended to show the breadth and general potential of this exciting new research and application area for data mining. © The Author(s) 2012.

Citation (APA)

Rettinger, A., Lösch, U., Tresp, V., D’Amato, C., & Fanizzi, N. (2012). Mining the semantic web: Statistical learning for next generation knowledge bases. Data Mining and Knowledge Discovery, 24(3), 613–662. https://doi.org/10.1007/s10618-012-0253-2

Readers' Seniority

PhD / Post grad / Masters / Doc: 52 (68%)
Professor / Associate Prof.: 12 (16%)
Researcher: 8 (11%)
Lecturer / Post doc: 4

Readers' Discipline

Computer Science: 64 (84%)
Engineering: 6
Social Sciences: 3
Arts and Humanities: 3
