Extracting relations from XML documents

Abstract

XML is becoming a prevalent format for data exchange. Many XML documents have complex schemas that are not always known, and can vary widely between information sources and applications. In contrast, database applications rely mainly on the flat relational model. We propose a novel, partially supervised approach for extracting user-defined relations from XML documents with unknown schema. The extracted relations can be directly used by an RDBMS, or utilized for information integration or data mining tasks. Our method attempts to automatically capture the lexical and structural features that indicate the relevant portions of the input document, based on a few user-annotated examples. This information can then be used to extract the relation of interest from documents with schemas potentially different from the training examples. We present preliminary experiments showing that our method could be capable of extracting the target relation from XML documents even in the presence of significant variations in the document schemas. © Springer-Verlag Berlin Heidelberg 2003.
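
The abstract does not spell out the features or the learning procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a deliberately crude stand-in for the "lexical and structural features": the tag name of the leaf that holds each annotated value, plus the innermost element that contains all of those tags. The function names `learn_tags` and `extract`, and the sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET


def learn_tags(xml_text, example):
    """Map each relation attribute to the tag of the leaf holding its annotated value."""
    root = ET.fromstring(xml_text)
    tags = {}
    for attr, value in example.items():
        for el in root.iter():
            # Only leaf elements whose text equals the annotated value count as a match.
            if len(el) == 0 and (el.text or "").strip() == value:
                tags[attr] = el.tag
                break
    return tags


def extract(xml_text, tags):
    """Emit one row per innermost element that contains every learned tag."""
    if not tags:
        return []
    root = ET.fromstring(xml_text)
    rows = []

    def covers(el):
        # Return a row if el has a descendant for every attribute, else None.
        row = {}
        for attr, tag in tags.items():
            hit = el.find(f".//{tag}")
            if hit is None:
                return None
            row[attr] = (hit.text or "").strip()
        return row

    def visit(el):
        # Visit all children first so the deepest covering elements win.
        child_hit = False
        for child in el:
            if visit(child):
                child_hit = True
        if child_hit:
            return True
        row = covers(el)
        if row is not None:
            rows.append(row)
            return True
        return False

    visit(root)
    return rows


# One user-annotated training example (assumed data).
TRAIN = "<catalog><book><title>Dune</title><author>Herbert</author></book></catalog>"
ANNOTATION = {"title": "Dune", "author": "Herbert"}

# A document with a different nesting than the training example (assumed data).
TARGET = (
    "<library><shelf>"
    "<item><meta><author>Le Guin</author></meta><title>The Dispossessed</title></item>"
    "<item><title>Neuromancer</title><meta><author>Gibson</author></meta></item>"
    "</shelf></library>"
)

learned = learn_tags(TRAIN, ANNOTATION)
relation = extract(TARGET, learned)
```

Each row in `relation` is a flat attribute-to-value mapping, which is the shape an RDBMS insert would expect. Matching on exact tag names is the weakest part of this sketch: it tolerates changes in nesting but not renamed tags, which is presumably where learned lexical features would be needed.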

Agichtein, E., Ho, C. T. H., Josifovski, V., & Gerhardt, J. (2003). Extracting relations from XML documents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2814, 390–401. https://doi.org/10.1007/978-3-540-39597-3_38
