Using shape expressions (ShEx) to share RDF data models and to guide curation with rigorous validation

Katherine Thornton; Harold Solbrig; Gregory S. Stupp; Jose Emilio Labra Gayo; Daniel Mietchen; Eric Prud’hommeaux; Andra Waagmeester

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11503 LNCS 606-620

DOI: 10.1007/978-3-030-21348-0_39

Abstract

We discuss Shape Expressions (ShEx), a concise, formal modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case for all such subjects within a given graph or subgraph. There are currently five actively maintained ShEx implementations. We discuss how we use the JavaScript, Scala and Python implementations in RDF data validation workflows in distinct, applied contexts. We present examples of how ShEx can be used to model and validate data from two different sources: the domain-specific Fast Healthcare Interoperability Resources (FHIR) and the domain-generic Wikidata knowledge base, which is the linked database built and maintained by the Wikimedia Foundation as a sister project to Wikipedia. We also present example projects that use Wikidata as a data curation platform, along with the ways in which they use ShEx for modeling and validation.
When reusing RDF graphs created by others, it is important to know how the data is represented. Current practices of using human-readable descriptions or ontologies to communicate data structures often lack sufficient precision for data consumers to quickly and easily understand data representation details. We provide concrete examples of how we use ShEx as a constraint and validation language that allows humans and machines to communicate unambiguously about data assets. We use ShEx to exchange and understand data models of different origins, and to express a shared model of a resource’s footprint in a Linked Data source. We also use ShEx to develop data models iteratively: we test them against sample data, then revise or refine them. The expressivity of ShEx allows us to catch disagreements, inconsistencies, and errors efficiently, both at the time of input and through batch inspections.
ShEx addresses the need of the Semantic Web community to ensure data quality for RDF graphs. It is currently being used in the development of FHIR/RDF. The language is sufficiently expressive to capture constraints in FHIR, and the intuitive syntax helps people to quickly grasp the range of conformant documents. The publication workflow for FHIR tests all FHIR examples against the ShEx schemas, catching non-conformant data before they reach the public. ShEx is also currently used in Wikidata projects such as Gene Wiki and WikiCite to develop quality-control pipelines that maintain data integrity and incorporate or harmonize differences in data across different parts of the pipelines.
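
The “Paper” and “Abstract” example from the abstract can be illustrated with a small sketch. This is not ShEx and does not use any ShEx implementation; it is a plain-Python toy that mimics the idea of checking every subject of a given kind against a required structure. All identifiers (`ex:paper1`, `ex:hasSection`, `PAPER_SHAPE`, the helper functions) are hypothetical, and selecting focus nodes by `rdf:type` is an assumption made for this sketch only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical toy graph; the identifiers are illustrative, not from the paper.
GRAPH = [
    Triple("ex:paper1", "rdf:type", "ex:Paper"),
    Triple("ex:paper1", "ex:hasSection", "Abstract"),
    Triple("ex:paper1", "ex:hasSection", "Introduction"),
    Triple("ex:paper2", "rdf:type", "ex:Paper"),
    Triple("ex:paper2", "ex:hasSection", "Introduction"),
]

# Toy stand-in for a shape: each (predicate, value) pair must occur
# at least once on the node being checked.
PAPER_SHAPE = [("ex:hasSection", "Abstract")]


def missing_requirements(graph, node, shape):
    """Return the required (predicate, value) pairs that `node` lacks."""
    present = {(t.predicate, t.obj) for t in graph if t.subject == node}
    return [req for req in shape if req not in present]


def validate_all(graph, type_iri, shape):
    """Batch inspection: check every subject typed as `type_iri`."""
    nodes = sorted(
        t.subject
        for t in graph
        if t.predicate == "rdf:type" and t.obj == type_iri
    )
    return {node: missing_requirements(graph, node, shape) for node in nodes}


if __name__ == "__main__":
    for node, missing in validate_all(GRAPH, "ex:Paper", PAPER_SHAPE).items():
        status = "conforms" if not missing else f"missing {missing}"
        print(node, status)
```

Calling `missing_requirements` on a single node corresponds loosely to checking data at the time of input, while `validate_all` corresponds to a batch inspection. A real ShEx schema expresses far richer constraints (cardinalities, datatypes, nested shapes) than this sketch does.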


Cite

Thornton, K., Solbrig, H., Stupp, G. S., Labra Gayo, J. E., Mietchen, D., Prud’hommeaux, E., & Waagmeester, A. (2019). Using shape expressions (ShEx) to share RDF data models and to guide curation with rigorous validation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11503 LNCS, pp. 606–620). Springer Verlag. https://doi.org/10.1007/978-3-030-21348-0_39
