An RDF Dataset Description Model
for Expressing Vocabulary Usage Patterns

Nandana Mihindukulasooriya, María Poveda-Villalón, Raúl García-Castro, Asunción Gómez-Pérez
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgments:
4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos

Current State

Motivation

Use Case I - Dataset discovery

  • As an RDF data consumer
  • I want to discover datasets that satisfy the requirements of a given task
    • Input: keywords, SPARQL query, Shapes (SHACL, ShEx)
    • Output: relevant datasets
  • so that I can use data from the LOD cloud without spending days of manual inspection.

Use Case I - Example

  • I want to find training data for an ML classifier that categorizes restaurants according to their cuisine, based on each restaurant's description.
    PREFIX schema: <http://schema.org/>

    SELECT ?restaurant ?description ?cuisine WHERE {
      ?restaurant a schema:Restaurant ;
                  schema:description ?description ;
                  schema:servesCuisine ?cuisine .
    }
          

SPARQL Query

    @prefix schema: <http://schema.org/> .
    @prefix ex: <http://example.org/vocab#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .

    ex:RestaurantShape a sh:NodeShape ;
        sh:targetClass schema:Restaurant ;
        sh:property [
            sh:path schema:description ;
            sh:minCount 1 ;
        ] ;
        sh:property [
            sh:path schema:servesCuisine ;
            sh:minCount 1 ;
        ] .
          

Shape (SHACL)
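
A minimal sketch of how the restaurant requirement could be matched against per-dataset vocabulary usage summaries. It is not the Loupe implementation: the dataset names, the `USAGE` structure, and the `find_candidate_datasets` helper are hypothetical, and a summary of this kind only says that a property was observed on some instance of a class. It therefore filters candidates and does not validate a minimum cardinality for every instance.

```python
from typing import Dict, List, Set

# Hypothetical usage summaries: for each dataset, the properties observed
# on instances of each class. Names and contents are made up for illustration.
USAGE: Dict[str, Dict[str, Set[str]]] = {
    "dataset-a": {
        "schema:Restaurant": {
            "schema:name", "schema:description", "schema:servesCuisine",
        },
    },
    "dataset-b": {
        "schema:Restaurant": {"schema:name", "schema:address"},
    },
    "dataset-c": {
        "schema:Hotel": {"schema:description"},
    },
}

# Requirement mirroring the restaurant example: a target class plus the
# properties that must be present on its instances.
REQUIRED_CLASS = "schema:Restaurant"
REQUIRED_PROPERTIES = {"schema:description", "schema:servesCuisine"}


def find_candidate_datasets(
    usage: Dict[str, Dict[str, Set[str]]],
    target_class: str,
    required_properties: Set[str],
) -> List[str]:
    """Return datasets that use the target class with every required property."""
    matches = []
    for name, classes in usage.items():
        # The class must occur, and all required properties must be observed on it.
        if target_class in classes and required_properties <= classes[target_class]:
            matches.append(name)
    return sorted(matches)


candidates = find_candidate_datasets(USAGE, REQUIRED_CLASS, REQUIRED_PROPERTIES)
print(candidates)
```

Datasets returned by such a filter would still need to be checked with the actual SPARQL query or shape before use.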

Motivation

Use Case II - Vocabulary Usage Reports

  • As an ontology engineer / vocabulary developer
  • I want to know how my vocabulary is used in datasets in practice
  • so that I can understand mismatches or problems and improve it.

Use Case II - Example

  • Given an ontology (e.g., SSN Ontology)
  • Where is it used?
    • How many datasets / instances use each class?
    • How many datasets / triples use each property?
  • How is it used?
    • What are the other classes used with instances of each class?
    • What are the common subject types of each property?
    • What are the common object types of each property?
    • Do people use my vocabulary terms in a more generic/specific way?
    • Do people replace part of the ontology with another one?
    • Do people frequently violate some restrictions?
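
The "where" and "how" questions above can be sketched as simple counts over triples. This is an illustration under stated assumptions, not the method used by the authors: triples are plain `(subject, predicate, object)` tuples, the dataset names and `ex:` identifiers are invented, the `ssn:` terms are used only as sample identifiers, and `usage_report` is a hypothetical helper.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical triples per dataset, as (subject, predicate, object) tuples.
DATASETS = {
    "weather-stations": [
        ("ex:s1", RDF_TYPE, "ssn:Sensor"),
        ("ex:s2", RDF_TYPE, "ssn:Sensor"),
        ("ex:o1", RDF_TYPE, "ssn:Observation"),
        ("ex:o1", "ssn:observedBy", "ex:s1"),
    ],
    "smart-home": [
        ("ex:t1", RDF_TYPE, "ssn:Sensor"),
        ("ex:r1", RDF_TYPE, "ex:Reading"),
        ("ex:r1", "ssn:observedBy", "ex:t1"),
    ],
}


def usage_report(datasets):
    """Count class and property usage, plus subject/object types per property."""
    instances_per_class = Counter()
    datasets_per_class = defaultdict(set)
    triples_per_property = Counter()
    subject_types = defaultdict(Counter)
    object_types = defaultdict(Counter)

    for name, triples in datasets.items():
        # First pass: collect the declared types of each resource in this dataset.
        types = defaultdict(set)
        for s, p, o in triples:
            if p == RDF_TYPE:
                types[s].add(o)
        for classes in types.values():
            for c in classes:
                instances_per_class[c] += 1
                datasets_per_class[c].add(name)

        # Second pass: count property usage and the types at both ends.
        for s, p, o in triples:
            if p == RDF_TYPE:
                continue
            triples_per_property[p] += 1
            for c in types.get(s, ()):
                subject_types[p][c] += 1
            for c in types.get(o, ()):
                object_types[p][c] += 1

    return {
        "instances_per_class": instances_per_class,
        "datasets_per_class": datasets_per_class,
        "triples_per_property": triples_per_property,
        "subject_types": subject_types,
        "object_types": object_types,
    }


report = usage_report(DATASETS)
print(report["instances_per_class"]["ssn:Sensor"])
print(dict(report["subject_types"]["ssn:observedBy"]))
```

In this sample, a subject type outside the ontology (`ex:Reading`) appearing alongside `ssn:Observation` is the kind of signal that could point to a term being used more generically than intended.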

Motivation - Contd.

  • As an RDF dataset consumer, I want to understand the content and the structure of a dataset so that I can start using it or perform queries without spending hours doing exploratory queries.
  • As an RDF dataset consumer / producer, I want to compare datasets (different datasets or multiple versions of the same dataset) so that I can understand the differences and changes.

Can we extend dataset descriptions by expressing vocabulary usage patterns?

Vocabulary Usage Patterns

  • An implicit schema (shapes) for a given dataset
  • Class analysis
    • # of instances, associated properties with their cardinalities, equivalent/super classes
  • Property analysis
    • # of triples, estimated domains / ranges, value ranges/patterns for data type properties
  • Abstract triple pattern analysis
  • Language analysis for strings
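
A rough sketch of three of these analyses on a toy dataset: abstract triple patterns (subject type, property, object type), observed per-class property cardinalities, and language tags of string literals. Everything here is an assumption made for illustration, including the `Literal` tuple, the sample triples, the `Untyped` label, and the `profile` function; it does not reproduce the Loupe model or its tooling.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal with an optional language tag."""
    value: str
    lang: Optional[str] = None


RDF_TYPE = "rdf:type"
UNTYPED = "Untyped"

# Made-up sample data.
TRIPLES = [
    ("ex:r1", RDF_TYPE, "schema:Restaurant"),
    ("ex:r1", "schema:description", Literal("Family-run tapas bar", "en")),
    ("ex:r1", "schema:description", Literal("Bar de tapas familiar", "es")),
    ("ex:r1", "schema:address", "ex:a1"),
    ("ex:a1", RDF_TYPE, "schema:PostalAddress"),
    ("ex:r2", RDF_TYPE, "schema:Restaurant"),
    ("ex:r2", "schema:description", Literal("Sushi counter", "en")),
]


def profile(triples):
    """Return (abstract triple patterns, language counts, cardinality ranges)."""
    types = defaultdict(set)      # resource -> declared classes
    instances = defaultdict(set)  # class -> resources
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
            instances[o].add(s)

    patterns = Counter()            # (subject class, property, object kind)
    languages = Counter()           # language tag -> number of literals
    values_per_subject = Counter()  # (class, property, subject) -> values
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if isinstance(o, Literal):
            object_kinds = ["Literal"]
            languages[o.lang or "none"] += 1
        else:
            object_kinds = sorted(types.get(o, ())) or [UNTYPED]
        subject_classes = sorted(types.get(s, ())) or [UNTYPED]
        for c in subject_classes:
            instances[c].add(s)  # also registers untyped subjects
            values_per_subject[(c, p, s)] += 1
            for kind in object_kinds:
                patterns[(c, p, kind)] += 1

    # Observed (min, max) number of values per instance, counting instances
    # of the class that have no value for the property as 0.
    cardinalities = {}
    for c, p in {(c, p) for (c, p, _s) in values_per_subject}:
        counts = [values_per_subject.get((c, p, s), 0) for s in instances[c]]
        cardinalities[(c, p)] = (min(counts), max(counts))
    return patterns, languages, cardinalities


patterns, languages, cardinalities = profile(TRIPLES)
print(patterns.most_common())
print(dict(languages))
print(cardinalities)
```

The cardinality ranges are only what was observed in the data, so they describe an implicit schema and should not be read as declared constraints.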

Loupe Model

An excerpt from the Loupe model

Implementation - http://loupe.linkeddata.es/

Loupe tool chain

Thank you!

Mail: nmihindu@fi.upm.es

Twitter: @nandanamihindu