Data-Driven RDF Property Semantic-Equivalence Detection using NLP Techniques

Mariano Rico, Nandana Mihindukulasooriya, and Asunción Gómez Pérez
Ontology Engineering Group, Universidad Politécnica de Madrid


Acknowledgments:
4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos

Motivation - DBpedia Use Case

English DBpedia 2016-04

1445 DBpedia ontology properties are used!

and when there is no mapping ...

63891 auto-generated properties (Only English)

and many more in each language-specific DBpedia dataset

dbo:birthPlace relation

Similar strings - capitalization, typos, prepositions, ...

dbo:birthPlace relation

Synonyms, related words

dbo:birthPlace relation

Other languages

What's their impact?

Let's make a query!

Dear DBpedia, please tell me the landmarks and buildings where Guglielmo Marconi was born.

	    
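PREFIX dbo:    <http://dbpedia.org/ontology/>
PREFIX dbr:    <http://dbpedia.org/resource/>
PREFIX schema: <http://schema.org/>

select ?attraction where {
  # Marconi's birthplace, as recorded with the ontology property
  dbr:Guglielmo_Marconi dbo:birthPlace ?place .

  # Landmarks and historical buildings located in that place
  ?attraction a schema:LandmarksOrHistoricalBuildings ;
              dbo:location ?place .
}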
            
          

Hypothesis

If we can detect RDF property semantic-equivalences, SPARQL queries can be enhanced to get better results!

How to detect property semantic-equivalences?

  • Structural characteristics

    • Domain and range.
  • Linguistic characteristics

    • String similarity:
      • String distance metrics (e.g., Jaro-Winkler distance, Damerau-Levenshtein distance)
      • Token-based techniques (e.g., Jaccard similarity, cosine similarity)
    • Semantic similarity:
      • Synonyms (e.g., synsets in WordNet)
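
The two string-similarity families listed above can be sketched in a few lines of Python. This is only an illustration, not the implementation used in this work: it swaps in plain Levenshtein distance for the Jaro-Winkler and Damerau-Levenshtein metrics named above, and the function names (`levenshtein`, `edit_similarity`, `tokens`, `jaccard`) are hypothetical.

```python
import re
from typing import Set


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Case-insensitive edit distance normalised to the range [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tokens(name: str) -> Set[str]:
    """Split a camelCase local name into a set of lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {p.lower() for p in parts}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the token sets of two property names."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0
```

The two families complement each other: `edit_similarity("birthPlace", "birthplace")` is 1.0 because only capitalization differs, while `jaccard` scores the same pair 0 because `birthplace` is a single token. Conversely, `jaccard("birthPlace", "placeOfBirth")` is 2/3 even though the character sequences differ substantially.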

Our approach

Enhanced Query

	    
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

select ?s ?bp where {
  ?s ?p ?bp .
  VALUES ?p {
    # Alternative dbp properties
    dbp:birthPlace
    dbp:birthPlcace dbp:birthplace  dbp:birthLocation
    dbp:birhPlace   dbp:bithPlace   dbp:cityofbirth
    dbp:birtPlace   dbp:biRthPlace  dbp:cityOfBirth
  }
}
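
A query of this shape could be generated from a list of detected equivalent properties. The Python sketch below is one possible way to do that, not the tooling used in this work; `build_enhanced_query` and its validation pattern are hypothetical, and all properties are assumed to live in the `dbp:` namespace.

```python
import re

DBP = "http://dbpedia.org/property/"

# Conservative check (an assumption of this sketch): accept only simple
# local names, so nothing unexpected is spliced into the query text.
_LOCAL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_enhanced_query(local_names):
    """Return a SPARQL query whose VALUES clause lists every candidate property."""
    unique = list(dict.fromkeys(local_names))  # drop duplicates, keep order
    for name in unique:
        if not _LOCAL_NAME.match(name):
            raise ValueError(f"unsupported property local name: {name!r}")
    values = "\n".join(f"    dbp:{name}" for name in unique)
    return (
        f"PREFIX dbp: <{DBP}>\n"
        "\n"
        "select ?s ?bp where {\n"
        "  ?s ?p ?bp .\n"
        "  VALUES ?p {\n"
        f"{values}\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(build_enhanced_query(["birthPlace", "birthplace", "cityOfBirth"]))
```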
            
          

Evaluation

  • Datasets: English, Spanish, German DBpedia datasets
  • Queries: Query logs from Linked SPARQL Queries Dataset
  • Analysis: Number of results before and after the enhancement
  • In most cases the number of results increased, by up to 300%
  • The impact on the quality of the results remains to be analyzed
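
The before/after comparison can be expressed as a relative increase in result counts. The sketch below assumes that the percentage is measured relative to the original count; the slides do not state the exact formula, and the function name and the counts in the usage note are hypothetical.

```python
def relative_increase(before: int, after: int) -> float:
    """Percentage increase in result count, relative to the original query."""
    if before <= 0:
        raise ValueError("the original query must return at least one result")
    return (after - before) / before * 100.0
```

Under this reading, a query that returned 10 results before the enhancement and 40 after it would show an increase of 300%: `relative_increase(10, 40)` evaluates to `300.0`.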

Conclusions

  • A lot of redundant properties in DBpedia.
    • Neither Wikipedia (Infobox keys) nor the extraction process (missing mappings) is perfect.
  • Low usage of such properties leads to incomplete answers to SPARQL queries.
  • Results can be improved by enhancing SPARQL queries with semantically-equivalent properties.

Thank you!

Mail: nmihindu@fi.upm.es

Twitter: @nandanamihindu