Knowledge Graph: 9. Data Quality and Linking

9. Data Quality and Linking

9.1 How good is Linked Open Data in practice?

Linked Open Data Best Practices
Provide dereferenceable URIs
Set RDF links pointing at other data sources

Use terms from widely deployed vocabularies

Linked Open Vocabularies (LOV) project
– analyzes the usage of vocabularies

Make proprietary vocabulary terms dereferenceable

Map proprietary vocabulary terms to other vocabularies

Provide provenance metadata

Provide licensing metadata

Provide data-set-level metadata

Refer to additional access methods

More Indicators

9.2 Quality

Linked Data Conformance vs. Quality
Conformance: following standards and best practices; the technical dimension, which can be evaluated automatically

Quality: how complete, correct, etc. the data is; the content dimension, which is hard to evaluate automatically
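
To illustrate why conformance lends itself to automatic evaluation, here is a minimal sketch of one such check: counting the RDF links a data set sets to other data sources. It assumes triples are given as plain (subject, predicate, object) string tuples; the function name count_outgoing_links and the example.org URIs are hypothetical and used for illustration only. A real check would probably also exclude vocabulary terms such as rdf:type objects.

```python
def count_outgoing_links(triples, own_namespace):
    """Count triples whose subject is in our namespace and whose object
    is a URI outside of it (i.e. a link to another data source)."""
    count = 0
    for subject, _predicate, obj in triples:
        # Assumption: anything starting with http(s):// is a URI, the rest are literals.
        is_uri = obj.startswith(("http://", "https://"))
        if subject.startswith(own_namespace) and is_uri and not obj.startswith(own_namespace):
            count += 1
    return count


# Hypothetical example data set with one external link, one literal, one internal link.
example_triples = [
    ("http://example.org/data/Mannheim",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Mannheim"),
    ("http://example.org/data/Mannheim",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Mannheim"),
    ("http://example.org/data/Mannheim",
     "http://example.org/vocab/locatedIn",
     "http://example.org/data/Germany"),
]

print(count_outgoing_links(example_triples, "http://example.org/"))
```

A check like this only says whether links exist, not whether they are correct; the latter belongs to the content dimension.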

Quality of Knowledge Graphs

Issues with Automatic Evaluation

Example: Crowd Evaluation of DBpedia

The quality of Linked Open Data is far from perfect, in terms of both conformance and content
Improving the quality is an active field of research
– a 2017 survey: more than 40 approaches
– since then: a lot of work on KG embeddings

9.3 Links

Previously on Knowledge Graphs

Integrate data from different sources
Make connections between entities in those sources
Facilitate cross data source queries
Overcome data silos

Why do we need Links?

How do we Create the Links?

There is too much data, and many publishers interlink their own datasets with other datasets

9.3.1 Tool Support

A plethora of names
Mostly used at the schema level:

Ontology matching/alignment/mapping
Schema matching/mapping

Mostly used at the instance level:

Instance matching/alignment
Interlinking
Link discovery

9.3.2 Automating Interlinking

Summary and Takeaways

Basic Interlinking Techniques

Sources for Interlinking Signals

Simple String Based Metrics

String equality
e.g. foo:University_of_Mannheim, bar:University_of_Mannheim
Common prefixes
e.g. foo:United_States, bar:United_States_of_America
Common postfixes
e.g. foo:Barack_Obama, bar:Obama
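Typical usage of prefixes/postfixes: |common| / max(length), i.e. the length of the shared prefix or postfix divided by the length of the longer name (underscores count as characters)
foo:United_States, bar:United_States_of_America → 13/24
foo:Barack_Obama, bar:Obama → 5/12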
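
The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are made up for this example, the inputs are assumed to be prefixed names of the form prefix:LocalName, and underscores are treated as ordinary characters (an assumption that matches the 5/12 example).

```python
def local_name(name: str) -> str:
    """Strip the namespace prefix: 'foo:Barack_Obama' -> 'Barack_Obama'."""
    return name.split(":", 1)[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    count = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        count += 1
    return count


def string_equal(a: str, b: str) -> bool:
    """String equality on the local names."""
    return local_name(a) == local_name(b)


def prefix_similarity(a: str, b: str) -> float:
    """|common prefix| / max(length), computed on the local names."""
    a, b = local_name(a), local_name(b)
    longest = max(len(a), len(b))
    return common_prefix_len(a, b) / longest if longest else 1.0


def postfix_similarity(a: str, b: str) -> float:
    """|common postfix| / max(length); a postfix is a prefix of the reversed strings."""
    a, b = local_name(a), local_name(b)
    longest = max(len(a), len(b))
    return common_prefix_len(a[::-1], b[::-1]) / longest if longest else 1.0


print(string_equal("foo:University_of_Mannheim", "bar:University_of_Mannheim"))  # True
print(postfix_similarity("foo:Barack_Obama", "bar:Obama"))  # 5/12, about 0.417
```

Dividing by the longer length keeps the score in [0, 1] and penalizes pairs where one name is only a small fragment of the other.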

Edit Distance