A Machine Learning Approach to Type Inference in Semantic Knowledge Bases

This page provides downloads of generated datasets containing typed DBpedia entities. These datasets were generated by our hierarchical Support Vector Machines classifier (hSVM) or by the Statistical Type Inference (STI) algorithm, which exploits co-occurrence with types already available in the knowledge base:


Each download is a gzipped N-Triples file. The numbers give the instance count in thousands.
Dataset Dutch English German
Fusion dataset for DBpedia 3.9

Fusion of the Statistical Type Inference (STI) algorithm and the hierarchical Support Vector Machines (hSVM) classifier.

nt
1,733k
hSVM dataset for DBpedia 2015

Built with only the hSVM algorithm.

Dutch: nt, 1,075k
English: nt, 3,474k
German: nt, 1,042k
Fusion dataset (selective) for DBpedia 2015

Built using the Statistical Type Inference algorithm; the hSVM algorithm was applied where STI was not confident.

Dutch: nt, 1,075k
English: nt, 3,474k
German: nt, 1,042k
Inference dataset for DBpedia 2015

This dataset is a merge of LHD Core and LHD Fusion (selective).

Dutch: nt, 1,078k
English: nt, 3,478k
German: nt, 1,043k
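
The gzipped N-Triples files listed above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the original tooling: it assumes each type assignment is stored as an rdf:type triple whose subject and object are both IRIs, and the function names `iter_type_triples` and `count_typed_instances` are made up for illustration. It counts distinct typed entities, which may or may not match how the instance counts on this page were computed.

```python
import gzip
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches a triple whose subject, predicate and object are all IRIs.
# Assumption: type assignments in the datasets have this shape.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def iter_type_triples(lines):
    """Yield (entity, type) pairs from an iterable of N-Triples lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            yield match.group(1), match.group(3)


def count_typed_instances(path):
    """Count distinct entities with at least one type in a gzipped file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return len({entity for entity, _ in iter_type_triples(handle)})
```

Calling `count_typed_instances("dataset.nt.gz")`, where the file name is a placeholder for one of the downloads, streams the file line by line, so the archive does not need to be unpacked first. For anything beyond a quick check, a full RDF parser is the safer choice, since the regular expression here deliberately ignores literals and blank nodes.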