Linked Hypernyms Dataset

v2014

The Linked Hypernyms Dataset associates entity articles in the English, German and Dutch Wikipedia with a DBpedia resource or a DBpedia ontology concept as their type. The types are hypernyms mined from the articles' free text using hand-crafted lexicosyntactic patterns.
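
To make the idea of a lexicosyntactic pattern concrete, here is a toy sketch in Python. It is not one of the patterns used to build this dataset: the regular expression, the stop-word list and the function name `extract_hypernym` are all assumptions made for illustration. It takes the last word of the noun phrase that follows "is a" or "was a" as the hypernym.

```python
import re

# Toy pattern (an assumption, not the dataset's actual rules):
# "<entity> is/was a/an/the <noun phrase>", where the noun phrase ends at
# punctuation or at a word that typically starts a modifier.
PATTERN = re.compile(
    r"\b(?:is|was)\s+(?:a|an|the)\s+([^,.;]*?)"
    r"(?=\s+(?:who|that|which|of|in|from|and|for)\b|[,.;])"
)


def extract_hypernym(sentence):
    """Return the head word of the first 'is a ...' phrase, or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    words = match.group(1).split()
    # Treat the last word of the noun phrase as its head (a crude heuristic).
    return words[-1] if words else None
```

On the sentence "Václav Havel was a Czech playwright, essayist and politician." this sketch yields `playwright`. Real article text needs far more careful patterns, which is why the dataset's patterns are hand-crafted per language.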

The dataset contains 4.8 million entity-type assignments.

The dataset was generated with DBpedia 2014 and Wikipedia snapshots in January 2015.


The latest version of the Linked Hypernyms Dataset - late January 2015!

All partitions of the dataset, as described in the dataset description section, can be downloaded from here.


The downloads are provided as zipped N-Triples files, except for the Raw "Plain Text" Dataset. The numbers give the instance counts in thousands.
Download highlights

Dataset                     Dutch         English       German
Core Dataset                nt (433k)     nt (1,595k)   nt (224k)
    Most accurate: the result of pattern matching.
Inference Dataset*          nt (560k)     nt (1,836k)   nt (607k)
    Lower accuracy: the result of machine learning.
Extension Dataset           nt (1,088k)   nt (3,478k)   nt (982k)
    Types are in the DBpedia resource namespace, giving the highest type specificity.
Raw "Plain Text" Dataset    nt (1,601k)   nt (3,680k)   nt (1,106k)
    All hypernyms are string literals (the original extracted word).
* The inference dataset for 2014 is based on the STI algorithm only. The more accurate STI-hSVM result is available for English DBpedia 3.9 here.
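
Since the downloads are zipped N-Triples files, a minimal sketch for reading one and counting how often each type is assigned might look like the following. It uses only the Python standard library. The helper names are hypothetical, the regular expression handles only simple one-line statements rather than the full N-Triples grammar, and the default `rdf:type` predicate is an assumption about the files that should be checked against the actual data.

```python
import io
import re
import zipfile
from collections import Counter

# Assumption: entity-type assignments use the rdf:type predicate.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified statement pattern: <subject> <predicate> object .
# The object is kept as written (an <IRI> or a "literal" with optional tag).
TRIPLE_RE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.$")


def parse_triple(line):
    """Parse one line into (subject, predicate, object), or None if it is
    blank, a comment, or not matched by the simplified pattern."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    subject, predicate, obj = match.groups()
    if obj.startswith("<") and obj.endswith(">"):
        obj = obj[1:-1]  # strip the angle brackets from IRI objects
    return subject, predicate, obj


def iter_zip_lines(zip_path):
    """Yield decoded text lines from every member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    yield line


def count_types(lines, predicate=RDF_TYPE):
    """Count how many entities are assigned to each type."""
    counts = Counter()
    for line in lines:
        triple = parse_triple(line)
        if triple is not None and triple[1] == predicate:
            counts[triple[2]] += 1
    return counts
```

Calling `count_types(iter_zip_lines(path))` on a downloaded archive and summing the counter's values should give a figure comparable to the instance counts listed above, provided the predicate assumption holds.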

Publications