Linked Hypernyms Dataset

v2015

The Linked Hypernyms Dataset associates entity articles in the English, German and Dutch Wikipedias with a DBpedia resource or a DBpedia ontology concept as their type. The types are hypernyms mined from the articles' free text using hand-crafted lexico-syntactic patterns.

The datasets were generated from DBpedia 2015 and from Wikipedia snapshots taken in May 2015.


The latest version of the Linked Hypernyms Dataset - late May 2015!

All partitions of the dataset, as described in the dataset description section, can be downloaded from here.


The downloads are provided as gzipped N-Triples files. The numbers give the instance counts in thousands.
Download highlights

| Dataset | Description | Dutch | English | German |
| --- | --- | --- | --- | --- |
| Inference Dataset | Types are in the DBpedia ontology namespace - merge of Core, STI and hSVM datasets | nt (1,078k) | nt (3,478k) | nt (1,043k) |
| Extension Dataset | Types are in the DBpedia resource namespace - highest type specificity | nt (1,113k) | nt (3,700k) | nt (1,083k) |
| Raw "Plain Text" Dataset | All hypernyms are string literals (the original extracted word). | nt (1,631k) | nt (3,719k) | nt (1,220k) |

Download individual parts of the inference dataset

| Dataset | Description | Dutch | English | German |
| --- | --- | --- | --- | --- |
| Core Dataset | Most accurate - result of pattern matching | nt (508k) | nt (2,102k) | nt (376k) |
| STI Dataset | Lower accuracy - result of machine learning | nt (1,075k) | nt (3,474k) | nt (1,042k) |
| hSVM Dataset | Lower accuracy - result of machine learning (language independent) | nt (1,075k) | nt (3,474k) | nt (1,042k) |
| Fusion Dataset | Improved accuracy - fusion of STI and hSVM | nt (1,075k) | nt (3,474k) | nt (1,042k) |
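
Because the downloads are gzipped N-Triples files, they can be streamed line by line without unpacking them first. The following is a minimal sketch in Python using only the standard library. The function names and the simplified regular expression are assumptions made for illustration and are not part of the dataset tooling. The pattern handles only statements with an IRI subject and predicate, so a full RDF parser may be preferable for strict processing.

```python
import gzip
import re
from collections import Counter

# Simplified pattern for one N-Triples statement: <subject> <predicate> object .
# The object is kept as written: an IRI (<...>) or a literal ("..." plus any tag).
# Assumption: subjects and predicates are IRIs (no blank nodes).
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def iter_triples(path):
    """Yield (subject, predicate, object) tuples from a gzipped N-Triples file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            match = TRIPLE_RE.match(line)
            if match:
                yield match.groups()


def summarize(path):
    """Return (number of distinct subjects, Counter of object values)."""
    subjects = set()
    objects = Counter()
    for subject, _predicate, obj in iter_triples(path):
        subjects.add(subject)
        objects[obj] += 1
    return len(subjects), objects
```

For example, calling summarize("lhd_core_en.nt.gz") on a downloaded partition (the file name here is hypothetical) returns the number of distinct typed entities together with a frequency count of the assigned types, which can then be compared against the instance counts listed above.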

Publications