Linked Hypernyms Dataset

The Linked Hypernyms Dataset links entity articles in the English, German and Dutch Wikipedia to a DBpedia resource or a DBpedia ontology concept that serves as their type. The types are hypernyms mined from the articles' free text using hand-crafted lexico-syntactic patterns.
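To give a feel for what a lexico-syntactic pattern does, here is a toy sketch in Python. It is not the dataset's actual extraction grammar: the regular expression, the stop-word list and the function name `extract_hypernym` are all assumptions made for illustration. It looks for "X is/was a/an/the Y" in an article's first sentence and returns the last word of the matched phrase as the hypernym candidate.

```python
import re

# Hypothetical toy pattern, NOT the patterns used to build the dataset:
# "<subject> is|was a|an|the <phrase>", where the phrase ends at punctuation,
# at the end of the sentence, or before a small set of function words.
PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?:is|was)\s+(?:a|an|the)\s+"
    r"(?P<phrase>[A-Za-z\- ]+?)"
    r"(?=\s+(?:of|in|from|who|which|that)\b|[.,;]|$)"
)

def extract_hypernym(first_sentence):
    """Return a lower-cased hypernym candidate, or None if the pattern fails."""
    match = PATTERN.search(first_sentence)
    if match is None:
        return None
    # Crude head-noun heuristic: take the last word of the matched phrase.
    return match.group("phrase").split()[-1].lower()

if __name__ == "__main__":
    print(extract_hypernym("Vaclav Havel was a Czech playwright, essayist and politician."))
    print(extract_hypernym("Prague is the capital of the Czech Republic."))
```

A real pipeline would work on part-of-speech-tagged text rather than raw strings, which is why this heuristic breaks easily on longer or nested noun phrases.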

The dataset contains 4.8 million entity-type assignments.

The dataset was generated from DBpedia 2014 and Wikipedia snapshots taken in January 2015.

Latest version of the Linked Hypernyms Dataset: late January 2015.

All partitions of the dataset, as described in the dataset description section, can be downloaded from here.

The downloads are provided as zipped N-Triples files, except for the Raw "Plain Text" Dataset. The numbers give the instance count of each file in thousands (k).
Download highlights
Dataset	Description	Dutch	English	German
Core Dataset	Most accurate; result of pattern matching.	nt 433k	nt 1,595k	nt 224k
Inference Dataset*	Lower accuracy; result of machine learning.	nt 560k	nt 1,836k	nt 607k
Extension Dataset	Types are in the DBpedia resource namespace; highest type specificity.	nt 1,088k	nt 3,478k	nt 982k
Raw "Plain Text" Dataset	All hypernyms are string literals (the original extracted word).	nt 1,601k	nt 3,680k	nt 1,106k

* The inference dataset for 2014 is based on the STI algorithm only. The more accurate STI-hSVM result is available for English DBpedia 3.9 here.
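Once an N-Triples partition has been unzipped, the entity-type assignments can be tallied with a few lines of Python. The sketch below is a minimal illustration, not an official loader: it assumes each assignment is a triple of three URIs, it only handles that simple case (lines with literals or blank nodes are skipped), and the sample triples are invented for the example rather than taken from the dataset.

```python
import re
from collections import Counter

# Matches only the simplest N-Triples form: <subject> <predicate> <object> .
# This is an assumption for illustration; it is not a full N-Triples parser.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

def count_types(lines):
    """Count how many entities are assigned to each type (the triple's object)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip triples whose object is a literal or blank node
        _subject, _predicate, type_uri = match.groups()
        counts[type_uri] += 1
    return counts

# Invented sample lines, for illustration only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SAMPLE = [
    "# illustrative triples, not taken from the dataset",
    f"<http://dbpedia.org/resource/Prague> <{RDF_TYPE}> <http://dbpedia.org/ontology/City> .",
    f"<http://dbpedia.org/resource/Brno> <{RDF_TYPE}> <http://dbpedia.org/ontology/City> .",
    f"<http://dbpedia.org/resource/Vltava> <{RDF_TYPE}> <http://dbpedia.org/ontology/River> .",
    '<http://dbpedia.org/resource/Brno> <http://example.org/hypernym> "city"@en .',
    "",
]

if __name__ == "__main__":
    for type_uri, n in count_types(SAMPLE).most_common():
        print(n, type_uri)
```

To run it on a downloaded file, pass an open file object instead of `SAMPLE`, for example `count_types(open(path, encoding="utf-8"))`, where `path` is wherever the unzipped file was saved.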

Publications

Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery. Web Semantics. In press.
T. Kliegr, O. Zamazal. Towards Linked Hypernyms Dataset 2.0: Complementing DBpedia with Hypernym Discovery. In 9th International Language Resources and Evaluation Conference (LREC'14), Reykjavik, Iceland, May 2014.
O. Zamazal, T. Kliegr. Type Inference in DBpedia from Free Text. 2015. Under review.