The Linked Hypernyms Dataset associates entity articles in the English, German and Dutch Wikipedia with a DBpedia resource or a DBpedia ontology concept as their type. The types are hypernyms mined from the articles' free text using hand-crafted lexico-syntactic patterns.
The dataset contains 4.8 million entity-type assignments.
The dataset was generated using DBpedia 2014 and Wikipedia snapshots from January 2015.
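To give a feel for what a lexico-syntactic pattern for hypernym discovery can look like, here is a minimal sketch that applies a single "X is a/an/the Y" regular expression to the first sentence of an article. The pattern, the stop-word list, the function name `extract_hypernym` and the example sentences are all illustrative assumptions; the hand-crafted patterns actually used to build the dataset are not reproduced here and are certainly more elaborate.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern: "<subject> is/was a/an/the <modifiers> <head noun>".
# The noun phrase ends at a comma, a period, the end of the string, or one of
# a few stop words (prepositions, relative pronouns, conjunctions).
_PATTERN = re.compile(
    r"\b(?:is|was)\s+(?:a|an|the)\s+"
    r"(?P<phrase>(?:[A-Za-z-]+\s+)*?[A-Za-z-]+)"
    r"(?=\s+(?:of|in|from|who|that|which|and|for|by)\b|[,.]|$)"
)


def extract_hypernym(first_sentence: str) -> Optional[str]:
    """Return a candidate hypernym for the sentence, or None if no match."""
    match = _PATTERN.search(first_sentence)
    if match is None:
        return None
    # Approximate the head noun as the last word of the matched phrase.
    return match.group("phrase").split()[-1].lower()


# Example sentences are made up for illustration.
examples = [
    "Albert Einstein was a German-born theoretical physicist who developed the theory of relativity.",
    "Amsterdam is a city in the Netherlands.",
]
for sentence in examples:
    print(extract_hypernym(sentence))
```

A single regular expression like this will miss many constructions and mis-handle others; it only shows the general idea of mapping a surface pattern in free text to a hypernym string.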
The latest version of the Linked Hypernyms Dataset dates from late January 2015.
All partitions of the dataset, as described in the dataset description section, can be downloaded from here.
Download highlights

Dataset | Dutch | English | German
---|---|---|---
Core Dataset: most accurate, result of pattern matching. | nt 433k | nt 1,595k | nt 224k
Inference Dataset*: lower accuracy, result of machine learning. | nt 560k | nt 1,836k | nt 607k
Extension Dataset: types are in the DBpedia resource namespace, highest type specificity. | nt 1,088k | nt 3,478k | nt 982k
Raw "Plain Text" Dataset: all hypernyms are string literals (the original extracted word). | nt 1,601k | nt 3,680k | nt 1,106k

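
The partitions above differ in the kind of type they assign: a DBpedia ontology concept, a DBpedia resource, or a plain string literal. As a minimal sketch, assuming the `nt` files are N-Triples with one statement per line, the following code counts how many statements fall into each kind. The function names and the sample statements are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import re
from collections import Counter
from typing import Iterable

# Matches one simple N-Triples statement: <subject> <predicate> object .
# The object is either an IRI in angle brackets or a quoted literal,
# optionally followed by a language tag or datatype suffix.
_TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|".*")(\S*)\s*\.\s*$')

ONTOLOGY_NS = "http://dbpedia.org/ontology/"
RESOURCE_NS = "http://dbpedia.org/resource/"


def classify_object(obj: str) -> str:
    """Classify the object of a statement by literal or IRI namespace."""
    if obj.startswith('"'):
        return "literal"
    iri = obj[1:-1]  # strip the surrounding angle brackets
    if iri.startswith(ONTOLOGY_NS):
        return "ontology"
    if iri.startswith(RESOURCE_NS):
        return "resource"
    return "other"


def count_type_kinds(lines: Iterable[str]) -> Counter:
    """Count statements per object kind; unparseable lines go to 'unparsed'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _TRIPLE.match(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        counts[classify_object(match.group(3))] += 1
    return counts


# Made-up sample statements, for illustration only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
sample = [
    f"<http://dbpedia.org/resource/Amsterdam> <{RDF_TYPE}> <http://dbpedia.org/ontology/City> .",
    f"<http://dbpedia.org/resource/Albert_Einstein> <{RDF_TYPE}> <http://dbpedia.org/resource/Physicist> .",
    f'<http://dbpedia.org/resource/Vltava> <{RDF_TYPE}> "river"@en .',
]
print(count_type_kinds(sample))
```

To run this over a downloaded partition, an open file handle can be passed in place of `sample`, since the function only iterates over lines. The regular expression handles only simple statements; a full N-Triples parser would be more appropriate for production use.
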
Publications
- Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery. Web Semantics. In press.
- T. Kliegr, O. Zamazal. Towards Linked Hypernyms Dataset 2.0: Complementing DBpedia with Hypernym Discovery. In 9th International Language Resources and Evaluation Conference (LREC'14), Reykjavik, Iceland, May 2014.
- O. Zamazal, T. Kliegr. Type Inference in DBpedia from Free Text. 2015. Under review.