The Linked Hypernyms Dataset associates entity articles in the English, German and Dutch Wikipedia with a DBpedia resource or a DBpedia ontology concept as their type. The types are hypernyms mined from the articles' free text using hand-crafted lexico-syntactic patterns.
The datasets were generated from DBpedia 2015 and Wikipedia snapshots dated May 2015.
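To give a feel for what mining a hypernym from free text with a lexico-syntactic pattern can look like, here is a minimal Python sketch. It is a toy illustration only: the regular expressions, the `toy_hypernym` function and the "last word is the head noun" heuristic are assumptions made for this example and are not the hand-crafted patterns actually used to build the dataset.

```python
import re

# Toy pattern (an assumption for illustration): "<subject> is/was a/an/the <noun phrase>".
# The captured noun phrase stops at the first punctuation mark or parenthesis.
COPULA = re.compile(r"\b(?:is|was|are|were)\s+(?:a|an|the)\s+([^.,;()]+)", re.IGNORECASE)

# Words that usually end the noun phrase we care about (also a crude assumption).
BOUNDARY = re.compile(r"\b(?:who|which|that|of|in|from|for|and|with|by)\b", re.IGNORECASE)


def toy_hypernym(first_sentence):
    """Return a candidate hypernym word, or None if the toy pattern does not match."""
    match = COPULA.search(first_sentence)
    if match is None:
        return None
    # Keep the noun phrase up to the first preposition, conjunction or relative pronoun.
    phrase = BOUNDARY.split(match.group(1), maxsplit=1)[0].strip()
    if not phrase:
        return None
    # Treat the last word of the phrase as the head noun.
    return phrase.split()[-1].lower()


if __name__ == "__main__":
    print(toy_hypernym("Prague is the capital of the Czech Republic."))
```

The extracted word corresponds to the kind of string literal found in the raw "plain text" partition; linking it to a DBpedia resource or ontology concept is a separate step that this sketch does not attempt.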
The latest version of the Linked Hypernyms Dataset - late May 2015!
All partitions of the dataset, as described in the dataset description section, can be downloaded from here.
Download highlights

Dataset | Dutch | English | German
---|---|---|---
Inference Dataset: types are in the DBpedia ontology namespace; merge of Core, STI and hSVM datasets | nt 1,078k | nt 3,478k | nt 1,043k
Extension Dataset: types are in the DBpedia resource namespace; highest type specificity | nt 1,113k | nt 3,700k | nt 1,083k
Raw "Plain Text" Dataset: all hypernyms are string literals (the original extracted word) | nt 1,631k | nt 3,719k | nt 1,220k
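The three partitions above differ in what the object of each statement is: a DBpedia ontology concept, a DBpedia resource, or a string literal. The following Python sketch shows one way to check which kind a downloaded `nt` (N-Triples) file contains. It assumes one simple triple per line and the usual `http://dbpedia.org/ontology/` and `http://dbpedia.org/resource/` namespaces (optionally with a language subdomain such as `de.` or `nl.`); the function names are hypothetical, and the regular expression is a simplification rather than a full N-Triples parser (it ignores blank nodes, for instance).

```python
import re
from collections import Counter

# Assumed namespaces; language editions may use a subdomain such as de.dbpedia.org.
ONTOLOGY_NS = re.compile(r"^http://dbpedia\.org/ontology/")
RESOURCE_NS = re.compile(r"^http://(?:[a-z-]+\.)?dbpedia\.org/resource/")

# Simplified N-Triples statement: <subject> <predicate> object .
# The object is an IRI in angle brackets, or a quoted literal with an
# optional language tag or datatype.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)


def classify_object(obj):
    """Label the object term of a triple as literal, ontology, resource or other."""
    if obj.startswith('"'):
        return "literal"
    iri = obj[1:-1]  # strip the angle brackets
    if ONTOLOGY_NS.match(iri):
        return "ontology"
    if RESOURCE_NS.match(iri):
        return "resource"
    return "other"


def count_object_kinds(lines):
    """Count object kinds over an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        counts[classify_object(match.group(3))] += 1
    return counts
```

Because `count_object_kinds` accepts any iterable of lines, an open file handle can be passed directly, so even the larger English files can be scanned without loading them into memory.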
Download individual parts of the inference dataset

Dataset | Dutch | English | German
---|---|---|---
Core Dataset: most accurate; result of pattern matching | nt 508k | nt 2,102k | nt 376k
STI Dataset: lower accuracy; result of machine learning | nt 1,075k | nt 3,474k | nt 1,042k
hSVM Dataset: lower accuracy; result of machine learning (language independent) | nt 1,075k | nt 3,474k | nt 1,042k
Fusion Dataset: improved accuracy; fusion of STI and hSVM | nt 1,075k | nt 3,474k | nt 1,042k
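If you download the individual parts and want to combine them yourself, one simple option is to let the more accurate partition win whenever two partitions assign a type to the same entity. The sketch below illustrates that idea in Python. The priority-based strategy and the `merge_by_priority` function are assumptions made for this example; this page does not describe how the published Inference or Fusion datasets were actually merged.

```python
def merge_by_priority(*partitions):
    """Merge entity -> type mappings; earlier partitions win on conflicts.

    Each partition is a dict mapping an entity IRI to a type IRI.
    Pass the partitions in decreasing order of trust, e.g. Core first.
    """
    merged = {}
    for partition in partitions:
        for entity, type_iri in partition.items():
            # setdefault keeps the value from the first (most trusted) partition.
            merged.setdefault(entity, type_iri)
    return merged


if __name__ == "__main__":
    # Hypothetical placeholder data, not taken from the dataset.
    core = {"entity1": "TypeFromCore"}
    sti = {"entity1": "TypeFromSTI", "entity2": "TypeFromSTI"}
    print(merge_by_priority(core, sti))
```

This keeps exactly one type per entity. If you would rather retain every assigned type, store a set of types per entity instead of a single value.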
Publications
- T. Kliegr. Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery. Web Semantics, Volume 31, March 2015, Pages 59-69 (paper)
- T. Kliegr, O. Zamazal. Towards Linked Hypernyms Dataset 2.0: complementing DBpedia with hypernym discovery. In 9th International Language Resources and Evaluation Conference (LREC'14), Reykjavik, Iceland, May, 2014. (paper)
- O. Zamazal, T. Kliegr. Type Inference in DBpedia from Free Text. Under review. (resources)