Linked Hypernyms Dataset

v2016-04

Gold standards for Evaluation

Overall approach (JWS 16 paper - text mining)

The table below gives an overview of the gold standard evaluation datasets.
The downloads are provided in the N-Triples (nt) format.
Gold standard
GS 1 - randomly selected from articles typed by STI on DBpedia 3.9 (mapped to DBpedia 2014 ontology types)
GS 2 - for evaluation of SDType on DBpedia 3.9
GS 3 - randomly selected from Wikipedia (mapped to DBpedia 2014 ontology types)
GS 3h - randomly selected from Wikipedia with no type in DBpedia 2014 (mapped to DBpedia 2014 ontology types)
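
Since the gold standards are distributed as N-Triples, a minimal sketch for loading entity types from such a file could look as follows. It assumes that the files contain rdf:type statements whose subject and object are both IRIs; the sample triple and the function name read_types are illustrative and are not taken from the dataset.

```python
import re

# Matches one N-Triples statement whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def read_types(lines):
    """Collect a mapping entity IRI -> set of type IRIs (hypothetical helper)."""
    types = {}
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip comments, blank lines and triples with literal objects
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)
    return types


# Illustrative input only; a real run would pass an opened .nt file instead.
sample = [
    "<http://dbpedia.org/resource/Prague> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/City> .",
    "# a comment line",
]
gold = read_types(sample)
```

For larger or more varied files, a full RDF parser is the safer choice, because this regular expression deliberately ignores literals and blank nodes.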
We also provide a Python script that computes the hierarchical measures used in the paper.
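
To give an idea of what hierarchical measures compute, the sketch below implements one common formulation of hierarchical precision, recall and F-measure, in which each type is expanded to itself plus all of its ancestors before the overlap is counted. This is an assumption for illustration: the provided script and the paper may define the measures differently. The toy taxonomy, the example pairs and the function names are hypothetical.

```python
def ancestors(cls, parent):
    """Return cls together with all of its ancestors (child -> parent links)."""
    result = set()
    while cls is not None:
        result.add(cls)
        cls = parent.get(cls)  # None once the top of the taxonomy is reached
    return result


def hierarchical_prf(pairs, parent):
    """Hierarchical precision, recall and F1 over (predicted, gold) type pairs."""
    overlap = predicted_total = gold_total = 0
    for predicted, gold in pairs:
        p = ancestors(predicted, parent)
        g = ancestors(gold, parent)
        overlap += len(p & g)
        predicted_total += len(p)
        gold_total += len(g)
    hp = overlap / predicted_total if predicted_total else 0.0
    hr = overlap / gold_total if gold_total else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


# Toy taxonomy (assumed for the example, not read from the DBpedia Ontology).
parent = {
    "City": "Settlement",
    "Town": "Settlement",
    "Settlement": "PopulatedPlace",
    "PopulatedPlace": "Place",
}
# (predicted type, gold type) for two hypothetical entities.
pairs = [("City", "Town"), ("Place", "City")]
hp, hr, hf = hierarchical_prf(pairs, parent)
```

With this formulation, a prediction that is too general (Place instead of City) keeps its precision but loses recall, while a sibling prediction (City instead of Town) is partially credited for the shared ancestors.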

The results and details about the experimental setup are covered in our LHD 2.0 paper.

Lexico-syntactic patterns (JWS 15 paper - hypernym discovery)

We performed the evaluation on three natural languages: English, German and Dutch (the corresponding gold standards are listed in the table below).

The downloads are provided as CSV files.
Download highlights
Gold standard: nl, en, de

For English, the annotation was performed on 1165 randomly drawn articles from English Wikipedia. In total, 1033 entities were assigned an agreement category in the gold standard for English. Additionally, 22 entities were assigned to the 'not found' category, 47 entities to the 'disambiguation page' category, and in 63 cases there was no agreement.

For German, the annotation was performed on 300 randomly drawn articles from German Wikipedia. In total, 248 entities were assigned an agreement category in the gold standard for German. Additionally, 15 entities were assigned to the 'not found' category, 19 entities to the 'disambiguation page' category, and in 18 cases there was no agreement.

For Dutch, the annotation was performed on 239 randomly drawn articles from Dutch Wikipedia. In total, 222 entities were assigned an agreement category in the gold standard for Dutch. Additionally, 8 entities were assigned to the 'not found' category, 6 entities to the 'disambiguation page' category, and in 3 cases there was no agreement. For the evaluation, we used the most up-to-date version of the DBpedia Ontology (2014).
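
The per-language counts reported above can be cross-checked with a few lines of Python. The sketch below only restates the numbers from the text; the dictionary keys and variable names are chosen for illustration.

```python
# Counts as reported in the text for each language edition of Wikipedia.
counts = {
    "en": {"articles": 1165, "agreement": 1033, "not_found": 22,
           "disambiguation": 47, "no_agreement": 63},
    "de": {"articles": 300, "agreement": 248, "not_found": 15,
           "disambiguation": 19, "no_agreement": 18},
    "nl": {"articles": 239, "agreement": 222, "not_found": 8,
           "disambiguation": 6, "no_agreement": 3},
}

summary = {}
for lang, c in counts.items():
    categorized = (c["agreement"] + c["not_found"]
                   + c["disambiguation"] + c["no_agreement"])
    summary[lang] = {
        # True when the four categories account for every annotated article.
        "consistent": categorized == c["articles"],
        # Share of annotated articles that ended up with an agreed category.
        "agreement_share": c["agreement"] / c["articles"],
    }

for lang, s in summary.items():
    print(f"{lang}: consistent={s['consistent']}, "
          f"agreement share={s['agreement_share']:.1%}")
```

For all three languages the four categories add up to the number of annotated articles.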

The results are described in detail in the JWS 15 LHD paper.

Publications