We provide three gold standards for evaluating the accuracy of type assignment in DBpedia. They were built using the commercial crowdsourcing service CrowdFlower, with each entity annotated by three to four workers. Each gold standard is provided for two versions of the DBpedia Ontology: version 2014, which we used for evaluation, and version 3.9, to which the types were mapped for experiments.
The annotation guidelines are here
- GS1 gold standard - 1021 typed entities.
- GS2 gold standard - 160 typed entities.
- GS3 gold standard - 1033 typed entities.
Dataset | DBpedia 2014 | DBpedia 3.9 |
---|---|---|
GS1 gold standard | csv | csv |
GS2 gold standard | csv | csv |
GS3 gold standard | csv | csv |
Description
GS1 gold standard
This gold standard is based on randomly selected entities from the 1.7 million entities in the DBpedia 3.9 Fusion dataset (i.e. entities for which the LHD Core framework applied to DBpedia 3.9 did not generate any DBpedia Ontology type). GS1 contains 1021 typed entities with agreement.
GS2 gold standard
This gold standard is based on randomly selected entities from the intersection (56,692 entities) of our LHD Fusion Dataset (for which the intersection represents 3.3% of all 1,733,430 entities) and the Heuristics dataset (for which the intersection represents 9% of all 630,346 entities). GS2 contains 160 typed entities with agreement.
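The two percentages quoted for GS2 follow directly from the entity counts. A minimal sketch of the arithmetic is given below; the constant and function names are illustrative and do not come from the datasets themselves.

```python
# Entity counts quoted in the GS2 description.
INTERSECTION = 56_692          # entities present in both datasets
LHD_FUSION_TOTAL = 1_733_430   # all entities in the LHD Fusion dataset
HEURISTICS_TOTAL = 630_346     # all entities in the Heuristics dataset


def share_percent(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


fusion_share = share_percent(INTERSECTION, LHD_FUSION_TOTAL)
heuristics_share = share_percent(INTERSECTION, HEURISTICS_TOTAL)

print(f"Share of LHD Fusion dataset: {fusion_share:.1f}%")
print(f"Share of Heuristics dataset: {heuristics_share:.1f}%")
```

Rounded to one decimal place, the shares come out at 3.3% and 9.0%, matching the figures stated above.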
GS3 gold standard
This gold standard is based on randomly selected entities/articles from all Wikipedia articles. GS3 contains 1033 typed entities with agreement.
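As a rough illustration of how any of the three gold standards might be used, the sketch below computes exact-match accuracy of a set of predicted types against gold types. Everything specific in it is an assumption: the two-column `entity,type` layout without a header row, the helper names `load_types` and `accuracy`, and the toy rows, which are not taken from the actual files. Exact-match accuracy is only one simple choice of measure; check the real csv layout and adapt the parsing before using this.

```python
import csv
import io


def load_types(csv_text: str) -> dict[str, str]:
    """Parse 'entity,type' rows into a dict (assumed two-column layout, no header)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def accuracy(gold: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of gold entities whose predicted type equals the gold type.

    Entities without a prediction count as incorrect.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for entity, gold_type in gold.items()
                  if predicted.get(entity) == gold_type)
    return correct / len(gold)


# Toy rows for illustration only; not real gold-standard content.
gold_csv = "Entity_A,Person\nEntity_B,Place\nEntity_C,Organisation\nEntity_D,Person\n"
pred_csv = "Entity_A,Person\nEntity_B,Place\nEntity_C,Place\nEntity_D,Person\n"

gold = load_types(gold_csv)
predicted = load_types(pred_csv)
score = accuracy(gold, predicted)
print(f"Exact-match accuracy: {score:.2f}")
```

With the toy rows, three of the four predictions match the gold type. To run this on a real file, read its contents into a string and pass it to `load_types`, after adjusting the column indices to the file's actual structure.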