Evaluation Framework for Benchmarking NER Systems

This page provides downloads of the two benchmark datasets, News and Tweets.

Both datasets were reannotated to support the evaluation of systems that perform Wikipedia-based entity classification and Wikipedia-based entity linking. Each entity recognized in the original datasets was enriched with a link to Wikipedia and with the most specific applicable type from the DBpedia Ontology. The annotations were created by two annotators and a judge.

Size metrics for the Tweets and News datasets

Dataset | Number of documents | Total number of entities | Entities with a CoNLL type | Entities with a DBpedia Ontology type | Entities with a Wikipedia URL
News    |                  10 |                      588 |                        580 |                                   367 |                           440
Tweets  |                1044 |                     1523 |                       1523 |                                  1379 |                          1354
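
The size metrics are easier to compare across the two datasets as proportions of the total entity count. The following is a minimal sketch of that arithmetic, using only the figures from the table above. The dictionary keys and the function names `coverage` and `entities_per_document` are illustrative and are not part of the datasets or of any accompanying tooling.

```python
# Size metrics copied from the table above.
# The key names are hypothetical labels chosen for this sketch.
SIZE_METRICS = {
    "News": {
        "documents": 10,
        "entities": 588,
        "conll_type": 580,
        "dbpedia_type": 367,
        "wikipedia_url": 440,
    },
    "Tweets": {
        "documents": 1044,
        "entities": 1523,
        "conll_type": 1523,
        "dbpedia_type": 1379,
        "wikipedia_url": 1354,
    },
}

ANNOTATION_KEYS = ("conll_type", "dbpedia_type", "wikipedia_url")


def coverage(metrics):
    """Return the fraction of entities carrying each kind of annotation."""
    total = metrics["entities"]
    return {key: metrics[key] / total for key in ANNOTATION_KEYS}


def entities_per_document(metrics):
    """Return the average number of entities per document."""
    return metrics["entities"] / metrics["documents"]


if __name__ == "__main__":
    for name, metrics in SIZE_METRICS.items():
        print(name)
        print(f"  entities per document: {entities_per_document(metrics):.2f}")
        for key, fraction in coverage(metrics).items():
            print(f"  {key}: {fraction:.1%}")
```

Running the script prints, for each dataset, the average number of entities per document and the share of entities that carry a CoNLL type, a DBpedia Ontology type, and a Wikipedia URL.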

Description of fields

Common fields

News dataset specific fields

Tweet dataset specific fields