Evaluation Framework for Benchmarking NER Systems

This page provides all the information you need to evaluate a Wikipedia-based NER system using the framework. We also present preliminary results from the evaluation of the EntityClassifier.eu NER system (also known as THD).

Preliminary evaluation results

Below we present the preliminary results from the evaluation of the EntityClassifier.eu NER system on the News and Tweets benchmark datasets.

Evaluation results for the EntityClassifier.eu NER system on the Tweets benchmark dataset.

Task                     Precision (strict/lenient)   Recall (strict/lenient)   F1.0 score (strict/lenient)
Entity recognition       0.45/0.56                    0.67/0.84                 0.54/0.67
Entity disambiguation    0.24/0.26                    0.36/0.39                 0.29/0.31
Entity classification    0.12/0.13                    0.17/0.19                 0.14/0.15
Evaluation results for the EntityClassifier.eu NER system on the News benchmark dataset.

Task                     Precision (strict/lenient)   Recall (strict/lenient)   F1.0 score (strict/lenient)
Entity recognition       0.69/0.78                    0.33/0.38                 0.45/0.51
Entity disambiguation    0.37/0.41                    0.18/0.20                 0.24/0.27
Entity classification    0.69/0.78                    0.33/0.38                 0.45/0.51
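The tables above report each metric under strict and lenient matching. To illustrate the distinction, here is a minimal sketch (not the framework's actual implementation) of how the two modes are commonly computed for entity recognition: strict counts a predicted entity as correct only on an exact span match, while lenient accepts any overlap with a gold span. The `score` function and the span representation are illustrative assumptions.

```python
from typing import List, Tuple

# An entity mention as (start, end) character offsets; illustrative assumption.
Span = Tuple[int, int]

def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def score(gold: List[Span], predicted: List[Span], strict: bool = True):
    """Return (precision, recall, F1.0) for entity recognition.

    strict=True  -> a prediction counts only on an exact span match.
    strict=False -> lenient mode: any overlap with a gold span counts.
    """
    if strict:
        tp = sum(1 for p in predicted if p in gold)
        matched_gold = sum(1 for g in gold if g in predicted)
    else:
        tp = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
        matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = tp / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold spans [(0, 5), (10, 15)] and predictions [(0, 5), (11, 14), (20, 25)], strict scoring credits only the exact match (0, 5), while lenient scoring also credits (11, 14) for overlapping (10, 15), yielding higher precision and recall, which matches the strict/lenient ordering seen in every row of the tables.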