Measuring semantic similarity by Word INterchangeability - Datasets

This webpage provides a collection of datasets for benchmarking word similarity and relatedness. The datasets are described in:
Kliegr, Tomáš, and Ondřej Zamazal. Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353. Data & Knowledge Engineering 115 (2018): 174-193.


The downloads are provided as zipped csv files with guidelines included in the zip file.
Dataset (File): Description
WIN-353 (zip): WordSim353 word pairs reannotated according to the word interchangeability guidelines.
WordSim353crowd (zip): WordSim353 word pairs reannotated according to the original WordSim353 guidelines using crowdsourcing.
ExplictSim353 (zip): WordSim353 word pairs reannotated according to explicit similarity guidelines.
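Since these downloads are zip archives that bundle a csv file with the guidelines, a small helper can read the csv without unpacking the archive by hand. The following is a minimal sketch using only the Python standard library. The function name `read_zipped_csv`, the comma delimiter, the UTF-8 encoding, and the presence of a header row are assumptions for illustration; none of them is specified on this page, so check the actual archive contents first.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_source, member=None, encoding="utf-8", delimiter=","):
    """Return (header, rows) from a csv file stored inside a zip archive.

    zip_source: path to the archive, or a binary file-like object.
    member:     name of the csv inside the archive; if None, the first
                entry whose name ends in ".csv" is used (an assumption,
                since the archive layout is not documented here).
    """
    with zipfile.ZipFile(zip_source) as archive:
        if member is None:
            csv_names = [n for n in archive.namelist()
                         if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("no .csv file found in archive")
            member = csv_names[0]
        with archive.open(member) as raw:
            # Decode the binary stream so csv.reader receives text lines.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            rows = list(csv.reader(text, delimiter=delimiter))
    if not rows:
        return [], []
    # Assumes the first row is a header; adjust if the file has none.
    return rows[0], rows[1:]
```

A call might look like `header, rows = read_zipped_csv("WIN-353.zip")`, where the local file name is hypothetical. Other files in the archive, such as the guidelines, are ignored by this helper.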
The downloads are provided as zipped csv files.
Dataset (File): Description
WIN-353cs (zip): WordSim353 word pairs reannotated according to the word interchangeability guidelines, Czech version.
WINLex-999cs (zip): SimLex-999 word pairs reannotated according to the word interchangeability guidelines, Czech version.
SimLex-999cs (zip): SimLex-999 word pairs, original guidelines, Czech version.
WordLex-999cs (zip): SimLex-999 word pairs annotated according to the WordSim353 guidelines, Czech version.
We also provide mappings from the words in WordSim353 and SimLex-666 (a subset of SimLex-999) to Wikipedia articles (DBpedia resources).
The downloads are provided as csv or zipped csv files.
Dataset (File)
Automatic mappings - WordSim353 (csv)
Crowdsourced mappings - WordSim353 (csv)
Automatic mappings - SimLex-666 (csv)