This webpage provides a collection of datasets for benchmarking word similarity and relatedness. The datasets are described in:
Kliegr, Tomáš, and Ondřej Zamazal. Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353. Data & Knowledge Engineering 115 (2018): 174-193.
Dataset | File |
---|---|
WIN-353: WordSim353 word pairs reannotated according to the word interchangeability guidelines. | zip |
WordSim353crowd: WordSim353 word pairs reannotated according to the original WordSim353 guidelines using crowdsourcing. | zip |
ExplictSim353: WordSim353 word pairs reannotated according to explicit similarity guidelines. | zip |
Dataset | File |
---|---|
WIN-353cs: WordSim353 word pairs reannotated according to the word interchangeability guidelines (Czech version). | zip |
WINLex-999cs: SimLex-999 word pairs reannotated according to the word interchangeability guidelines (Czech version). | zip |
SimLex-999cs: SimLex-999 word pairs, original guidelines (Czech version). | zip |
WordLex-999cs: SimLex-999 word pairs annotated according to the WordSim353 guidelines (Czech version). | zip |
Dataset | File |
---|---|
Automatic mappings - WordSim353 | csv |
Crowdsourced mappings - WordSim353 | csv |
Automatic mappings - SimLex-666 | csv |