This webpage provides a collection of datasets for benchmarking word similarity and relatedness. The datasets are described in:
Kliegr, Tomáš, and Ondřej Zamazal. Antonyms are similar: Towards paradigmatic association approach to rating similarity in SimLex-999 and WordSim-353. Data & Knowledge Engineering 115 (2018): 174-193.

| Dataset | Description | File |
|---|---|---|
| WIN-353 | WordSim353 word pairs reannotated according to the word interchangeability guidelines. | zip |
| WordSim353crowd | WordSim353 word pairs reannotated through crowdsourcing according to the original WordSim353 guidelines. | zip |
| ExplictSim353 | WordSim353 word pairs reannotated according to explicit similarity guidelines. | zip |

| Dataset | Description | File |
|---|---|---|
| WIN-353cs | WordSim353 word pairs reannotated according to the word interchangeability guidelines (Czech version). | zip |
| WINLex-999cs | SimLex-999 word pairs reannotated according to the word interchangeability guidelines (Czech version). | zip |
| SimLex-999cs | SimLex-999 word pairs annotated according to the original guidelines (Czech version). | zip |
| WordLex-999cs | SimLex-999 word pairs annotated according to the WordSim353 guidelines (Czech version). | zip |

| Dataset | File |
|---|---|
| Automatic mappings - WordSim353 | csv |
| Crowdsourced mappings - WordSim353 | csv |
| Automatic mappings - SimLex-666 | csv |
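Word similarity and relatedness datasets like these are commonly used to benchmark a model by computing Spearman's rank correlation between the human ratings and the model's scores for the same word pairs. The sketch below illustrates that procedure in plain Python. It is an assumption-laden example, not code distributed with the datasets: it assumes a hypothetical comma-separated layout of `word1,word2,score` per line (the actual layout of the zip and csv files may differ), and the helpers `read_pairs`, `evaluate` and the toy pairs and scores are made up for illustration.

```python
import csv
import io
from typing import Callable, Optional


def average_ranks(values: list[float]) -> list[float]:
    """Assign 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def read_pairs(text: str) -> list[tuple[str, str, float]]:
    """Parse a HYPOTHETICAL 'word1,word2,score' layout; skips a header row."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue
        try:
            score = float(row[2])
        except ValueError:
            continue  # e.g. a header line such as 'word1,word2,score'
        rows.append((row[0].strip(), row[1].strip(), score))
    return rows


def evaluate(pairs, sim: Callable[[str, str], Optional[float]]) -> tuple[float, int]:
    """Correlate gold scores with model scores; pairs the model can't score are skipped."""
    gold, pred = [], []
    for w1, w2, score in pairs:
        s = sim(w1, w2)
        if s is None:  # e.g. out-of-vocabulary word
            continue
        gold.append(score)
        pred.append(s)
    return spearman(gold, pred), len(gold)
```

A toy usage, with invented pairs, ratings and model scores:

```python
sample = "word1,word2,score\ncar,auto,9.0\ncup,mug,7.0\nsun,pencil,1.0\n"
toy_model = {("car", "auto"): 0.9, ("cup", "mug"): 0.6, ("sun", "pencil"): 0.1}
rho, n = evaluate(read_pairs(sample), lambda a, b: toy_model.get((a, b)))
print(rho, n)  # 1.0 3
```

Reporting the number of scored pairs alongside ρ makes it visible when out-of-vocabulary pairs were dropped, which otherwise makes results on the same dataset hard to compare.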
