2023-02-09

Datasets for Embeddings Performance Evaluation

Text search

Dataset: BEIR (ArguAna, ClimateFEVER, DBPedia, FEVER, FiQA2018, HotpotQA, NFCorpus, QuoraRetrieval, SciFact, TRECCOVID, Touche2020)

Code search

Dataset: CodeSearchNet

Sentence similarity

Dataset: SentEval (STS 2012–2016)
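
The note above does not say how sentence similarity is scored. As an assumption for illustration, a common approach on STS-style data is to take the cosine similarity of the two sentence embeddings for each pair and report the Spearman rank correlation against the human ratings. The sketch below shows that calculation using only the standard library. The function names, the toy vectors, and the gold scores are all hypothetical and are not taken from the source or from any dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data, invented for illustration: one embedding pair per sentence pair,
# plus a made-up human similarity rating for each pair.
embedding_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold_scores = [4.8, 2.5, 0.2]

predicted = [cosine_similarity(a, b) for a, b in embedding_pairs]
score = spearman(predicted, gold_scores)
```

Because Spearman correlation compares ranks only, any monotonic rescaling of the predicted similarities leaves the score unchanged, so the embeddings do not need to be calibrated to the rating scale.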

Text classification

Dataset: SentEval (MR, CR, SUBJ, MPQA, SST, TREC, MRPC)

Source: New and Improved Embedding Model

