2023-02-09
Datasets for Embeddings Performance Evaluation
Text search
Dataset: BEIR (ArguAna, ClimateFEVER, DBPedia, FEVER, FiQA2018, HotpotQA, NFCorpus, QuoraRetrieval, SciFact, TRECCOVID, Touche2020)
Code search
Dataset: CodeSearchNet
Sentence similarity
Dataset: SentEval (STS 2012–2016)
Text classification
Dataset: SentEval (MR, CR, SUBJ, MPQA, SST, TREC, MRPC)
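As a rough illustration of how a sentence-similarity dataset such as STS is typically used: each sentence pair is embedded, the cosine similarity of the two vectors is taken as the predicted score, and the predictions are compared with the human ratings by rank correlation. The sketch below assumes Spearman correlation as the metric (the source does not say which metric it reports). The names `embed`, `sts_score`, and `toy_vectors` are hypothetical and exist only for this example; they are not part of SentEval or any other benchmark's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_score(pairs, gold, embed):
    """Correlate cosine similarities of embedded pairs with gold ratings.

    `embed` is a hypothetical callable mapping a sentence to a vector.
    """
    predicted = [cosine_similarity(embed(a), embed(b)) for a, b in pairs]
    return spearman(predicted, gold)


# Toy stand-in for a real embedding model (assumption, for illustration only).
toy_vectors = {
    "s1": [1.0, 0.0],
    "s2": [1.0, 0.0],
    "s3": [0.0, 1.0],
    "s4": [1.0, 1.0],
}
pairs = [("s1", "s2"), ("s1", "s3"), ("s1", "s4")]
gold = [5.0, 0.0, 2.5]  # made-up human similarity ratings
score = sts_score(pairs, gold, toy_vectors.__getitem__)
print(score)
```

In practice `embed` would call the embedding model under evaluation, and `pairs` and `gold` would come from the dataset's sentence pairs and human ratings.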
Source: New and Improved Embedding Model
See also: Vectorview, which analyzes data and user queries and provides actionable insights for a better fit between model and user needs