
Doesn’t OpenAI’s embedding model support 8191/8192 tokens? That aside, declaring a winner by token size is misleading. There are more important factors, like cross-language support and precision, for example.
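For reference, you can check token counts yourself with tiktoken. A minimal sketch, assuming the cl100k_base encoding that the text-embedding-3 models use:

    # pip install tiktoken
    import tiktoken

    # text-embedding-3-small/-large (and ada-002) use cl100k_base
    enc = tiktoken.get_encoding("cl100k_base")

    text = "your document chunk here"
    n_tokens = len(enc.encode(text))

    MAX_TOKENS = 8191  # OpenAI's documented input limit for its embedding models
    if n_tokens > MAX_TOKENS:
        print(f"too long ({n_tokens} tokens), needs chunking")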


Yep, voyage-3 isn’t anywhere near the top of the MTEB leaderboard if you order by `retrieval score` descending.

stella_en_1.5B_v5 seems to be an unsung hero model in that regard

plus you may not even want such a large token limit if you just need accurate retrieval of short snippets of text (1-2 sentences)
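If anyone wants to try it, stella loads through sentence-transformers. A rough sketch, assuming the dunzhang/stella_en_1.5B_v5 checkpoint on Hugging Face (the repo ships custom code, so review it before enabling trust_remote_code; the model card also describes query-side prompts for retrieval, omitted here):

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(
        "dunzhang/stella_en_1.5B_v5",
        trust_remote_code=True,  # model repo includes custom code
    )

    # 1-2 sentence snippets, nowhere near an 8k-token context
    docs = [
        "The mitochondria is the powerhouse of the cell.",
        "Paris is the capital of France.",
    ]
    query = "What is the capital of France?"

    doc_emb = model.encode(docs)
    query_emb = model.encode([query])

    # cosine-similarity ranking (similarity() exists in sentence-transformers 3.x)
    print(model.similarity(query_emb, doc_emb))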


Thanks thund and jdthedisciple for these points and corrections. I'll update the section today.


Updated the section to refer to the "Retrieval Average" column of the MTEB leaderboard. Is that the right column to refer to? Can someone link me to an explanation of how that benchmark works? I couldn't find a good link on it.
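In the meantime, here's what I pieced together: the retrieval tasks are BEIR-style datasets and, as far as I can tell, the main metric is nDCG@10, which I assume is what "Retrieval Average" averages across tasks. You can reproduce a single task locally with the mteb package; a sketch (the API has shifted between versions, so check their README):

    # pip install mteb sentence-transformers
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # SciFact is one of the BEIR-style retrieval tasks on the leaderboard
    evaluation = MTEB(tasks=["SciFact"])
    results = evaluation.run(model, output_folder="results")
    print(results)  # main retrieval score reported is nDCG@10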


And that's not all: the token encodings of different models can be very different.
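Quick way to see it, assuming tiktoken and Hugging Face transformers are installed:

    # pip install tiktoken transformers
    import tiktoken
    from transformers import AutoTokenizer

    text = "Tokenization schemes differ wildly between embedding models."

    openai_enc = tiktoken.get_encoding("cl100k_base")  # OpenAI embeddings
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

    print("cl100k_base:", len(openai_enc.encode(text)))
    print("bert-base-uncased:", len(bert_tok.encode(text, add_special_tokens=False)))

So an "8k-token" limit buys a different amount of actual text depending on the model's tokenizer.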



