
Doesn’t OpenAI’s embedding model support 8191/8192 tokens? That aside, declaring a winner by token size is misleading. There are more important factors, like cross-language support and precision, for example.
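For reference, you can check token counts yourself with tiktoken. A minimal sketch, assuming the cl100k_base encoding that the text-embedding-3 models use:

    # pip install tiktoken
    import tiktoken

    # text-embedding-3-small/-large (and ada-002) use cl100k_base
    enc = tiktoken.get_encoding("cl100k_base")

    text = "your document chunk here"
    n_tokens = len(enc.encode(text))

    MAX_TOKENS = 8191  # OpenAI's documented input limit for its embedding models
    if n_tokens > MAX_TOKENS:
        print(f"too long ({n_tokens} tokens), needs chunking")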


Yep, voyage-3 isn’t anywhere near the top of the MTEB leaderboard if you order by `retrieval score` descending.

stella_en_1.5B_v5 seems to be an unsung hero model in that regard

plus you may not even want such a large token limit if you just need accurate retrieval of short snippets of text (1-2 sentences)
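If anyone wants to try it, stella loads through sentence-transformers. A rough sketch, assuming the dunzhang/stella_en_1.5B_v5 checkpoint on Hugging Face (the repo ships custom code, so review it before enabling trust_remote_code; the model card also describes query-side prompts for retrieval, omitted here):

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(
        "dunzhang/stella_en_1.5B_v5",
        trust_remote_code=True,  # model repo includes custom code
    )

    # 1-2 sentence snippets, nowhere near an 8k-token context
    docs = [
        "The mitochondria is the powerhouse of the cell.",
        "Paris is the capital of France.",
    ]
    query = "What is the capital of France?"

    doc_emb = model.encode(docs)
    query_emb = model.encode([query])

    # cosine-similarity ranking (similarity() exists in sentence-transformers 3.x)
    print(model.similarity(query_emb, doc_emb))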


Thanks thund and jdthedisciple for these points and corrections. I'll update the section today.


Updated the section to refer to the "Retrieval Average" column of the MTEB leaderboard. Is that the right column to refer to? Can someone link me to an explanation of how that benchmark works? I couldn't find a good link on it.
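In the meantime, here's what I pieced together: the retrieval tasks are BEIR-style datasets and, as far as I can tell, the main metric is nDCG@10, which I assume is what "Retrieval Average" averages across tasks. You can reproduce a single task locally with the mteb package; a sketch (the API has shifted between versions, so check their README):

    # pip install mteb sentence-transformers
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # SciFact is one of the BEIR-style retrieval tasks on the leaderboard
    evaluation = MTEB(tasks=["SciFact"])
    results = evaluation.run(model, output_folder="results")
    print(results)  # main retrieval score reported is nDCG@10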


And that's not all: the token encodings of different models can be very different.
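Quick way to see it, assuming tiktoken and Hugging Face transformers are installed:

    # pip install tiktoken transformers
    import tiktoken
    from transformers import AutoTokenizer

    text = "Tokenization schemes differ wildly between embedding models."

    openai_enc = tiktoken.get_encoding("cl100k_base")  # OpenAI embeddings
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

    print("cl100k_base:", len(openai_enc.encode(text)))
    print("bert-base-uncased:", len(bert_tok.encode(text, add_special_tokens=False)))

So an "8k-token" limit buys a different amount of actual text depending on the model's tokenizer.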



