https://technicalwriting.dev/data/embeddings.html#let-a-thousand-embeddings-bloo...

skybrian · on Nov 1, 2024

It seems like sharing the text itself would be a better API, since it lets API users calculate their own embeddings easily. This is what the crawlers for search engines do. If they use embeddings internally, that’s up to them, and it doesn’t need to be baked into the protocol.

nkko · on Nov 2, 2024

Could we work toward standardization at some point? Obviously, there will always be a newer model. I just hate that all the embedding work I did was with a now-deprecated OpenAI model. At the very least, individual providers should have an interest in ensuring this across their own model releases. Some trick like Matryoshka embeddings could ensure that embeddings from newer models nest or work within the space of older models, preserving some form of comparability or alignment.
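
For context, a rough sketch of what the Matryoshka trick looks like in practice: with a model trained that way, a shorter embedding is just the leading dimensions of the full one, re-normalized. The vector values and the helper name `truncate_and_normalize` are made up for illustration, and this only shows nesting inside a single model. Whether it can carry over between model releases is the open question.

```python
import math

def truncate_and_normalize(vec, k):
    """Keep the first k dimensions and rescale the result to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

# Made-up 6-dimensional embedding standing in for a real model's output.
full = [0.6, 0.48, 0.36, 0.3, 0.2, 0.1]

# A 3-dimensional "nested" embedding taken from the same vector.
short = truncate_and_normalize(full, 3)
```

This only gives useful short vectors if the model was trained so that the leading dimensions carry most of the information. Truncating an ordinary embedding this way is not expected to work as well.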

treefarmer · on Nov 1, 2024

Yeah, this is the main issue with the suggestion. Embeddings can only be compared to each other if they are in the same space (e.g., generated by the same model). Providing embeddings of a specific kind would require users to use the same model, which can quickly become problematic if you're using a closed-source embedding model (like OpenAI's or Cohere's).
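
A toy illustration of the same-space point, assuming a hypothetical "model B" that encodes exactly the same geometry as "model A" but with the axes shuffled and one sign flipped. The vectors and the `to_space_b` helper are invented for the example; real models differ in far messier ways.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def to_space_b(vec):
    """Hypothetical model B: reverse the axis order and flip the first sign.

    This is an orthogonal transform, so distances and angles inside
    space B match those inside space A.
    """
    out = list(reversed(vec))
    out[0] = -out[0]
    return out

# Two similar documents, embedded by hypothetical model A.
doc1_a = [0.9, 0.1, 0.3]
doc2_a = [0.8, 0.2, 0.4]

# The same two documents, embedded by hypothetical model B.
doc1_b = to_space_b(doc1_a)
doc2_b = to_space_b(doc2_a)

same_space_a = cosine(doc1_a, doc2_a)  # high: both vectors from model A
same_space_b = cosine(doc1_b, doc2_b)  # equally high: both from model B
cross_space = cosine(doc1_a, doc1_b)   # near zero, even though it is the same document
```

Both models agree that the two documents are similar, yet comparing one document's vector from A against its own vector from B gives a score close to zero. That is why a published embedding is only useful to consumers running the same model.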