
Is there any benefit to fine-tuning a model on your corpus before using it to generate embeddings? Would that improve the quality of the matches?


Yes, especially if you work in a poorly supported language and/or have specific data pairs you want to match that differ from ordinary text.

Training your own fine-tune takes very little time and GPU resources, and you can easily outperform even SOTA models on your specific problem with a smaller model/vector space.

Then again, for basic fuzzy search over general English text, I would not really expect large performance gains.
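
If you want to check whether a fine-tune actually helps on your own data pairs, one simple approach is to compare recall@k for the base and fine-tuned embeddings on a held-out set of pairs. Below is a minimal sketch in plain Python. The names `cosine` and `recall_at_k` are hypothetical helpers, not part of any library, and it assumes you already have query and document vectors from whichever models you are comparing.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(query_vecs, doc_vecs, gold, k=1):
    """Fraction of queries whose correct document is ranked in the top k.

    gold[i] is the index into doc_vecs of the correct match for query i
    (an assumed layout for the held-out pairs, chosen for illustration).
    """
    if not gold:
        raise ValueError("need at least one held-out pair")
    hits = 0
    for q, target in zip(query_vecs, gold):
        # Rank every document by similarity to this query, best first.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda j: cosine(q, doc_vecs[j]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(gold)
```

Run it once with vectors from the base model and once with vectors from the fine-tuned model, using the same held-out pairs. If the numbers barely move, that matches the general-English case above, where a fine-tune is probably not worth it.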



