So it can be very very wrong.

You're trading correctness for speed.



Yes, if you only care about correctness, you always use the maximum possible inference compute. Anything short of that is trading correctness for speed.


Yes, the goal here is to avoid overthinking and be as efficient as possible, using the minimum number of tokens required to solve a query. Often, queries that require very many tokens are unlikely to lead to correct answers anyway; otherwise they would show up when we are training the classifier.
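For a concrete picture, here is a minimal sketch of that routing idea, with hypothetical names, data, and thresholds (this is not optillm's actual API): train a lightweight classifier on past queries to predict whether a query needs a large reasoning budget, then cap the tokens spent on the easy ones.

  # Hypothetical sketch, not optillm's actual interface: pick a token
  # budget from a learned complexity classifier.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  # Toy training data: 1 if the query historically needed a long chain of
  # thought to answer correctly, 0 if a short answer was enough.
  train_queries = [
      "what is the capital of France",
      "convert 3 km to miles",
      "prove that sqrt(2) is irrational",
      "find a closed form for this recurrence",
  ]
  needs_long_reasoning = [0, 0, 1, 1]

  clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
  clf.fit(train_queries, needs_long_reasoning)

  def token_budget(query, low=256, high=4096):
      """Choose max_tokens from the predicted query complexity."""
      p_hard = clf.predict_proba([query])[0][1]
      return high if p_hard > 0.5 else low

The cap reflects the point above: past a certain budget, extra tokens rarely turn a wrong answer into a right one, so spending them mostly costs latency.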


If you ask it to rethink the problem because you've found a flaw, does it bump up the complexity and actually think about it? Like how a person might give you a quick answer to something, and then questioning that answer would cause them to think more deeply about it.


The short answer is that, in general, yes, it helps improve accuracy; there is a whole line of work on self-consistency and critique that supports this. Many of those approaches are already implemented in optillm.
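The core of self-consistency is just sampling several answers and taking a vote. A minimal sketch, assuming a placeholder generate() function rather than optillm's real implementation:

  # Minimal self-consistency sketch. `generate` stands in for whatever
  # LLM call you use; optillm's actual implementations are more involved.
  from collections import Counter

  def generate(prompt, temperature=0.7):
      # Placeholder for a real model call (e.g. an OpenAI-compatible API).
      raise NotImplementedError

  def self_consistent_answer(prompt, n_samples=5):
      answers = [generate(prompt, temperature=0.7) for _ in range(n_samples)]
      answer, votes = Counter(answers).most_common(1)[0]
      return answer, votes / n_samples  # answer plus an agreement score

Asking the model to critique and revise its own answer, as in the parent's "I found a flaw" prompt, is the same idea applied sequentially rather than in parallel.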


If compute is limited, then dedicating more resources to the questions that are more likely to need it will increase correctness overall, even if it may decrease correctness for some individual responses.
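A toy illustration with made-up numbers: 100 queries under a fixed budget of 100k tokens, 80 of them easy and 20 hard.

  # Made-up numbers, just to show the trade-off under a fixed token budget.
  # Uniform: 1,000 tokens per query; easy queries hit 90%, hard ones 30%.
  uniform = 0.80 * 0.90 + 0.20 * 0.30   # 0.78 overall accuracy
  # Adaptive: 500 tokens for easy (drops to 88%), 3,000 for hard (rises to 55%).
  # Same total budget: 80*500 + 20*3000 = 100,000 tokens.
  adaptive = 0.80 * 0.88 + 0.20 * 0.55  # 0.814 overall accuracy

Some individual easy queries get a slightly worse answer, but overall accuracy goes up for the same spend, which is exactly the trade being described.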


> You're trading correctness for speed.

That's AI in a nutshell.



