So it can be very very wrong.

You're trading correctness for speed.



Yes, if you only care about correctness, you always use the maximum possible inference compute. Anything short of that is trading correctness for speed.


Yes, the goal here is to avoid overthinking and be as efficient as possible, using the minimum number of tokens required to solve a query. Often, queries that require very many tokens are unlikely to lead to correct answers anyway; otherwise they would show up when we are training the classifier.
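For a concrete picture, here is a minimal sketch of that routing idea, with hypothetical names, data, and thresholds (this is not optillm's actual API): train a lightweight classifier on past queries to predict whether a query needs a large reasoning budget, then cap the tokens spent on the easy ones.

  # Hypothetical sketch, not optillm's actual interface: pick a token
  # budget from a learned complexity classifier.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  # Toy training data: 1 if the query historically needed a long chain of
  # thought to answer correctly, 0 if a short answer was enough.
  train_queries = [
      "what is the capital of France",
      "convert 3 km to miles",
      "prove that sqrt(2) is irrational",
      "find a closed form for this recurrence",
  ]
  needs_long_reasoning = [0, 0, 1, 1]

  clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
  clf.fit(train_queries, needs_long_reasoning)

  def token_budget(query, low=256, high=4096):
      """Choose max_tokens from the predicted query complexity."""
      p_hard = clf.predict_proba([query])[0][1]
      return high if p_hard > 0.5 else low

The cap reflects the point above: past a certain budget, extra tokens rarely turn a wrong answer into a right one, so spending them mostly costs latency.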


If you ask it to rethink the problem because you've found a flaw, does it bump up the complexity and actually think about it? Like how a person might give you a quick answer to something, and then questioning that answer would cause them to think more deeply about it.


The short answer is that, in general, yes, it helps improve accuracy; there is a whole line of work on self-consistency and critique that supports this. Many of those approaches are already implemented in optillm.
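The core of self-consistency is just sampling several answers and taking a vote. A minimal sketch, assuming a placeholder generate() function rather than optillm's real implementation:

  # Minimal self-consistency sketch. `generate` stands in for whatever
  # LLM call you use; optillm's actual implementations are more involved.
  from collections import Counter

  def generate(prompt, temperature=0.7):
      # Placeholder for a real model call (e.g. an OpenAI-compatible API).
      raise NotImplementedError

  def self_consistent_answer(prompt, n_samples=5):
      answers = [generate(prompt, temperature=0.7) for _ in range(n_samples)]
      answer, votes = Counter(answers).most_common(1)[0]
      return answer, votes / n_samples  # answer plus an agreement score

Asking the model to critique and revise its own answer, as in the parent's "I found a flaw" prompt, is the same idea applied sequentially rather than in parallel.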


If compute is limited, then dedicating more resources to the questions that are more likely to need it will increase correctness overall, even if it may decrease correctness for some individual responses.
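A toy illustration with made-up numbers: 100 queries under a fixed budget of 100k tokens, 80 of them easy and 20 hard.

  # Made-up numbers, just to show the trade-off under a fixed token budget.
  # Uniform: 1,000 tokens per query; easy queries hit 90%, hard ones 30%.
  uniform = 0.80 * 0.90 + 0.20 * 0.30   # 0.78 overall accuracy
  # Adaptive: 500 tokens for easy (drops to 88%), 3,000 for hard (rises to 55%).
  # Same total budget: 80*500 + 20*3000 = 100,000 tokens.
  adaptive = 0.80 * 0.88 + 0.20 * 0.55  # 0.814 overall accuracy

Some individual easy queries get a slightly worse answer, but overall accuracy goes up for the same spend, which is exactly the trade being described.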


> You're trading correctness for speed.

That's AI in a nutshell.



