Hacker News

You used the word reinforcing, and then asserted there's no reward function. Can you explain how it's possible to perform RL without a reward function, and how the LLM training process maps to that?


An LLM's actions at inference time are divorced from that reward function; it's not something the model consults or considers when it generates output. Talking about a "reward function" in that context doesn't make sense.
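To make the distinction concrete, here's a minimal REINFORCE sketch on a toy two-token policy (purely illustrative, not how any production LLM is actually trained): the reward function only shapes gradient updates during training, and at inference the model samples from its learned distribution without ever calling it.

```python
import math, random

# Toy "policy": softmax over two tokens, parameterized by logits.
# (A hypothetical stand-in for an LLM's next-token distribution.)
logits = [0.0, 0.0]

def probs(logits):
    exps = [math.exp(x) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reward(token):
    # Reward function, used ONLY during training: token 1 is preferred.
    return 1.0 if token == 1 else 0.0

# REINFORCE-style training loop: reward scales the gradient of the
# sampled token's log-probability. The policy itself never "consults"
# reward() when choosing a token -- it just samples from probs().
random.seed(0)
lr = 0.5
for _ in range(200):
    p = probs(logits)
    token = 0 if random.random() < p[0] else 1
    advantage = reward(token) - 0.5  # simple fixed baseline
    for i in range(2):
        grad_logp = (1.0 if i == token else 0.0) - p[i]
        logits[i] += lr * advantage * grad_logp

# "Inference": only the trained logits remain; reward() plays no part.
print(probs(logits))
```

After training, the preferred token dominates the distribution, yet nothing at sampling time references `reward()` — the reward's influence is baked into the weights, which is the point being made above.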



