Right, but we do have a reward function that says pleasurable/painful and novel/boring, and probably other stuff too. So that can be viewed as a labeling on the data. Earlier data can be associated with the reward labeling through induction, stepping back from the labeled moment to the data that came before it; that's roughly how a recurrent neural net works. Doubtless that's an oversimplification, though.
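
To make the "earlier data gets the label by induction" idea concrete, here's a minimal sketch. It uses discounted returns from reinforcement learning as a stand-in, which is my assumption about what the backward association looks like, not a claim about how brains or any particular recurrent net actually do it. The function name, the discount value, and the toy episode are all made up for illustration.

```python
def propagate_reward_labels(rewards, discount=0.9):
    """Give each time step a label equal to its discounted future reward.

    Hypothetical helper: a stand-in for "associating earlier data with
    the reward labeling", not an actual recurrent net.
    """
    labels = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step inherits a discounted share of later rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + discount * running
        labels[t] = running
    return labels


# Toy episode (assumed): nothing happens until a pleasurable event at the end.
rewards = [0.0, 0.0, 0.0, 1.0]
labels = propagate_reward_labels(rewards, discount=0.5)
print(labels)
```

Only the last step has a raw reward, but after the backward pass every earlier step carries a smaller label too, so the unlabeled data ends up labeled after the fact.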