datadrivenangel on Dec 19, 2024 | on: Alignment faking in large language models
"Alignment" is getting overloaded here. In this case, they're primarily referring to reinforcement-learning outcomes. In the singularity case, people use it to mean keeping the robots from murdering us all because doing so creates more paperclips.