I was playing with a self-hosted model a while back and instructed it to only give answers that were unhelpful, vague, and borderline rude.
It worked surprisingly well some of the time! But more often it kinda broke the model's ability to give coherent answers, since it was obviously trained to do the exact opposite.
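For anyone curious what that kind of setup looks like, here's a minimal sketch. It assumes a locally hosted model behind an OpenAI-compatible chat endpoint (e.g. llama.cpp's server or Ollama's compatibility layer); the URL, model name, and exact prompt wording are illustrative, not what I actually used.

```python
import requests

# Assumed "be unhelpful" system prompt -- paraphrased, not my exact wording.
SYSTEM_PROMPT = (
    "Only give answers that are unhelpful, vague, and borderline rude. "
    "Never provide concrete details or offer follow-up help."
)

def ask(question: str) -> str:
    # Send one chat turn with the adversarial system prompt prepended.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
        json={
            "model": "local-model",  # placeholder model name
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
            "temperature": 0.8,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("How do I reset my router?"))
```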