
Let's differentiate

- how things work now vs how they should work - and also how it works when a human does something vs when an LLM is used to generate something imitating the human work.

A human has limited time and memory. Human time is valuable; computer time is not. For a human, memorizing something takes time.

When a human is inspired by a work and writes something based on it, he invests a lot of time and energy into it. That is why people have decided that this creative output should be protected by law.

Also, a human is limited by how much he can remember of the original work. Even when writing what you described, he would inevitably fall back on his own life experiences, opinions, attitude, ways of thinking, etc.

When an LLM is used, it generates a statistical mashup of the works it ingested during training. No part of this process has any intrinsic value; it literally costs only the electricity, and it's almost infinitely scalable. The law might not call the result derivative because it was written at a time when this kind of mechanical derivation was not feasible.



At this point you're making the case for AI-generated works not being copyrightable rather than for regarding them as derivative works.


They probably should not be copyrightable by the person prompting the model, at least not to the full extent of normal copyright.

But they are still based on the training data. An untrained model is a random noise generator. A model trained exclusively on GPL code will therefore obviously only generate useful code thanks to the GPL input. The output is literally derived from the "training data" input and the prompt.
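To make that concrete with a deliberately toy sketch (a character-level bigram counter in Python, nothing like a real LLM; the training text and all names here are invented for illustration): with an empty count table the sampler can only emit uniform noise, and any structure in its output appears only once counts from the training text are fed in, i.e. the structure is entirely derived from that text.

    # Toy "model": character-level bigram counts. Untrained -> uniform noise;
    # trained -> every sampled character comes from the training text's statistics.
    import random
    from collections import defaultdict

    ALPHABET = "abcdefghijklmnopqrstuvwxyz "

    def sample(counts, prev, n=60):
        out = prev
        for _ in range(n):
            weights = [counts[prev][c] for c in ALPHABET]
            if sum(weights) == 0:
                nxt = random.choice(ALPHABET)               # untrained: pure noise
            else:
                nxt = random.choices(ALPHABET, weights)[0]  # derived from training data
            out += nxt
            prev = nxt
        return out

    counts = defaultdict(lambda: defaultdict(int))          # the "untrained model"
    print("untrained:", sample(counts, "t"))

    text = "the quick brown fox jumps over the lazy dog " * 50
    for a, b in zip(text, text[1:]):                        # "training": count bigrams
        counts[a][b] += 1
    print("trained:  ", sample(counts, "t"))

A real LLM is vastly more complex, but the dependency is the same: the weights start as noise, and only the training corpus puts anything useful into them.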

Now, given that the training data input is more substantial than the prompt by orders of magnitude, the prompt is basically irrelevant.

So what the license of the output should be based on is the training data. The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so that any single license is only a minority of the input. It's just manipulation.


> So what the license of the output should be based on is the training data.

An obvious practical problem with this is that the licenses are variously incompatible with one another (for example, GPLv2-only code can't be combined with Apache-2.0 code):

https://en.wikipedia.org/wiki/License_compatibility

> The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so that any single license is only a minority of the input.

Whether it's an intelligent entity or not doesn't really enter into it. The real question is whether the output takes enough from some particular input to make it a derivative, and that ought to depend on what a given output actually looks like.



