
Let's differentiate

- how things work now vs how they should work - and also how it works when a human does something vs when an LLM is used to generate something imitating the human work.

A human has limited time and memory. Human time is valuable; computer time is not. For a human, memorizing something takes time.

When a human is inspired by a work and writes something based on it, he invests a lot of time and energy into it. That is why people have decided that this creative output should be protected by law.

Also, a human is limited by how much he can remember of the original work. Even when writing what you described, he would inevitably fall back on his own life experiences, opinions, attitude, ways of thinking, etc.

When an LLM is used, it generates a statistical mashup of the works it ingested during training. No part of this process has any intrinsic value; it literally costs only the electricity, and it's almost infinitely scalable. The law might not call the result derivative because it was written at a time when this kind of mechanical derivation was not feasible.



At this point you're making the case for AI-generated works not being copyrightable rather than for regarding them as derivative works.


They probably should not be copyrightable by the person prompting the model, at least not to the full extent of normal copyright.

But they are still based on the training data. An untrained model is a random noise generator. A model trained exclusively on GPL code will therefore obviously only generate useful code thanks to the GPL input. The output is literally derived from the "training data" input and the prompt.
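To make that concrete with a deliberately toy sketch (a character-level bigram counter in Python, nothing like a real LLM; the training text and all names here are invented for illustration): with an empty count table the sampler can only emit uniform noise, and any structure in its output appears only once counts from the training text are fed in, i.e. the structure is entirely derived from that text.

    # Toy "model": character-level bigram counts. Untrained -> uniform noise;
    # trained -> every sampled character comes from the training text's statistics.
    import random
    from collections import defaultdict

    ALPHABET = "abcdefghijklmnopqrstuvwxyz "

    def sample(counts, prev, n=60):
        out = prev
        for _ in range(n):
            weights = [counts[prev][c] for c in ALPHABET]
            if sum(weights) == 0:
                nxt = random.choice(ALPHABET)               # untrained: pure noise
            else:
                nxt = random.choices(ALPHABET, weights)[0]  # derived from training data
            out += nxt
            prev = nxt
        return out

    counts = defaultdict(lambda: defaultdict(int))          # the "untrained model"
    print("untrained:", sample(counts, "t"))

    text = "the quick brown fox jumps over the lazy dog " * 50
    for a, b in zip(text, text[1:]):                        # "training": count bigrams
        counts[a][b] += 1
    print("trained:  ", sample(counts, "t"))

A real LLM is vastly more complex, but the dependency is the same: the weights start as noise, and only the training corpus puts anything useful into them.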

Now, given that the training data input is more substantial than the prompt by orders of magnitude, the prompt is basically irrelevant.

So what the license of the output should be based on is the training data. The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so that any single license is only a minority of the input. It's just manipulation.


> So what the license of the output should be based on is the training data.

An obvious practical problem with this is that the licenses are variously incompatible with one another (for example, GPLv2-only code can't be combined with Apache-2.0 code):

https://en.wikipedia.org/wiki/License_compatibility

> The big players can only avoid this logical conclusion by pretending that the model ("AI") is some kind of intelligent entity and also by training on everything so that any single license is only a minority of the input.

Whether it's an intelligent entity or not doesn't really enter into it. The real question is whether the output takes enough from some particular input to make it a derivative, and that ought to depend on what a given output actually looks like.



