I have... ~11 TB of free disk space and a 1080ti. Obviously nowhere close to being able to crunch all of Wikimedia Commons, but I'm also not trying to beat Stability AI at their own game. I just want to move the arguments people have about art generators beyond "this is unethical copyright laundering" and "the model is taking reference just like a real human".
To put things in perspective, the dataset it's trained on is ~240TB, and Stability has over 4,000 Nvidia A100s (each of which is much faster than a 1080ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless outputs).
That argument also makes little sense when you consider that the model itself is only a couple of gigabytes; it can't memorize 240TB of data, so it must have "learned".
As pointed out in [1], it seems machine learning is taking the same path physics already did. In the mid-20th century there was a "break" in physics: before it, individuals made groundbreaking discoveries in their private/personal labs (think Newton, Maxwell, Curie, Roentgen, Planck, Einstein, and many others); afterwards, huge collaborations (LHC/CERN, IceCube, EHT, et al.) became necessary, since the machinery, simulations, and models are so complex that groups of people are needed to create, comprehend, and use them.
P.S. To counteract that (unintentionally, actually; likely just a simple optimization of the instruments' duty cycle), astronomers came up with the concept of an "observatory" (like Hubble or JWST) as opposed to an "experiment" (like the LHC or the HESS telescopes), where outside people can submit proposals and, if selected, get observation time. Along with the raw data, proposal authors get the required expertise from the collaboration to process and analyze it.
The point is that there is no practical limit on compression. You don't need "AI" or anything besides very basic statistics to get astronomical compression ratios. (See: "zip bomb".)
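To make that concrete, here's a minimal sketch (plain Python, no ML involved): low-entropy input compresses at an astronomical ratio with nothing but a bog-standard codec.

```python
import zlib

# 10 MB of zero bytes: almost no information entropy.
data = bytes(10 * 1024 * 1024)

# Plain DEFLATE, no "AI" required.
compressed = zlib.compress(data, level=9)

ratio = len(data) / len(compressed)
print(f"{len(data)} -> {len(compressed)} bytes, ratio ~{ratio:.0f}:1")
```

The ratio comes out at roughly a thousand to one; a real zip bomb just nests this trick recursively. The limit really is the entropy of the input, not the tooling.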
The only practical limit is the amount of information entropy in the source material, and if you're going to claim that internet pictures are particularly information-dense I'd need some evidence, because I don't believe you.
Correct, however "compression is equivalent to general intelligence" (http://prize.hutter1.net/hfaq.htm#compai), and so in a sense all learning is compression. In this case, SD applies a level of compression so high that the only way it can retain information from its inputs is by capturing their underlying structure. This is a fundamentally deeper level of understanding than image codecs, which merely capture short-range visual features.
Most human behavior is easy to describe with only a few underlying parameters, but there are outlier behaviors where the number of parameters grows unboundedly.
("AI" hasn't even come close to modeling these outliers.)
Internet pictures fall squarely into the "few underlying parameters" bucket.
Because we made the algorithms and can confirm these theories apply to them.
We can speculate they apply to certain models of slices of human behaviour based on our vague understanding of how we work, but not nearly to the same degree.
Hang on, but plagiarism is a copyright violation, and that passes through the human brain.
When a human looks at a picture and then creates a duplicate, even from memory, we consider that a copyright violation. But when a human looks at a picture and then paints something in the style of that picture, we don't consider that a copyright violation. However we don't know how the brain does it in either case.
How is this different to Stable Diffusion imitating artists?
Well, that would be ~4000 people each with an Nvidia A100 equivalent, or more people with less powerful hardware; this would be an open effort, after all. Something similar to Folding@home could be used. Obviously the software for that would need to be written, but I don't think the idea is far-fetched. The power of the commons shouldn't be underestimated.
It's not super clear whether the training task can be scaled in a manner similar to protein folding. It's trickier to optimise ML workflows across computation nodes because you need more real-time aggregation and decision making (for the algorithms).
An A100 costs 10-12k USD for the 40GB/80GB VRAM versions, and it's not even targeted at individual gamers (it's not effective for gaming) -- Nvidia doesn't even give these things to big YouTube reviewers (LTT). So 4k people will be hard to find. A 3090 you can find; that's a 24GB VRAM card. But that's expensive too, and it's a power guzzler compared to the A100 series.
AFAIK this is not possible at the moment and would need some breakthrough in training algorithms; the required bandwidth between the GPUs is much higher than typical internet speeds.
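A rough sketch of why bandwidth is the bottleneck: in naive data-parallel training, every worker exchanges a full set of gradients each step. With hypothetical round numbers (a ~1B-parameter model, fp16 gradients, a 100 Mbit/s home connection -- ballpark assumptions, not Stable Diffusion's actual config), the arithmetic looks like this:

```python
# Hypothetical round numbers, not any model's exact config.
params = 1_000_000_000                 # ~1B parameters
bytes_per_grad = 2                     # fp16 gradients
grad_bytes = params * bytes_per_grad   # ~2 GB exchanged per step

home_link = 100e6 / 8                  # 100 Mbit/s -> 12.5 MB/s
seconds_per_step = grad_bytes / home_link

print(f"~{grad_bytes / 1e9:.0f} GB of gradients per step")
print(f"~{seconds_per_step / 60:.1f} minutes per step just moving gradients")
```

That's minutes per training step spent on transfer alone, versus sub-millisecond latencies over NVLink/InfiniBand in a datacenter -- which is why this needs an algorithmic breakthrough (heavy gradient compression, very infrequent syncing), not just more volunteers.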
> That argument also makes little sense when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".
The matter is really very nuanced and trivialising it that way is unhelpful.
If I recompress 240TB of images as super-low-quality JPEGs and manage to zip them up into a single file that is significantly smaller than 240TB (because you can), does the fact that they are not pixel-perfect matches for the original images mean you're not violating copyright?
If an AI model can generate images statistically similar to the training data from a trivially guessable prompt ("a picture by xxx" or whatever), then it's entirely arguable that the model is similarly infringing.
The exact compression algorithm, be it model or jpg or zip is irrelevant to that point.
It’s entirely reasonable to say, if this is so good at learning, why don’t you train it without the art station dataset.
…because if it’s just learning techniques, generic public domain art should be fine right? Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?
If not, then it’s not just learning technique, it’s copying.
So, tl;dr: there's plenty of scope for trying to train a model on an ethically sourced dataset, and for investigating technique vs. copying in generative models.
> If I recompress 240TB of images as super-low-quality JPEGs and manage to zip them up into a single file that is significantly smaller than 240TB (because you can), does the fact that they are not pixel-perfect matches for the original images mean you're not violating copyright?
If you compress them down to two or three bytes each, which is what the process effectively does, then yes, I would argue that we stand to lose a LOT as a technological society by enforcing existing copyright laws on IP that has undergone such an extreme transformation.
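The "few bytes per image" figure is just division. A back-of-envelope with round numbers (a ~2 GB checkpoint and ~2 billion training images -- both hypothetical ballpark values, not exact figures):

```python
# Hypothetical round numbers for a back-of-envelope estimate.
model_bytes = 2 * 10**9    # ~2 GB of weights
num_images = 2 * 10**9     # ~2 billion training images

print(f"~{model_bytes / num_images:.1f} bytes of model capacity per image")
# Compare: even a tiny 64x64 thumbnail JPEG is thousands of bytes.
```

At roughly one byte per image, there simply isn't room to store the pictures themselves; whatever survives has to be shared structure across many images.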
Does that mean it’s worthless to try to train an ethical art model?
Is it not helpful to show that you can train a model that can generate art without training it on copyrighted material?
Maybe it’s good. Maybe not. Who cares if people waste their money doing it? Why do you care?
It certainly feels awfully convenient that there are no ethically trained models, because it means no one can say "you should be using these; you have a choice to do the right thing, if you want to".
I’m not judging; but what I will say is that there’s only one benefit in trying to avoid and discourage people training ethical models:
…and that is the benefit of people currently making and using unethically trained models.
We don't agree on what "ethical" means here, so I don't see a lot of room for discussion until that happens. Why do you care if people waste computing time programming their hardware to study art and create new art based on what it learns? Who is being harmed? More art in the world is a good thing.
> Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?
You couldn't teach a human to do that without them having seen Greg's art. There are elements of stroke, palette, lighting, and composition that can't be fully captured by natural language (short of encoding an ML model, which defeats the point).
Copyright says you cannot reproduce, distribute, etc. a work without consent from the author, whatever the means. The copy doesn't need to be exact, only sufficiently close.
However, copyright doesn't prevent someone from looking at the work and studying it, even studying it by heart. Infringement arises only if that someone makes a reproduction of the work. Also, there are provisions for fair use, etc.
> …because if it’s just learning techniques, generic public domain art should be fine right? Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?
Is it fair to hold it to a higher standard than humans, though? To some degree it's the whole "xxx..... on a computer!" thing all over again if we go that way.
> The matter is really very nuanced and trivialising it that way is unhelpful.
Harping on copyright in the Age of Diffusion Models is as unhelpful (for artists) as protesting against a tsunami. It's time to move up the ladder.
ML engineers face a similar predicament: GPT-3-like models can solve, on the first try and without specialised training, tasks that took a whole team a few years of work. Who dares still use LSTMs now like it's 2017? Moving up the ladder -- learning to prompt and fine-tune ready-made models -- is the only path forward for ML engineers.
The reckoning is coming for programmers and for writers as well. Even scientific papers can be generated by LLMs now - see the Galactica scandal where some detractors said it will empower people to write fake papers. It also has the best ability to generate appropriate citations.
The conclusion is that we need to give up some of the human-only tasks and hop on the new train.
I think it's a great idea regardless of practicality/implementation, which I think is generally understood to be largely a matter of time, money, and hardware. I feel like you should write it up so the idea gets out there, or so you can pitch it to someone if the opportunity arises.
Oh, and I also second the fast.ai suggestion: part 2 is 100% focused on implementing stable diffusion from scratch in the Python standard library, and it's amazing all around. The course is still actively coming out, but the first few lessons are already freely available, and the rest sounds like it will be made freely available soon.