
I have... ~11TB of free disk space and a 1080 Ti. Obviously nowhere close to being able to crunch all of Wikimedia Commons, but I'm also not trying to beat Stability AI at their own game. I just want to move the arguments people have about art generators beyond "this is unethical copyright laundering" and "the model is taking reference just like a real human".


To put things in perspective: the dataset it's trained on is ~240TB, and Stability has over 4,000 Nvidia A100s (each of which is much faster than a 1080 Ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless outputs).

That argument also makes little sense when you consider that the model itself is only a couple of gigabytes; it can't memorize 240TB of data, so it must have "learned".

But if you want to create custom versions of SD, you can always try out DreamBooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion. That one is actually feasible without spending millions of dollars on GPUs.


As pointed out in [1], machine learning seems to be taking the same path physics already did. In the mid-20th century there was a "break" in physics: before it, individuals were making groundbreaking discoveries in their private/personal labs (think Newton, Maxwell, Curie, Roentgen, Planck, Einstein, and many others); after it, huge collaborations (LHC/CERN, IceCube, EHT, et al.) became necessary, since the machinery, simulations, and models are so complex that groups of people are needed to create, comprehend, and use them.

1. https://www.youtube.com/watch?v=cdiD-9MMpb0 Lex Fridman podcast with Andrej Karpathy

P.S. To counteract this (unintentionally, actually; most likely as a simple optimization of the instruments' duty cycle), astronomers came up with the concept of an "observatory" (like Hubble or JWST) rather than an "experiment" (like the LHC or the HESS telescopes): outside researchers can submit proposals and, if selected, get observation time. Along with the raw data, proposal authors get the expertise from the collaboration needed to process and analyze that data.


> when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

This is just lossy compression with a large and well-tuned (to the expected problem domain) dictionary.

Video compression codecs can achieve a 500x compression ratio, and they are general-purpose.


The dataset, LAION-5B, is 240TB of already-compressed data (5 billion text/image pairs at 512x512 resolution).

Uncompressed, LAION-5B would be 4PB, for a compression ratio into SD of ~780kx, or one byte per picture.
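Those figures can be sanity-checked with back-of-envelope arithmetic (a rough sketch; the dataset and model sizes are the approximate numbers quoted in this thread, not exact measurements):

```python
# Back-of-envelope check of the compression-ratio claim,
# using the approximate figures quoted in this thread.
dataset_uncompressed = 4e15   # ~4 PB of raw pixels
model_size = 5e9              # SD weights, a few GB
num_images = 5e9              # LAION-5B pair count

ratio = dataset_uncompressed / model_size
bytes_per_image = model_size / num_images

print(f"ratio ~{ratio:,.0f}x")            # ~800,000x, in line with "~780kx"
print(f"{bytes_per_image:.1f} bytes per image")  # ~1 byte per picture
```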


The point is that there is no practical limit on compression. You don't need "AI" or anything beyond very basic statistics to get astronomical compression ratios. (See: "zip bomb".)
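As a concrete illustration of the "zip bomb" point, plain DEFLATE already reaches enormous ratios on low-entropy input (a minimal sketch; the exact compressed size varies slightly between zlib builds):

```python
import zlib

# 10 MB of zeros: almost no information entropy, so the data compresses
# down to roughly the DEFLATE format overhead (a ratio near 1000x).
raw = b"\x00" * 10_000_000
packed = zlib.compress(raw, 9)

ratio = len(raw) / len(packed)
print(f"{len(packed)} bytes, ratio ~{ratio:,.0f}x")
```

The ratio is bounded only by the entropy of the input, which is exactly the point being made above.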

The only practical limit is the amount of information entropy in the source material, and if you're going to claim that internet pictures are particularly information-dense I'd need some evidence, because I don't believe you.


Correct. However, "compression is equivalent to general intelligence" (http://prize.hutter1.net/hfaq.htm#compai), and so in a sense all learning is compression. In this case, SD applies a level of compression so high that the only way it can retain information from its inputs is by capturing their underlying structure. This is a fundamentally deeper level of understanding than image codecs, which merely capture short-range visual features.


I fail to see the difference between "underlying structure" and "short-range visual features".

Both are just simple statistical relationships between parameters and random variables.


Sure, but why would that not apply to humans? And we don't consider it copyright violation if a human learns painting by looking at art.


Depends on what you mean by "humans".

Most human behavior is easy to describe with only a few underlying parameters, but there are outlier behaviors where the number of parameters grows unboundedly.

("AI" hasn't even come close to modeling these outliers.)

Internet pictures fall squarely into the "few underlying parameters" bucket.


Because we made the algorithms and can confirm these theories apply to them.

We can speculate they apply to certain models of slices of human behaviour based on our vague understanding of how we work, but not nearly to the same degree.


Hang on, but: plagiarism is a copyright violation, and that passes through the human brain.

When a human looks at a picture and then creates a duplicate, even from memory, we consider that a copyright violation. But when a human looks at a picture and then paints something in the style of that picture, we don't consider that a copyright violation. However we don't know how the brain does it in either case.

How is this different to Stable Diffusion imitating artists?


human memory is lossy compression


Well, that would be ~4,000 people each with an Nvidia A100 equivalent, or more people with lesser hardware; this would be an open effort, after all. Something similar to Folding@home could be used. Obviously the software for that would need to be written, but I don't think the idea is far-fetched. The power of the commons shouldn't be underestimated.


It's not super clear whether the training task can be scaled out the way protein folding was. ML workflows are trickier to optimise across compute nodes because the algorithms need more real-time aggregation and decision-making.


An A100 costs 10-12k USD (40GB/80GB VRAM) and isn't even targeted at individual gamers (it's not effective for gaming); Nvidia doesn't even give these things to big YouTube reviewers (LTT). So 4,000 people will be hard to find. A 3090 you can find; that's a 24GB VRAM card. But it's expensive too, and a power guzzler compared to the A100 series.


AFAIK this is not possible at the moment and would need a breakthrough in training algorithms; the required bandwidth between the GPUs is much higher than internet speeds.


Unlike Folding@home, the problem isn't very distributable, because weights and gradients need to be shared between GPUs over a very high-speed link.
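A rough estimate shows why (the parameter count and step rate below are illustrative assumptions, not SD's actual training configuration):

```python
# Why data-parallel training over home internet doesn't work:
# every optimizer step needs a gradient all-reduce across workers.
params = 1e9              # assume a ~1B-parameter model
bytes_per_param = 2       # fp16 gradients
steps_per_second = 1      # assumed optimizer step rate

# Per worker, per second, just for gradient exchange.
traffic = params * bytes_per_param * steps_per_second
print(f"~{traffic / 1e9:.0f} GB/s of gradient traffic per worker")

home_uplink = 35e6 / 8    # ~35 Mbit/s uplink, in bytes/s
print(f"vs ~{home_uplink / 1e6:.1f} MB/s of typical home uplink")
```

Even with aggressive gradient compression, that is several orders of magnitude more bandwidth than consumer connections provide, which is the gap Folding@home never had to cross.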


Quite right, but…

> That argument also makes little sense when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

The matter is really very nuanced and trivialising it that way is unhelpful.

If I recompress 240TB of images as super-low-quality JPEGs and manage to zip them up into a single file that is significantly smaller than 240TB (because you can), does the fact that they are not pixel-perfect matches for the original images mean I'm not violating copyright?

If an AI model can generate statistically similar images from the training data with a trivially guessable prompt ("a picture by xxx" or whatever), then it's entirely arguable that the model is similarly infringing.

The exact compression algorithm, be it a model, JPEG, or ZIP, is irrelevant to that point.

It’s entirely reasonable to say: if this is so good at learning, why don’t you train it without the ArtStation dataset?

…because if it’s just learning techniques, generic public-domain art should be fine, right? Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

If not, then it’s not just learning technique, it’s copying.

So, tl;dr: there’s plenty of scope for trying to train a model on an ethically sourced dataset, and for investigating technique vs. copying in generative models.

It is 100% not something we can just brush off.


> If I recompress 240TB as super low quality jpgs and manage to zip them up as single file that is significantly smaller than 240TB (because you can), does the fact they are not pixel perfect matches for the original images mean you’re not violating copyright?

If you compress them down to two or three bytes each, which is what the process effectively does, then yes, I would argue that we stand to lose a LOT as a technological society by enforcing existing copyright laws on IP that has undergone such an extreme transformation.


Maybe?

Does that mean it’s worthless to try to train an ethical art model?

Is it not helpful to show that you can train a model that can generate art without training it on copyrighted material?

Maybe it’s good. Maybe not. Who cares if people waste their money doing it? Why do you care?

It certainly feels awfully convenient that there are no ethically trained models, because it means no one can say “you should be using these; you have a choice to do the right thing, if you want to”.

I’m not judging; but what I will say is that there’s only one benefit in trying to avoid and discourage people from training ethical models:

…and that is the benefit of people currently making and using unethically trained models.


We don't agree on what "ethical" means here, so I don't see a lot of room for discussion until that happens. Why do you care if people waste computing time programming their hardware to study art and create new art based on what it learns? Who is being harmed? More art in the world is a good thing.


> Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

You couldn't teach a human to do that without them having seen Greg's art. There are elements of stroke, palette, lighting, and composition that can't be fully captured by natural language (short of encoding an ML model, which defeats the point).


Copyright says you cannot reproduce, distribute, etc. a work without consent from the author, whatever the means. The copy doesn't need to be exact, only sufficiently close.

However, copyright doesn't prevent someone from looking at the work and studying it, even memorizing it. Infringement comes only if that person makes a reproduction of the work. Also, there are provisions for fair use, etc.


> …because if it’s just learning techniques, generic public domain art should be fine right? Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

Is it fair to hold it to a higher standard than humans, though? To some degree it's the whole "xxx..... on a computer!" thing all over again if we go that way.


> Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

Can you please rewrite this in the writing style of Socrates?


> The matter is really very nuanced and trivialising it that way is unhelpful.

Harping about copyright in the age of diffusion models is as unhelpful (for artists) as protesting against a tsunami. It's time to move up the ladder.

ML engineers are in a similar predicament: GPT-3-like models can solve, at first try and without specialised training, tasks that took a whole team a few years of work. Who dares still use LSTMs now, like it's 2017? Moving up the ladder, i.e. learning to prompt and fine-tune ready-made models, is the only option for ML engineers.

The reckoning is coming for programmers and for writers as well. Even scientific papers can be generated by LLMs now; see the Galactica scandal, where some detractors said it would empower people to write fake papers. It also has the best ability to generate appropriate citations.

The conclusion is that we need to give up some of the human-only tasks and hop on the new train.


It's "keeping" 1 byte worth of information from each input example. The SD models are 5GB together, and the dataset 2.3B images.


Stable Diffusion 1 was trained with 256 A100s running for a little over three weeks. These days that would cost less than a Tesla…
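For scale, the quoted hardware works out as follows (the hourly rate is an assumption for illustration; spot and community-cloud prices can be several times lower than on-demand):

```python
# Rough cost of the quoted training run: 256 A100s for ~22 days.
gpus = 256
days = 22
price_per_gpu_hour = 1.10   # assumed cloud A100 rate, USD

gpu_hours = gpus * 24 * days
cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours:,} A100-hours, ~${cost:,.0f}")
```

That lands in the low six figures at on-demand pricing, i.e. far below the original millions-of-dollars framing, though still beyond a hobbyist budget.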


I think it's a great idea regardless of practicality/implementation, which I think is generally understood to be largely a matter of time, money, and hardware. I feel like you should write it up so the idea gets out there, or so you can pitch it to someone if the opportunity arises.

Oh, and I also second the fast.ai suggestion: part 2 is 100% focused on implementing Stable Diffusion from scratch in Python, and it's amazing all around. The course is still actively coming out, but the first few lessons are freely available already, and the rest sounds like it will be made freely available soon.



