Language models require massive scale to train, but scale isn't only about the number of parameters or neurons; it also exists in the amount of data the model trains on.
Parameter count affects the post-training model size and the hardware needed to run it; data size does not. Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images, 200 million images, or a single image.
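As a rough back-of-the-envelope sketch (not from the original post; it assumes fp16 weights and a parameter count on the order of 1B, which is roughly where Stable Diffusion's denoiser sits), inference memory depends on parameter count alone, with training data size nowhere in the formula:

    # Weight memory for inference depends only on parameter count,
    # not on how many images the model was trained on.
    def inference_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
        """Approximate weight footprint, assuming fp16 (2 bytes/param)."""
        return n_params * bytes_per_param / 1e9

    # Same ~2 GB of weights whether training saw 1 image or 1 billion.
    print(inference_memory_gb(1e9))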
Most LLM training has focused on parameter count as the axis of scale.
Meta trained a series of models (the LLaMA family) on far more data than the original GPT-3 was trained on. That extra data improved performance even on the much smaller models in the series.
The parent poster was talking about training longer while keeping the model small, so it would not be expensive to use in production. It's a trade-off: you could train for less time with a larger model.
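To sketch that trade-off, here is an illustration using the common approximation that training compute is roughly 6 × parameters × tokens; the 13B/4T pairing below is purely illustrative, not a figure from the post, while the GPT-3 numbers (175B parameters, ~300B tokens) are its published ones:

    # Approximate training compute: ~6 * N (params) * D (tokens).
    def training_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    big_short  = training_flops(175e9, 300e9)    # GPT-3-sized model, ~300B tokens
    small_long = training_flops(13e9, 4000e9)    # much smaller model, far more tokens (illustrative)

    # Similar training compute for both configurations...
    print(f"{big_short:.2e} vs {small_long:.2e}")
    # ...but per-request inference cost scales with N only, so the smaller,
    # longer-trained model is far cheaper to serve.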