Is your point really that "I need to see all the data downloaded to make this model before I can know it is open"? Do you have $XXB worth of GPU time to ingest that data with a state-of-the-art framework to make a model? I don't. Even if I did, I'm not sure FB or Google are in any better position to claim this model is or isn't open beyond the fact that the weights are there.
They're giving you a free model. You can evaluate it. You can sue them. But the weights are there. If you dislike the way they license the weights, because the license isn't open enough, then sure, speak up, but because you can't see all the training data??! Wtf.
To many people there's an important distinction between "open source" and "open weights". I agree with the distinction: open source has a particular meaning which is not really met here, and misuse is worth calling out in order to prevent erosion of the terminology.
Historically this would be like calling a free but closed-source application "open source" simply because the application is free.
I agree with OP - the weights are more akin to the binary output from a compiler. You can't see how it works or how it was made, and you can't freely manipulate it, improve it, extend it etc. It's like having a binary of a program. The source code for the model is the training data. The compiler is the tooling that can train a model from a given set of training data. For me it is not critical that an open source model is ONLY distributed in source code form. It is fine that you can also download just the weights. But it should be possible to reproduce the weights - either there should be a tarball with all the training data, or there needs to be a description/scripts of how one could obtain the training data. It must be reproducible for someone willing to invest the time and compute, even if 99.999% of people use only the binary. This is completely analogous to what is normally understood by open source.
Do you need to see the source code used to compile this binary before you can know it is open? Do you have enough disk storage and RAM available to compile Chromium on your laptop? I don't.