That's OK. The purpose of the test is to compare approaches to compression. Noise in the dataset is a handicap, but it affects all approaches equally, so it doesn't invalidate the test as a way to compare them. True white noise is also very rare; almost all real noise has characteristics that are at least somewhat compressible. And hopefully whoever constructs the dataset tries not to include tons of noise in it.

Lossless compression is a great objective because it's impossible to cheat. When you do lossy compression, you have to define a quality metric, and as soon as you do that, the game becomes cheating the quality metric rather than actually compressing the data in a useful way.

I highly recommend watching the video I linked. Arithmetic coding reduces the task of lossless compression to assigning probabilities to the next token, which is the exact task these models are trained to do.
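
To make that concrete, here is a rough Python sketch of the idea, not a real arithmetic coder. It only adds up -log2(p) for each symbol, which is the size an arithmetic coder gets close to (plus a couple of bits of overhead). The two toy predictors and all the function names are made up for illustration; a language model would simply be a much better `predict` function.

```python
import math
from collections import Counter

def ideal_code_length_bits(tokens, predict):
    """Sum of -log2 p(token | context): the size an arithmetic coder approaches."""
    total = 0.0
    for i, tok in enumerate(tokens):
        probs = predict(tokens[:i])  # the model only sees what came before
        total += -math.log2(probs[tok])
    return total

def uniform_model(alphabet):
    """Hypothetical baseline: every symbol is equally likely, context ignored."""
    p = 1.0 / len(alphabet)
    return lambda context: {s: p for s in alphabet}

def adaptive_model(alphabet):
    """Hypothetical predictor: add-one smoothed counts of the symbols seen so far."""
    def predict(context):
        counts = Counter(context)
        denom = len(context) + len(alphabet)
        return {s: (counts[s] + 1) / denom for s in alphabet}
    return predict

text = "aaaaaaaabaaaaaaa"
alphabet = sorted(set(text))

uniform_bits = ideal_code_length_bits(text, uniform_model(alphabet))
adaptive_bits = ideal_code_length_bits(text, adaptive_model(alphabet))

print(f"uniform model:  {uniform_bits:.2f} bits")
print(f"adaptive model: {adaptive_bits:.2f} bits")
```

The better predictor gives the smaller total, and the decoder can run the same predictor on the symbols it has already decoded, so nothing is lost. That's why compressed size is a direct measure of prediction quality.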