I'm a senior software dev who is still using one of the last Intel Macs (granted, a maxed-out 16", not an Air).
These kinds of comparisons are still valid for me, and there are plenty of people less technical than me who want them too. The youngest Intel Airs only aged out of AppleCare coverage last year, and for most casual users, getting 4 years out of an Apple computer is totally expected.
My personal laptop is 7 years old, so these comparisons are relevant to me, too. I've operated on my Air a bit to reduce thermal throttling, and I don't use it for anything crazy, so it's still useful, but one of these days I'll upgrade. I'm sure there are plenty of people like us with these old beasts.
Apple could dramatically improve performance if they just tasked one Metal engineer with llama.cpp. Like, just to finish flash attention and the quantized KV cache, and optimize the Metal kernels.
I wouldn't be surprised if they could double performance.
I know Apple is pushing MLX, and MLC-LLM is fast too, but in practice most Mac users (I think) are running llama.cpp-based stacks.
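For context on why the quantized KV cache matters on a memory-starved laptop: at fp16 the cache alone can eat gigabytes at long context. A rough back-of-envelope sketch, assuming Llama-2-7B-ish dimensions (32 layers, 32 KV heads, head dim 128); `kv_cache_gib` is a hypothetical helper of mine, and real q8_0 storage carries a small per-block scale overhead that I'm ignoring here:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # Two tensors (K and V) per layer, one vector per head per position.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 2**30

fp16 = kv_cache_gib(32, 32, 128, 4096, 2.0)  # fp16: 2 bytes per element
q8   = kv_cache_gib(32, 32, 128, 4096, 1.0)  # ~q8_0: ~1 byte per element
print(fp16, q8)  # 2.0 GiB vs 1.0 GiB at 4k context
```

So an 8-bit cache roughly halves that footprint, which is memory you get back for model weights.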
What is the appeal of running this sort of stuff locally, though? It's still slower and has less memory than a cluster or even a single strong server. Just ssh into some horsepower and keep your lap cold.
I like the M1 Pro so far with models in the 30-70b parameter range, but memory bandwidth is my current limit.
With a large jump in unified memory and bandwidth we could see 120b parameter models running on a laptop.
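The bandwidth ceiling the parent mentions is easy to estimate: single-stream decode is memory-bound, so every weight gets streamed once per generated token and tokens/sec is roughly bandwidth divided by model size. A sketch with numbers I'm assuming for illustration (~200 GB/s for an M1 Pro, ~0.5 bytes per parameter at 4-bit quantization):

```python
def q4_model_gb(n_params_billion):
    # ~4 bits per weight -> ~0.5 bytes per parameter (quant overhead ignored)
    return n_params_billion * 0.5

def tokens_per_sec(bandwidth_gb_s, model_gb):
    # memory-bound decode at batch size 1: all weights read once per token
    return bandwidth_gb_s / model_gb

print(tokens_per_sec(200, q4_model_gb(70)))  # M1 Pro + 70b @ 4-bit: ~5.7 tok/s
print(q4_model_gb(120))                      # 120b @ 4-bit: ~60 GB of weights
```

By this estimate a 120b model at 4-bit needs on the order of 60 GB just for weights, so it only becomes comfortable on a laptop once unified memory clears that with room for the KV cache, and bandwidth sets the speed from there.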
As a side note, why does Apple continue to reference the Intel MacBook Air? It's over 6 years old now; no shit this new CPU is 16x faster...