Apple doesn't have any hardware SIMD technology that I'm aware of.
At best, Apple has the Metal API which iOS video games use. I guess there's a level of SIMD-compute expertise there, but it'd take a lot of investment to turn that into a full-scale GPU that tangos with supercomputers. Software is a big piece of the puzzle for sure, but Metal isn't ready for prime time.
I'd say Apple is ahead of Intel (Intel keeps wasting its time and collapsing its own progress: Xeon Phi, Battlemage, etc. etc. Intel can't keep investing in its own stuff long enough to reach critical mass). Intel does have oneAPI, but given how many times Intel collapses everything and starts over again, I'm not sure how long oneAPI will last.
But Apple vs AMD? AMD 100% understands SIMD compute and has decades' worth of investment in it. The only problem with AMD is that they don't have the raw cash to extend their expertise to cover software, so AMD has to rely on Microsoft (DirectX), Vulkan, or whatever. ROCm may have its warts, but it represents over a decade of software development too (especially when we consider that ROCm started as "Boltzmann", which had several years of use before it came out as ROCm).
-------
AMD ain't perfect. They had a little diversion into C++ AMP with Microsoft (which served as the API for Boltzmann / early ROCm). But the overall path AMD is taking at least makes sense, if a bit suboptimal compared to NVidia's huge investment in CUDA.
> M3 Max's GPU is significantly more efficient in perf/watt than RDNA3, already has better ray tracing performance, and is even faster than a 7900XT desktop GPU in Blender.[0]
A couple of things: Blender uses HIP on AMD, which is nerfed on RDNA3 because of product segmentation, so this is really comparing against something that is deliberately mediocre in the 7900 XT.
The M3 Max is also in a sense a generation ahead in terms of perf/watt of the 7900 XT as it uses a newer manufacturing node.
I suppose it's also worth highlighting that if you enable OptiX in the comparison above, you can see Nvidia parts stomping all over the AMD and Apple parts alike.
Why does AMD nerf RDNA3 when they're so far behind Nvidia and Apple in Blender performance? Do you have benchmarks for when AMD doesn't nerf Blender performance?
The M3 Max GPU uses at most 60-70w. Meanwhile, the 7900 XT uses up to 412w in burst mode.[0] TSMC N3 (M3 Max) uses 25-30% less power than TSMC N5 (7900 XT).[1] In other words, if the 7900 XT used N3 and were optimized for the same performance, it would burst to roughly 300w instead, which is still about 5x more than the M3 Max. In other words, the perf/watt advantage of the M3 Max is mostly not related to the node used. It's the design.
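That estimate can be sanity-checked with quick arithmetic (all figures are the ones quoted above; treating the node's 25-30% power savings as a flat multiplier is of course a simplification):

```python
# Back-of-the-envelope perf/watt check, using the figures quoted above.
m3_max_watts = 60            # lower bound of the M3 Max GPU's 60-70w range
rx7900xt_watts = 412         # 7900 XT burst power
node_savings = 0.275         # midpoint of N3's 25-30% power savings vs N5

# Hypothetical 7900 XT ported to N3 at the same performance level
rx7900xt_on_n3 = rx7900xt_watts * (1 - node_savings)
ratio = rx7900xt_on_n3 / m3_max_watts

print(f"7900 XT on N3: ~{rx7900xt_on_n3:.0f}w")        # ~299w
print(f"Still ~{ratio:.1f}x the M3 Max's power draw")  # ~5.0x
```

Even granting the node advantage to the 7900 XT, the gap only shrinks from roughly 6-7x down to roughly 5x, which is the point being made about design.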
It's weird that you're choosing a nerfed part and sticking with it as a comparison point.
The article is about the MI300X, which beats NVidia's H100.
> Do you have benchmarks for when AMD doesn't nerf Blender performance?
Go read the article above.
> Notably, our results show that MI300X running MK1 Flywheel outperforms H100 running vLLM for every batch size, with an increase in performance ranging from 1.22x to 2.94x.
-------
> Why does AMD nerf RDNA3 when they're so far behind Nvidia and Apple in Blender performance?
Nerf is a weird word.
AMD has focused on 32-bit and 64-bit FLOPs until now. AMD never put much effort into raytracing; they reached acceptable levels on Xbox / PS5, but it was always NVidia pushing raytracing (not AMD).
Similarly: Blender is a raytracer that uses those raytracing cores. So any chip with substantial on-chip ray-tracing / ray-marching / ray-intersection routines will render faster.
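For context on what those routines actually do: a raytracer's inner loop is billions of intersection tests like the one below. This is a plain-Python sketch of the classic ray-sphere test; dedicated raytracing hardware runs this kind of test (plus the BVH traversal around it) in fixed-function units instead of on the general SIMD units.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return distance t to the nearest hit, or None if the ray misses.

    Solves |origin + t*direction - center|^2 = radius^2 for t,
    the quadratic behind every ray-sphere intersection test."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                       # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2 * a)  # nearest of the two roots
    return t if t > 0 else None

# A ray fired down the -z axis hits a unit sphere 4 units away at t=4.
print(ray_sphere_hit((0, 0, 5), (0, 0, -1), (0, 0, 0), 1.0))  # 4.0
```

A renderer evaluates this against millions of triangles per ray (via an acceleration structure), which is why hardware intersection units move the needle so much in Blender-style benchmarks.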
Blender isn't what most people do with GPUs. The #1 thing they do is play video games like Baldur's Gate 3.
-------
It'd be like me asking why Apple's M3 can't run Baldur's Gate 3. It's not a "nerf", it's a purposeful engineering decision.
I'd definitely rate AMD's efforts above Apple's Metal.