With respect to the incompatibilities with PyTorch and TensorFlow: given that the AMD and Intel GPU drivers are more likely to be open sourced, do you believe the open source community or third-party vendors will step in to close the gap for AMD/Intel?
It would seem a great startup idea, with the intent of getting acqui-hired by AMD or Intel, to get into the details of these incompatibilities and/or performance differences.
At worst, it seems you could pivot into some sort of passive-income AI benchmarking website/YT channel, similar to the ones that exist for gaming GPU benchmarks.
Drivers are only the lowest level of the stack. You could (in principle) have a great driver ecosystem and a nonexistent user-level ecosystem. And indeed, the user-level ecosystem on AMD and Intel seems to be suffering.
For example, I recently went looking into Numba for AMD GPUs. The answer was basically, "it doesn't exist". There was a version, it got deprecated (and removed), and the replacement never took off. AMD doesn't appear to be investing in it (as far as anyone can tell from an outsider's perspective). So now I've got code that won't work on AMD GPUs, even though in principle the abstractions are perfectly suited to this sort of cross-GPU-vendor portability.
NVIDIA is years ahead not just in CUDA, but in terms of all the other libraries built on top. Unless I'm building directly on the lowest levels of abstraction (CUDA/HIP/Kokkos/etc. and BLAS, basically), chances are the things I want will exist for NVIDIA but not for the others. Without a significant and sustained ecosystem push, that's just not going to change quickly.
I think this is what George Hotz is doing with tiny corp, but I have to admit I have little hope. Making asynchronous SIMD code fast is very difficult to begin with, let alone without an internal view into decisions like “why does this cause a sync” or even “will this unnecessary copy ever get fixed?”. Unfortunately, AMD and especially Intel don’t “develop in the open”, so even if the drivers are open sourced, without that context it’ll be an uphill battle.
To give some perspective, see @ngimel’s comments and PRs on GitHub. That’s what AMD and Intel are competing against, along with NVIDIA’s confidence that optimizing for ML customers will pay off (clearly NVIDIA can already justify the investment).
This kind of software development is hard and expensive. I do not think you could make enough income from a benchmarking website or YT channel, considering that most people are not interested in those low-level details.