
It's both! For this blog post we decided to highlight our best end-user-facing numbers to keep things simple. We briefly hint at our contributor guide here https://github.com/pytorch/ao/issues/391, which gives a tour of the APIs we provide to developers implementing new algorithms.
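For context, the end-user-facing path is roughly a one-liner. A minimal sketch, assuming the quantize_ entry point and int8_weight_only config from the torchao README (exact import paths can vary across versions):

  # minimal sketch of the end-user-facing API, assuming torchao's
  # quantize_ entry point and an int8 weight-only config
  import torch
  from torchao.quantization import quantize_, int8_weight_only

  model = torch.nn.Sequential(torch.nn.Linear(1024, 1024))

  # swaps Linear weights for quantized tensor subclasses in place
  quantize_(model, int8_weight_only())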

But we have had quantization algorithm developers such as HQQ and AutoRound merge their code in to get composability and serialization for free. We view quantization algorithms as the top layer; going down the stack you have quantized tensors, quant primitives like quantize/dequantize, and finally basic dtypes like uint1-7 and float3-8. Personally, the reason I spent so much time on AO is that I was hoping we could make it easier for people to express their quantization algorithms in easy-to-read PyTorch code, and for those who must use custom kernels we also have tutorials on integrating custom CUDA and Triton ops.
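To make the layering concrete, here is a rough sketch of what a quant primitive can look like when written in plain PyTorch. This is hypothetical standalone code for illustration, not torchao's actual implementation, which handles many more dtypes and granularities:

  # hypothetical per-channel symmetric affine quant/dequant primitives,
  # illustrating the "quant primitives" layer of the stack
  import torch

  def quantize_per_channel(w: torch.Tensor, bits: int = 8):
      qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
      # one scale per output channel (row)
      scale = w.abs().amax(dim=1, keepdim=True) / qmax
      q = torch.clamp(torch.round(w / scale), qmin, qmax).to(torch.int8)
      return q, scale

  def dequantize_per_channel(q: torch.Tensor, scale: torch.Tensor):
      return q.to(torch.float32) * scale

  w = torch.randn(128, 256)
  q, scale = quantize_per_channel(w)
  w_hat = dequantize_per_channel(q, scale)
  print((w - w_hat).abs().max())  # small quantization error

A quantization algorithm like HQQ then sits on top of primitives like these, and the quantized tensor layer above them is what gives you serialization and composability for free.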

Most of those discussions have been happening in #torchao on discord.gg/gpumode, so if you need to chat back and forth feel free to reach out to the team there; otherwise GitHub also works.


