It's likely that a significant fraction of the perf difference between Apple's GPUs and NVIDIA GPUs is due to NVIDIA's CUDA being highly optimized, and PyTorch being tuned to work with CUDA.
If PyTorch's Metal support improves and Apple's Metal drivers improve (big ifs), it's likely that Apple's GPUs will perform better relative to NVIDIA than they currently do.
img2img runs in 6 seconds on my GeForce 3080 12 GB. 6+ it/s, depending on how much GPU memory is available. If I have any Electron apps running, it slows down dramatically.
4. (most importantly) are you including the 30 seconds or so it takes to load the model initially? i.e. if you were to run 10 prompts and then divide the total time by 10, what are your numbers?
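For anyone who wants to answer that last question with comparable numbers, here is a rough sketch of the measurement. It times the model load separately from generation, then reports both the per-prompt time and the "total divided by N" figure. The names `benchmark`, `load_model`, `generate`, `fake_load` and `fake_generate` are made up for illustration and are not part of any Stable Diffusion script; you would swap in whatever your own setup uses to load the pipeline and run a prompt.

```python
import time


def benchmark(load_model, generate, prompts):
    """Time model loading and generation separately.

    Returns (load_seconds, seconds_per_prompt, amortized_seconds_per_prompt),
    where the amortized figure spreads the one-time load cost over all prompts.
    """
    t0 = time.perf_counter()
    model = load_model()          # one-time cost (reading weights, moving to GPU)
    t1 = time.perf_counter()
    for prompt in prompts:
        generate(model, prompt)   # per-prompt cost
    t2 = time.perf_counter()

    load_s = t1 - t0
    gen_s = t2 - t1
    n = len(prompts)
    return load_s, gen_s / n, (load_s + gen_s) / n


# Hypothetical stand-ins so the sketch runs without a GPU or a model.
def fake_load():
    time.sleep(0.05)
    return object()


def fake_generate(model, prompt):
    time.sleep(0.01)


if __name__ == "__main__":
    load_s, per_prompt, amortized = benchmark(fake_load, fake_generate, ["a cat"] * 10)
    print(f"load: {load_s:.2f}s  per prompt: {per_prompt:.2f}s  amortized: {amortized:.2f}s")
```

Reporting the load time and the per-prompt time separately makes it obvious whether a quoted figure includes the initial load or not.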
Not too shabby...
EDIT - this comment implies it's much faster: https://news.ycombinator.com/item?id=32679518
If that's correct then it's close to matching my 3080 (mobile).