It's good to see more effort for making things not device specific but I only see benchmarks for NVIDIA B200 and AMD MI350X. Also what's the experience of using one of these Python DSLs like? Are the tools good enough to make code completion, jump to definition, setting breakpoints, watching variables, copying as expression etc. nice?