If you use Cython to compile vanilla Python, the GC is implemented in the generated C source, without the VM. I don't find that especially noteworthy... though I do wonder about the particulars of their GC implementation (ick, GC).
VM? If you use your own allocator, it's pretty straightforward to run C code on a GC. There is of course Boehm, but it's... really slow and pretty fussy. Since you own the compiler in this case, you can even support object relocation (compaction), which really does help a lot with total performance as well as footprint.
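To make the relocation point a bit more concrete, here's a rough sketch of a toy compacting heap in C. It is not how Cython, Boehm, or any real collector works; every name in it (`gc_alloc`, `gc_deref`, `gc_release`, `gc_compact`, `gc_used`) is made up for illustration. Objects are reached through a handle table, which stands in for the precise pointer information a compiler you control could emit. A conservative collector like Boehm can't tell pointers from integers, so it can't safely move anything. There is no mark phase here; `gc_release` just pretends one found the object unreachable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE   1024
#define MAX_HANDLES 64

/* Hypothetical toy heap: a bump allocator over a fixed byte array. */
static unsigned char heap[HEAP_SIZE];
static size_t heap_top = 0;              /* first free byte */

typedef struct {
    size_t offset;                       /* current position in heap[] */
    size_t size;                         /* object size in bytes */
    int    used;                         /* 1 while the object is live */
} Handle;

static Handle handles[MAX_HANDLES];

/* Allocate `size` bytes; returns a handle index, or -1 on failure. */
int gc_alloc(size_t size) {
    if (size == 0 || size > HEAP_SIZE - heap_top) return -1;
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!handles[h].used) {
            handles[h] = (Handle){ heap_top, size, 1 };
            heap_top += size;
            return h;
        }
    }
    return -1;                           /* handle table is full */
}

/* Resolve a handle to a raw pointer. The pointer is only valid until
   the next gc_compact(), because the object may be moved. */
void *gc_deref(int h) {
    assert(h >= 0 && h < MAX_HANDLES && handles[h].used);
    return &heap[handles[h].offset];
}

/* Stand-in for a mark phase deciding the object is unreachable. */
void gc_release(int h) {
    assert(h >= 0 && h < MAX_HANDLES);
    handles[h].used = 0;
}

/* Bytes currently occupied, including garbage not yet compacted away. */
size_t gc_used(void) { return heap_top; }

/* Slide live objects toward the start of the heap, lowest address
   first, and patch the handle table so every handle stays valid. */
void gc_compact(void) {
    size_t dst  = 0;                     /* next free byte after compaction */
    size_t scan = 0;                     /* lowest address not yet visited */
    for (;;) {
        int next = -1;
        for (int h = 0; h < MAX_HANDLES; h++) {
            if (handles[h].used && handles[h].offset >= scan &&
                (next < 0 || handles[h].offset < handles[next].offset))
                next = h;
        }
        if (next < 0) break;             /* no live objects left to move */
        size_t old = handles[next].offset;
        if (old != dst)
            memmove(&heap[dst], &heap[old], handles[next].size);
        scan = old + handles[next].size;
        handles[next].offset = dst;
        dst += handles[next].size;
    }
    heap_top = dst;
}
```

Typical use would be: allocate a few objects, release one in the middle, call `gc_compact()`, and keep using the surviving handles. The footprint win is that `gc_used()` shrinks back to just the live bytes; the cost is that raw pointers from `gc_deref` can't be held across a compaction, which is exactly the kind of rule a compiler can enforce for you and a hand-written C program can't.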