
Now I wonder whether such a compiler could exist in theory. I think VLIW vs. "on-the-fly reordering to extract ILP" is similar to AOT vs. JIT, in that there may be runtime-exclusive information that's crucial for going the last mile (and x86 processes µops internally anyway). But PGO does exist and could bridge that gap in both cases, no?

Note that I vaguely remember having read somewhere that EPIC isn't "true" VLIW (whatever that means), unlike what you can find in some camera SoCs (e.g. Fujitsu FR-V).



> Note that I vaguely remember having read somewhere that EPIC isn't "true" VLIW

Well IIRC, the number of execution units presented architecturally doesn't actually reflect what is available internally. This was done so they could increase the number of units under the hood without breaking backward compatibility (or is it forward compatibility in this case?). At which point you still need scheduling hardware and all that jazz.

That said, to my limited understanding, all processors are internally VLIW; it's just hidden behind a decoder & scheduler that exposes a more limited ISA, so that they don't have to make the trade-off Itanium did.

That said, I really wonder whether the issue was that the compiler was too complicated to bootstrap one good enough to get the ecosystem going, or whether it was a truly braindead evolutionary fork. Has anyone seen any good hand-optimised benchmarks showing the potential of the paradigm?

Btw, it looks like someone started adding IA64 support to QEMU: https://github.com/amarioguy/qemu-itanium


> Btw, it looks like someone started adding IA64 support to QEMU: https://github.com/amarioguy/qemu-itanium

It's good to have that available. Students can examine actual software for hardware that'll soon be recycled.


The big issue with VLIW (or VLIW-adjacent) architectures for practical modern use cases is preemptive multitasking or multi-tenancy. As soon as any assumption about cache residency breaks, the whole thing crumbles to dust. That's why VLIW is good for DSP, where branches are more predictable but, more importantly, you know exactly what inputs will already be in cache.
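A toy model of why a broken cache assumption hurts static scheduling so much (my own sketch, not any real scheduler): compare when each op finishes under in-order lockstep issue vs. dynamic scheduling, once a load the compiler scheduled as a cache hit instead misses and takes, say, 200 cycles.

```python
def inorder_finish(latencies, deps):
    # In-order (VLIW-style) issue: an op cannot start before the op ahead
    # of it in program order has started, so one long-latency stall delays
    # everything behind it.
    start, finish, prev = {}, {}, 0
    for op, lat in enumerate(latencies):
        s = max([prev] + [finish[d] for d in deps.get(op, [])])
        start[op], finish[op] = s, s + lat
        prev = s
    return finish

def ooo_finish(latencies, deps):
    # Out-of-order issue: an op starts as soon as its true data
    # dependencies have finished, regardless of program order.
    finish = {}
    for op, lat in enumerate(latencies):
        s = max([finish[d] for d in deps.get(op, [])], default=0)
        finish[op] = s + lat
    return finish

# op0: load that misses cache (200 cycles instead of the expected few).
# op1: add consuming the load. op2, op3: an independent dependency chain.
lat  = [200, 1, 1, 1]
deps = {1: [0], 3: [2]}

print(inorder_finish(lat, deps))  # {0: 200, 1: 201, 2: 201, 3: 202}
print(ooo_finish(lat, deps))      # {0: 200, 1: 201, 2: 1, 3: 2}
```

The independent chain (ops 2 and 3) finishes in 2 cycles under dynamic scheduling but waits out the whole miss under in-order issue; the compiler can only avoid this if it can predict residency at compile time, which DSP workloads allow and multi-tenant ones don't.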


> That’s why VLIW is good for DSP

Maybe GPU usage as well. I know very little about what the ISA of GPUs looks like though.


AMD's GPUs did indeed use a VLIW ISA[1].

They moved to SIMD. Not sure what they're doing in their latest as I haven't been paying attention lately.

[1]: https://www.anandtech.com/show/4455/amds-graphics-core-next-...


Even then, the ISA of GPUs is typically hidden behind the driver which recompiles shaders on-the-fly.


I'm sure it could exist in theory, but VLIW for these large chips has been outcompeted by OTF reordering and SMT, which are capable of extracting almost as much work from the processor as an ideal VLIW instruction flow for a lot less effort.



