> SLP vectorizer will affect the performance upside of passes before/after it. I...

jcranmer · 2025-06-17T03:47:30 1750132050

Many passes in LLVM tend to assume that the IR is in some sort of poorly-defined canonical form. For example, a loop transformation might only bother to handle loops where the latch and the exiting block are one and the same, or might assume the existence of a dedicated loop preheader. In principle, the only cost of this dependency should be that noncanonical code gets optimized less well, but compilers are big, complex things, so it should be no surprise that noncanonical IR sometimes ends up causing crashes or miscompiles.
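
To make the preheader example concrete, here is a minimal Python sketch of a pass that checks its canonical-form precondition and bails out instead of assuming it. The CFG encoding (a dict of successor lists) and the names `dedicated_preheader` and `hoist_invariants` are invented for illustration; they are not LLVM APIs.

```python
from typing import Dict, List, Optional, Set

CFG = Dict[str, List[str]]  # block name -> successor block names (toy encoding)


def dedicated_preheader(cfg: CFG, header: str, loop: Set[str]) -> Optional[str]:
    """Return the unique out-of-loop predecessor of `header` whose only
    successor is `header`, or None if the loop is not in that form."""
    outside_preds = [b for b, succs in cfg.items()
                     if header in succs and b not in loop]
    if len(outside_preds) != 1:
        return None
    pred = outside_preds[0]
    return pred if cfg[pred] == [header] else None


def hoist_invariants(cfg: CFG, header: str, loop: Set[str]) -> str:
    """Hypothetical loop pass: only fires when the loop is canonical."""
    pre = dedicated_preheader(cfg, header, loop)
    if pre is None:
        # Noncanonical input: skip the optimization rather than crash.
        return "skipped"
    return f"hoisted into {pre}"


canonical = {"entry": ["pre"], "pre": ["head"], "head": ["body", "exit"],
             "body": ["head"], "exit": []}
# Here `entry` branches to both the loop header and the exit,
# so no block qualifies as a dedicated preheader.
noncanonical = {"entry": ["head", "exit"], "head": ["body", "exit"],
                "body": ["head"], "exit": []}

print(hoist_invariants(canonical, "head", {"head", "body"}))
print(hoist_invariants(noncanonical, "head", {"head", "body"}))
```

The "less optimal but still correct" outcome is the `skipped` branch; the crashes and miscompiles come from passes that omit that check and use the preheader unconditionally.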

pizlonator · 2025-06-17T03:58:00 1750132680

Do you remember a case where it did cause a crash or a miscompile, and it wasn't also a case where that same crash/miscompile could have happened under default/optimal phase order?

I can easily imagine that reordering phases increases overall likelihood of surfacing bugs that are there anyway.

I have a harder time imagining a dependence on canonical form that leads to a crash under phase reordering but is totally sound otherwise.

jcranmer · 2025-06-17T04:01:47 1750132907

Offhand, I don't have any cases.

They're most likely to occur in the codegen pipeline, but codegen is already acknowledged to be a situation where you have non-optional passes that have to occur in a particular order, so I suspect you're already discounting those scenarios.

pizlonator · 2025-06-17T03:34:53 1750131293

> it's definitely true for other sequences of passes, where subsequent passes depend on prior passes.

I don’t think it’s true for optimization passes in llvm and that’s what makes llvm kinda great.

I would love to hear of a counter example.

But say that you found a counterexample. That would still not be as bad as what happens in a lot of compilers out there where being able to reorder passes (or even run them a second time) is the exception rather than the norm, and our debate would be about whether there exists even a single pair of passes that could be safely reordered.

almostgotcaught · 2025-06-17T03:42:49 1750131769

Sure - I agree that LLVM is mostly reasonable - I'm just saying I don't think it has to do with the design of the IR itself; it's primarily because of convention/code quality/stewardship. To ask my question more precisely: can you give me an example of an IR design that didn't "isolate ugliness"?

Edit: I'm being rate limited so here's what I wanted to post in response:

You kept talking about LLVM and I just assumed LLVM IR, but this thing (different rules before and after regalloc) is exactly the case for LLVM's MIR (even going so far as no longer being SSA after a certain part of the pipeline). And in fact I have a PR hanging out right now for adding a legalization for GISel (which also operates on MIR) that crashes in exactly the way you say it shouldn't - I run my legalizer and it's fine and dandy, and then somewhere down the line (in another legalizer) there's a null dereference. So I guess LLVM isn't reasonable everywhere! And also I should have a think about the differences between LLVM IR and MIR.

pizlonator · 2025-06-17T03:54:02 1750132442

In my experience with compilers where phase ordering causes fires, it's exactly to do with the IR.

Because I don't want to pick on others (as much fun as that would be), I'll just pick on compilers I wrote that have this property and how much I regret everything I did there.

- The first compiler IR I designed was called Fiji C1, and it had both an SSA form and a not-SSA form, and it had multiple stages of lowering within the same IR. You absolutely could not breathe on the phase order without destroying everything.

- I'm maybe like mostly responsible for JavaScriptCore's DFG IR, which has a LoadStore form, a ThreadedCPS form, and an SSA form, plus maybe other forms too. And there are many phases in there that serve the purpose of carefully lowering some aspect of type speculation. Foolish is the man who tries to mess with the order of those phases.

- The Assembly IR in JavaScriptCore's top tier JIT has different laws before and after regalloc, and before and after stack alloc. That's maybe inevitable and like not too bad since it also has a small number of phases, but still.

Note that in each of these cases, it absolutely is an IR issue. It's not like there were some rules that I failed to follow. It's that I designed the compiler specifically to have an IR that follows different laws in different parts of the pipeline.
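
As a toy illustration of "different laws in different parts of the pipeline", here is a small Python sketch. It is not code from any of the compilers above; the `Inst` type, the register naming scheme, and all three functions are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inst:
    op: str
    operands: List[str]  # "v3" = virtual register, "r1" = physical register


def verify(code: List[Inst], after_regalloc: bool) -> bool:
    """Before regalloc any register name is legal; afterwards only
    physical registers (names starting with 'r') are."""
    if not after_regalloc:
        return True
    return all(o.startswith("r") for i in code for o in i.operands)


def regalloc(code: List[Inst]) -> List[Inst]:
    """Trivial stand-in allocator: maps vN to rN."""
    return [Inst(i.op, ["r" + o[1:] if o.startswith("v") else o
                        for o in i.operands]) for i in code]


def insert_copy(code: List[Inst]) -> List[Inst]:
    """Stand-in for a pass written against the pre-regalloc laws:
    it freely introduces a fresh virtual register."""
    return code + [Inst("copy", ["v99", code[0].operands[0]])]


prog = [Inst("add", ["v0", "v1", "v2"])]

good = regalloc(insert_copy(prog))  # intended order
bad = insert_copy(regalloc(prog))   # same two passes, swapped

print(verify(good, after_regalloc=True))
print(verify(bad, after_regalloc=True))
```

Neither pass is buggy on its own terms. The swapped order fails only because the IR's rules change at the regalloc boundary, which is the sense in which this is an IR issue rather than a pass issue.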

o11c · 2025-06-17T04:09:40 1750133380

The "different laws" thing really should be handled by changing the types of all the nodes around the critical passes. Unfortunately, every language I know sucks at typing if you want to avoid all the copying that the naive approach gives you. (Remember especially that some types will only exist during certain phases. The most basic of these is: which token types are valid in a parse tree and which aren't? E.g. binary operator tokens usually remain valid, but parentheses disappear, whereas many new types appear. For later phases this only gets more complicated.)
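
A minimal Python sketch of the naive, copying version of this idea, using the parentheses example. All class names here are hypothetical, and Python's type hints only enforce the phase split if you run a static checker such as mypy over them.

```python
from dataclasses import dataclass
from typing import Union


# Parse-tree node types: parentheses are still present.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    lhs: "ParseNode"
    rhs: "ParseNode"

@dataclass
class Paren:
    inner: "ParseNode"

ParseNode = Union[Num, BinOp, Paren]


# AST node types: there is no Paren, so a later pass typed against
# AstNode never has to consider one.
@dataclass
class Const:
    value: int

@dataclass
class Apply:
    op: str
    lhs: "AstNode"
    rhs: "AstNode"

AstNode = Union[Const, Apply]


def lower(n: ParseNode) -> AstNode:
    """Rebuild the tree in the AST types (this is the copying cost)."""
    if isinstance(n, Num):
        return Const(n.value)
    if isinstance(n, Paren):
        return lower(n.inner)  # parentheses disappear
    return Apply(n.op, lower(n.lhs), lower(n.rhs))


tree = BinOp("*", Paren(BinOp("+", Num(1), Num(2))), Num(3))
print(lower(tree))
```

Every node gets reallocated at the phase boundary even though only `Paren` actually changed, which is the overhead being complained about.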

dzaima · 2025-06-17T04:12:15 1750133535

As a fun note, here's a talk of automatically reordering LLVM passes to optimize for code size, which seemingly worked enough to make a talk/paper about: https://www.youtube.com/watch?v=_SqWd74zG2Y