
contificate · 2026-01-03T18:35:21 1767465321

> Meanwhile, a compiler is an enormously complicated story.

I don't intend to downplay the effort involved in creating a large project, but it's evident to me that there's a class of "better C" languages for which LLVM is very well suited.

On purely recreational grounds, one can get something small off the ground in an afternoon with LLVM. It's very enjoyable and has a low barrier to entry, really.

norir · 2026-01-03T19:29:03 1767468543

Yes, this is fine for basic exploration but, in the long run, I think LLVM taketh at least as much as it giveth. The proliferation of LLVM has created the perception that writing machine code is an extremely difficult endeavor that should not be pursued by mere mortals. In truth, you can get going writing x86_64 assembly in a day. With a few weeks of effort, it is possible to emit all of the basic x86_64 instructions. I have heard aarch64 is even easier but I only have experience with x86_64.

What you then realize is that it is possible to generate quality machine code much faster than LLVM and using far fewer resources. I believe both that LLVM has been holding back compiler evolution and that it is close to, if not already at, peak popularity. As LLMs improve, the need for tighter feedback loops will necessitate moving off the bloat of LLVM. Moreover, for all of the magic of LLVM's optimization passes, it does very little to prevent the user from writing incorrect code. I believe we will demand more from a compiler backend than LLVM can ever deliver.

The main selling point of LLVM is that you gain access to all of the targets, but for me this is a weak point in its favor. Firstly, one can write a quality self-hosting compiler with O(20) instructions. Adding new backends should be trivial. Moreover, the more you are thinking about cross-platform portability, the more you are worrying about hypothetical problems as well as the problems of people other than yourself. Get your compiler working well on your own machine first and then worry about other machines.
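To make the "get going in a day" claim concrete, here is a minimal sketch (not from the thread) that compiles integer expression trees to x86-64 AT&T-syntax assembly using a naive push/pop stack discipline. It assumes a Linux/ELF toolchain where the entry symbol is `main` (macOS would want `_main`), supports only `+`, `-` and `*`, and assumes constants fit in a signed 32-bit immediate.

```python
def emit_expr(e, out):
    """Append x86-64 (AT&T) instructions that leave the value of e in %rax.

    e is either an int or a tuple (op, lhs, rhs) with op in {"+", "-", "*"}.
    """
    if isinstance(e, int):
        # movq only takes signed 32-bit immediates; bigger constants need movabsq.
        out.append(f"    movq ${e}, %rax")
        return
    op, lhs, rhs = e
    emit_expr(rhs, out)            # evaluate the right operand first...
    out.append("    pushq %rax")   # ...and park it on the stack
    emit_expr(lhs, out)            # left operand ends up in %rax
    out.append("    popq %rcx")    # right operand back into %rcx
    if op == "+":
        out.append("    addq %rcx, %rax")
    elif op == "-":
        out.append("    subq %rcx, %rax")   # %rax = %rax - %rcx
    elif op == "*":
        out.append("    imulq %rcx, %rax")
    else:
        raise ValueError(f"unsupported operator {op!r}")


def compile_main(e):
    """Wrap the expression in a `main` whose return value (exit status) is e."""
    out = ["    .text", "    .globl main", "main:"]
    emit_expr(e, out)
    out.append("    ret")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(compile_main(("-", 10, ("*", 2, 3))), end="")
```

Saving the output as `expr.s` and running `gcc expr.s -o expr && ./expr; echo $?` on x86-64 Linux should print `4`. A real backend would replace the stack traffic with register allocation, but the emission itself stays this direct.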

contificate · 2026-01-03T21:32:26 1767475946

I agree. I've found that, for the languages I'm interested in compiling (strict functional languages), a custom backend is desirable simply because LLVM isn't well suited to various things you might like to do when compiling functional programming languages (particularly related to custom register conventions, split stacks, etc.).

I'm particularly fond of the organisation of the OCaml compiler: it doesn't really follow a classical separation of concerns, but it emits good quality code. E.g. its instruction selection is just pattern matching expressed in the language; various liveness properties of the target instructions are expressed for the virtual IR (as they know which one-to-one instruction mapping they'll use later - as opposed to doing register allocation strictly after instruction selection); garbage collection checks are threaded in after the fact (calls to caml_call_gc); and its register allocator is a simple variant of Chow et al's priority-based graph colouring (expressed rather tersely: ~223 lines, ignoring the related infrastructure for spilling, restoring, etc.).

--

As a huge aside, I believe the hobby compiler space could benefit from someone implementing a syntactic subset of LLVM IR, capable of compiling real programs. You'd get test suites for free and the option to switch to stock LLVM if desired. Projects like Hare are probably a good fit for such an idea: you could switch out the backend for stock LLVM if you want.
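As a sketch of what "emit a syntactic subset of LLVM IR" could look like: a hypothetical `Function` builder that only knows integer binary operations and `ret`, and prints textual IR that stock LLVM tools (e.g. `llc` or `clang`) should accept unchanged. The class and its methods are invented for illustration and are not part of any LLVM API.

```python
class Function:
    """Hypothetical builder for a tiny subset of textual LLVM IR."""

    def __init__(self, name, ret_ty, params):
        self.name = name
        self.ret_ty = ret_ty
        self.params = params      # list of (type, name) pairs
        self.blocks = []          # list of (label, [instructions])
        self.next_tmp = 0

    def block(self, label):
        self.blocks.append((label, []))

    def binop(self, op, ty, lhs, rhs):
        """Emit `%tN = <op> <ty> lhs, rhs` and return the new value's name."""
        self.next_tmp += 1
        dst = f"%t{self.next_tmp}"
        self.blocks[-1][1].append(f"{dst} = {op} {ty} {lhs}, {rhs}")
        return dst

    def ret(self, ty, value):
        self.blocks[-1][1].append(f"ret {ty} {value}")

    def render(self):
        params = ", ".join(f"{ty} %{name}" for ty, name in self.params)
        lines = [f"define {self.ret_ty} @{self.name}({params}) {{"]
        for label, instrs in self.blocks:
            lines.append(f"{label}:")
            lines.extend(f"  {instr}" for instr in instrs)
        lines.append("}")
        return "\n".join(lines) + "\n"


f = Function("scale_sum", "i32", [("i32", "a"), ("i32", "b")])
f.block("entry")
s = f.binop("add", "i32", "%a", "%b")
p = f.binop("mul", "i32", s, "2")
f.ret("i32", p)
print(f.render(), end="")
```

Because the output is ordinary LLVM IR, the same text can be fed both to a hobby backend and to stock LLVM, which is what makes the "test suites for free" cross-checking possible.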

maybewhenthesun · 2026-01-03T21:58:21 1767477501

>Adding new backends should be trivial.

Sounds like famous last words :-P

And I don't really know about faster once you start to handle all the edge cases that invariably crop up.

Case in point: gcc

fc417fc802 · 2026-01-04T03:40:27 1767498027

It's the classic pattern where you redefine the task as only 80% of the original.

cardanome · 2026-01-03T20:00:11 1767470411

That is why the Hare language uses QBE instead: https://c9x.me/compile/

Sure it can't do all the optimizations LLVM can but it is radically simpler and easier to use.

christophilus · 2026-01-03T21:03:31 1767474211

Hare is a very pleasant language to use, and I like the way the code looks vs something like Zig. I also like that it uses QBE for the reasons they explained.

That said, I suspect it’ll never be more than a small niche if it doesn’t target Mac and Windows.

sixthDot · 2026-01-04T04:39:53 1767501593

If only it were just about emitting machine code to a file and then calling the linker... You also have the problems of debug information, optimization passes, the number of tests required to show that the output code is valid, etc.

fuzztester · 2026-01-03T19:16:05 1767467765

>On purely recreational grounds, one can get something small off the ground in an afternoon with LLVM. It's very enjoyable and has a low barrier to entry, really.

Is there something analogous for those wanting to create language interpreters, not compilers? And preferably for interpreters one wants to develop in Python?

It doesn't literally have to be just an afternoon; it could even be a few weeks, but is there something that will ease the task for PL newbies? The tasks of lexing and parsing, I mean.

fredrikholm · 2026-01-03T20:03:49 1767470629

https://craftinginterpreters.com/introduction.html

AST interpreter in Java from scratch, followed by the same language in a tight bytecode VM in C.

Great book; very good introduction to the subject.

contificate · 2026-01-03T21:44:30 1767476670

There are quite neat lexer and parser generators for Python that can lower the barrier to entry. For example, I've used PLY now and then for very small things.

On the non-generated side, lexer creation is largely mechanical - even if you write it by hand. For example, if you vaguely understand the idea of expressing a disjunctive regular expression as a state machine (its DFA), you can plug that into skeleton algorithms and get a lexer out (for example, the algorithm shown in Reps' paper “‘Maximal-Munch’ Tokenization in Linear Time”). For parsing, taking a day or two to really understand Pratt parsing is incredibly valuable. Then, recursive descent is fairly intuitive to learn and implement, and Pratt parsing is a nice way to structure your parser for the more expressive parts of your language's grammar.
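A minimal sketch of the "plug a DFA into a skeleton algorithm" idea: a hand-written transition function for a handful of hypothetical token kinds, driven by plain maximal munch (remember the last accepting position and backtrack to it when the DFA gets stuck). Reps' paper adds memoisation of failed (state, position) pairs to make the worst case linear; that refinement is deliberately left out here.

```python
# Accepting states of a hand-written DFA, mapped to token kinds.
ACCEPT = {1: "ID", 2: "INT", 3: "ASSIGN", 4: "EQ", 5: "WS", 6: "PLUS", 8: "FLOAT", 9: "DOT"}


def step(state, ch):
    """DFA transition function; None means the automaton is stuck."""
    if state == 0:                       # start state
        if ch.isalpha() or ch == "_":
            return 1
        if ch.isdigit():
            return 2
        if ch == "=":
            return 3
        if ch.isspace():
            return 5
        if ch == "+":
            return 6
        if ch == ".":
            return 9
        return None
    if state == 1 and (ch.isalnum() or ch == "_"):
        return 1
    if state == 2 and ch.isdigit():
        return 2
    if state == 2 and ch == ".":
        return 7                         # "12." alone is not accepting
    if state in (7, 8) and ch.isdigit():
        return 8
    if state == 3 and ch == "=":
        return 4
    if state == 5 and ch.isspace():
        return 5
    return None


def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        state, i = 0, pos
        last_kind, last_end = None, pos
        # Run the DFA as far as it goes, remembering the last accepting point.
        while i < len(src):
            state = step(state, src[i])
            if state is None:
                break
            i += 1
            if state in ACCEPT:
                last_kind, last_end = ACCEPT[state], i
        if last_kind is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if last_kind != "WS":
            tokens.append((last_kind, src[pos:last_end]))
        pos = last_end                   # resume just after the longest match
    return tokens
```

The input `1.x` shows the backtracking: the DFA reads `1.` hoping for a float, gets stuck at `x`, falls back to the last accepting position to emit `INT "1"`, and restarts at the dot.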

Nowadays, Python has a match (pattern matching) construct - even if its semantics are somewhat questionable (and potentially error-prone). Overall, though, I don't find Python too unenjoyable for compiler-related programming: dataclasses (and match) have really improved the situation.
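To illustrate how compact Pratt parsing is with dataclasses, here is a sketch of a binding-power-based Pratt parser for arithmetic: binary `+ - * / ^` (with `^` right-associative), unary minus and parentheses. The binding-power numbers and the regex tokenizer are arbitrary choices for the example, not a canonical scheme.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Neg:
    operand: object


@dataclass
class BinOp:
    op: str
    lhs: object
    rhs: object


# (left, right) binding powers: right > left gives left associativity,
# right < left (as for "^") gives right associativity.
INFIX = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21), "^": (31, 30)}
PREFIX_MINUS = 25  # tighter than "*" but looser than "^", so -2^2 = -(2^2)


class Parser:
    def __init__(self, src):
        # Naive tokenizer: integers and single-character operators; any other
        # character (including whitespace) is silently skipped.
        self.toks = re.findall(r"\d+|[-+*/^()]", src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self, min_bp=0):
        tok = self.advance()
        # Prefix position: literals, unary minus and parenthesised groups.
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            lhs = Num(int(tok))
        elif tok == "-":
            lhs = Neg(self.expr(PREFIX_MINUS))
        elif tok == "(":
            lhs = self.expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Infix loop: keep absorbing operators whose left power is >= min_bp.
        while (op := self.peek()) in INFIX:
            left_bp, right_bp = INFIX[op]
            if left_bp < min_bp:
                break
            self.advance()
            lhs = BinOp(op, lhs, self.expr(right_bp))
        return lhs


def parse(src):
    p = Parser(src)
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"trailing token {p.peek()!r}")
    return tree
```

All precedence and associativity decisions live in the two small tables, which is why Pratt parsing suits the expression-heavy corners of a grammar while the rest stays plain recursive descent.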

gritzko · 2026-01-03T20:11:53 1767471113

I am a big fan of Ragel[1]. It is a high-performance parser generator. In fact, it can generate several different kinds of parsers; it's very powerful. Unfortunately, it takes a lot of skill to operate. I wrote a parser generator generator to make it all smooth[2], but after 8 years I still can't call it effortless. A colleague of mine once "broke the internet" with a Ragel bug. So, think twice. Still, for weekend activities I highly recommend it, just for the way of thinking it embodies.

[1]: https://www.colm.net/open-source/ragel/

[2]: https://github.com/gritzko/librdx/blob/master/rdx/JDR.lex

fuzztester · 2026-01-04T00:18:39 1767485919

Is this the same Ragel that Zed Shaw wrote about in one of his posts back in the day, during the heyday of Ruby and Rails? I vaguely remember that article. I think he used it for Mongrel, his web server.

https://github.com/mongrel/mongrel

sph · 2026-01-04T13:03:56 1767531836

The worst part of designing a language is the parsing stage.

Simple enough to do it by hand, but there’s a lot of boilerplate and bureaucracy involved that is painfully time-wasting unless you know exactly what syntax you are going for.

But if you adopt a parser generator such as Flex/Bison, you’ll find yourself learning and debugging an obtuse language that has to be forcefully bent to your needs, and I hope your knowledge of parsing theory is up to scratch when you’re faced with shift-reduce conflicts or have to decide whether LR or LALR(1) or whatever is most appropriate for your syntax.

Not even PEG is gonna come to your rescue.

fuzztester · 2026-01-09T05:19:39 1767935979

How is the boilerplate etc. related to the syntax? It's not clear to me. I would have thought you first decide the syntax and only then start the work.

But I've never created an interpreter, let alone a compiler.

fuzztester · 2026-01-03T21:07:11 1767474431

Thanks to those who replied.

contificate · 2025-10-09T13:09:16 1760015356

There's a neat paper where they implement basic blocks (in a control flow graph) as zippers (https://www.cs.tufts.edu/~nr/pubs/zipcfg.pdf). The neat part is that - due to how the host language works (mutation having the cost of invoking the write barrier) - their measurements show that the zipper version is more performant than the mutable version.
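A small sketch of a basic block as a zipper, loosely in the spirit of the paper (whose actual OCaml representation differs): a cursor sits between two instructions, the instructions to its left are stored nearest-first in a persistent cons list, so moving the cursor or inserting at it is O(1) and older versions remain valid. Python cannot reproduce the write-barrier performance argument; this only shows the data structure.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# A persistent cons list: None is the empty list, (head, tail) otherwise.
ConsList = Optional[Tuple[Any, Any]]


def from_list(items):
    out = None
    for x in reversed(items):
        out = (x, out)
    return out


def to_list(xs):
    out = []
    while xs is not None:
        head, xs = xs
        out.append(head)
    return out


@dataclass(frozen=True)
class BlockZipper:
    """A cursor between two instructions of a basic block.

    `before` holds the instructions left of the cursor, nearest first;
    `after` holds the instructions right of the cursor, in program order.
    Every operation returns a new zipper sharing structure with the old one.
    """
    before: ConsList
    after: ConsList

    @staticmethod
    def at_start(instrs):
        return BlockZipper(None, from_list(instrs))

    def forward(self):
        if self.after is None:
            raise IndexError("cursor already at end of block")
        head, rest = self.after
        return BlockZipper((head, self.before), rest)

    def backward(self):
        if self.before is None:
            raise IndexError("cursor already at start of block")
        head, rest = self.before
        return BlockZipper(rest, (head, self.after))

    def insert(self, instr):
        # O(1): the new instruction lands immediately left of the cursor.
        return BlockZipper((instr, self.before), self.after)

    def delete_next(self):
        if self.after is None:
            raise IndexError("nothing to delete")
        _, rest = self.after
        return BlockZipper(self.before, rest)

    def instructions(self):
        return list(reversed(to_list(self.before))) + to_list(self.after)
```

Because nothing is mutated, a pass can keep the original block around (for example, to diff against) at no extra cost.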

contificate · 2025-03-12T10:50:08 1741776608

I sometimes write C recreationally. The real problem I have with it is that it's overly laborious for the boring parts (e.g. spelling out inductive datatypes). If you imagine that a large amount of writing a compiler (or similar) in C amounts to juggling tagged unions (allocating, pattern matching over, etc.), it's very tiring to write the same boilerplate again and again. I've considered writing a generator to alleviate much of the tedium, but haven't bothered to do it yet. I've also considered developing C projects by appealing to an embeddable language for prototyping (like Python, Lua, Scheme, etc.), and then committing the implementation to C after I'm content with it (otherwise, the burden of implementation is simply too high).

It's difficult because I do believe there's an aesthetic appeal in doing certain one-off projects in C: compiled size, speed of compilation, the sense of accomplishment, etc., but a lot of it is just tedious grunt work.
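For the "write a generator to alleviate the tedium" idea, a hypothetical sketch of what such a generator might emit: from a datatype description, it prints a C tag enum, a struct holding a tagged union, and one heap-allocating constructor per variant. The naming scheme (`EXPR_ADD`, the `as` union member, `mk_add`) is made up for illustration; the generated C assumes `<stdlib.h>` and omits NULL checks, and variant names must not lower-case to C keywords.

```python
def gen_tagged_union(name, variants):
    """Generate C boilerplate for an inductive datatype.

    variants maps a constructor name to a list of (c_type, field_name) pairs.
    """
    def tag(ctor):
        return f"{name.upper()}_{ctor.upper()}"

    lines = [
        f"typedef enum {{ {', '.join(tag(c) for c in variants)} }} {name}_tag;",
        f"typedef struct {name} {name};",
        f"struct {name} {{",
        f"  {name}_tag tag;",
    ]
    with_fields = [(c, fs) for c, fs in variants.items() if fs]
    if with_fields:  # an empty union is not valid standard C
        lines.append("  union {")
        for ctor, fields in with_fields:
            members = " ".join(f"{ty} {fname};" for ty, fname in fields)
            lines.append(f"    struct {{ {members} }} {ctor.lower()};")
        lines.append("  } as;")
    lines.append("};")
    # One constructor per variant (needs <stdlib.h>; no NULL check on malloc).
    for ctor, fields in variants.items():
        params = ", ".join(f"{ty} {fname}" for ty, fname in fields) or "void"
        lines.append(f"{name} *mk_{ctor.lower()}({params}) {{")
        lines.append(f"  {name} *e = malloc(sizeof *e);")
        lines.append(f"  e->tag = {tag(ctor)};")
        for _, fname in fields:
            lines.append(f"  e->as.{ctor.lower()}.{fname} = {fname};")
        lines.append("  return e;")
        lines.append("}")
    return "\n".join(lines)


print(gen_tagged_union("expr", {
    "Num": [("long", "value")],
    "Add": [("expr *", "lhs"), ("expr *", "rhs")],
}))
```

The same description could also drive generated `switch` skeletons or destructors, which is where most of the repetitive grunt work tends to live.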

anymouse123456 · 2025-03-12T20:07:23 1741810043

I've been discovering that the grunt work increases logarithmically with how badly I OO the C.

When I simplify and think in terms of streams, it starts getting nice and tidy.

contificate · on Feb 21, 2025

On the topic of QBE, I've always felt that someone aiming to do the same scope of project ought to make their IR a syntactic subset of LLVM IR. If you do that, your test suite can involve invoking LLVM for a comparison.

As for QBE itself, many of the core transformations are fairly standard, which makes it somewhat more approachable for me (e.g. Cooper et al's dominance algorithm, Rideau et al's parallel moves algorithm, etc.). Of course, this doesn't negate the fact that it's not as "hackable" as they probably intended.
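As an illustration of the parallel-move problem mentioned here, a Python sketch modelled on the sequentialisation algorithm in Rideau, Serpette and Leroy's "Tilting at windmills with Coq" (not QBE's code): moves are emitted depth-first so every location is read before it is overwritten, and a cycle is broken by saving one source in a temporary. It assumes the destinations are pairwise distinct (as a parallel move requires) and that `tmp` appears in no move.

```python
TO_MOVE, BEING_MOVED, MOVED = range(3)


def sequentialize(srcs, dsts, tmp):
    """Turn the parallel move dsts[i] := srcs[i] (all i at once) into a list of
    sequential (src, dst) moves with the same effect.

    Assumes the dsts are pairwise distinct and tmp occurs in neither list.
    """
    srcs = list(srcs)          # copied: cycle breaking rewrites sources
    n = len(srcs)
    status = [TO_MOVE] * n
    out = []

    def move_one(i):
        if srcs[i] == dsts[i]:
            status[i] = MOVED  # a self-move needs no instruction
            return
        status[i] = BEING_MOVED
        # Every move that still reads dsts[i] must happen before we clobber it.
        for j in range(n):
            if srcs[j] == dsts[i]:
                if status[j] == TO_MOVE:
                    move_one(j)
                elif status[j] == BEING_MOVED:
                    # We went round a cycle: save the value in tmp first.
                    out.append((srcs[j], tmp))
                    srcs[j] = tmp
        out.append((srcs[i], dsts[i]))
        status[i] = MOVED

    for i in range(n):
        if status[i] == TO_MOVE:
            move_one(i)
    return out
```

For example, the rotation `a -> b -> c -> a` combined with an extra copy `a -> d` comes out as `a->t, a->d, c->a, b->c, t->b`: `d` is filled before `a` is overwritten, and only one trip through the temporary is needed.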

contificate · on Feb 21, 2025

Typical implementations of Lengauer-Tarjan are often taken verbatim from Andrew Appel's book and involve higher constant factors than alternative algorithms - such as Cooper et al's "engineered" version of the usual fixpoint algorithm, which is very neat, simple, and performs well.
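A sketch of the iterative algorithm from Cooper, Harvey and Kennedy's "A Simple, Fast Dominance Algorithm": visit blocks in reverse postorder and repeatedly intersect the dominator-tree paths of already-processed predecessors until nothing changes. The CFG encoding (a dict of successor lists) is an assumption for the example; as in the paper, the entry block is recorded as its own immediate dominator.

```python
def dominators(succs, entry):
    """Immediate dominators of all blocks reachable from entry.

    succs maps each block to a list of its successors; the result maps each
    block to its immediate dominator (the entry maps to itself).
    """
    # Number blocks in reverse postorder.
    postorder, seen = [], set()

    def dfs(b):
        seen.add(b)
        for s in succs.get(b, []):
            if s not in seen:
                dfs(s)
        postorder.append(b)

    dfs(entry)
    rpo = postorder[::-1]
    index = {b: i for i, b in enumerate(rpo)}

    preds = {b: [] for b in rpo}
    for b in rpo:
        for s in succs.get(b, []):
            preds[s].append(b)

    idom = {entry: entry}

    def intersect(a, b):
        # Walk both fingers up the current dominator tree until they meet;
        # a larger RPO index means further from the entry.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for b in rpo[1:]:
            done = [p for p in preds[b] if p in idom]  # already processed preds
            new_idom = done[0]
            for p in done[1:]:
                new_idom = intersect(p, new_idom)
            if idom.get(b) != new_idom:
                idom[b] = new_idom
                changed = True
    return idom
```

The whole algorithm is two nested loops over a dictionary, with no auxiliary forest or semi-dominator bookkeeping, which is much of its appeal compared with a textbook Lengauer-Tarjan.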

If you are saying that constructing SSA following the classical approach is a premature optimisation, perhaps you would prefer Braun et al's retelling of Cliff Click's SSA construction algorithm - which works backwards from uses to place phis and requires no dominance information to be computed.

haqreu · on Feb 21, 2025

Thank you for the reference, I'll check it.

Why not fully maximal SSA + DCE? I mean, other than timing considerations. Fully maximal SSA does not even need any graph traversal; it is entirely local to each block.

contificate · on Feb 21, 2025

I think it will introduce too many redundant phis, but I've never used it in practice - so I can only speculate. I'm not convinced DCE will clean maximal SSA up substantially. Even the classic Cytron et al algorithm must be combined with liveness analysis to avoid placing dead phis (that is, it does not produce pruned SSA by default). In the past, there was always a fear that SSA has the potential to cause quadratic blowup in the number of variables - this concern is mostly theoretical but probably influenced some of the design decisions around algorithms for constructing SSA.

Braun et al's algorithm works backwards from uses (which generate liveness), so you get pruned SSA out. In the case of reducible control flow graphs, you also get minimal SSA. This is all without any liveness or dominance computation beforehand. Granted, you may want those things later, but it's nice that you can construct a decent quality SSA with a fairly intuitive algorithm. Also shown in the paper is that you can incorporate a few light optimisations during SSA construction (constant folding, for example).
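A condensed sketch of Braun et al's on-the-fly construction, assuming a hypothetical interface where the client calls `write`/`read` per variable per block and `seal`s a block once all its predecessors are known. One simplification to flag: when a trivial phi is removed, the paper also revisits that phi's users (which may become trivial in turn); here replacements are only recorded in a forwarding map, so some redundant phis can survive.

```python
from itertools import count


class Phi:
    _ids = count()

    def __init__(self, block):
        self.block = block
        self.operands = []
        self.name = f"phi{next(Phi._ids)}"

    def __repr__(self):
        return self.name


class SSABuilder:
    """Hypothetical on-the-fly SSA construction in the style of Braun et al."""

    def __init__(self, preds):
        self.preds = preds        # block -> list of predecessor blocks
        self.current_def = {}     # (variable, block) -> value
        self.sealed = set()       # blocks whose predecessors are all known
        self.incomplete = {}      # unsealed block -> {variable: phi}
        self.replaced = {}        # removed trivial phi -> its replacement

    def resolve(self, value):
        while isinstance(value, Phi) and value in self.replaced:
            value = self.replaced[value]
        return value

    def write(self, var, block, value):
        self.current_def[(var, block)] = value

    def read(self, var, block):
        if (var, block) in self.current_def:
            return self.resolve(self.current_def[(var, block)])
        return self.read_recursive(var, block)

    def read_recursive(self, var, block):
        preds = self.preds[block]
        if block not in self.sealed:
            # Predecessors may still be added: leave an operand-less phi for later.
            value = Phi(block)
            self.incomplete.setdefault(block, {})[var] = value
        elif len(preds) == 1:
            value = self.read(var, preds[0])   # no merge needed
        else:
            value = Phi(block)
            self.write(var, block, value)      # break cycles through loops
            value = self.add_operands(var, value)
        self.write(var, block, value)
        return value

    def add_operands(self, var, phi):
        for p in self.preds[phi.block]:
            phi.operands.append(self.read(var, p))
        return self.try_remove_trivial(phi)

    def try_remove_trivial(self, phi):
        same = None
        for op in map(self.resolve, phi.operands):
            if op is phi or op == same:
                continue                       # self-reference or repeat
            if same is not None:
                return phi                     # merges two distinct values
            same = op
        if same is None:
            same = "undef"                     # no definition reaches this phi
        self.replaced[phi] = same
        return same

    def seal(self, block):
        for var, phi in self.incomplete.pop(block, {}).items():
            self.add_operands(var, phi)
        self.sealed.add(block)
```

For a loop `entry -> header <-> body`, `header -> exit`, a variable reassigned in `body` keeps its header phi, while one that is only read inside the loop gets a placeholder phi that is removed as trivial when `header` is sealed. No dominance or liveness information is computed at any point.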

haqreu · on Feb 21, 2025

I'll definitely test maximal SSA + DCE in tinyoptimizer when I get to it. For the moment, I've made a somewhat-pruned mem2reg pass [1]; I'm not even sure what to call it. But it did indeed require computing dominance frontiers.

[1] https://ssloy.github.io/tinyoptimizer/mem2reg/
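Since mem2reg places phis at (iterated) dominance frontiers, here is, as a sketch, the short dominance-frontier computation from Cooper, Harvey and Kennedy's paper: for each join block, walk up the dominator tree from each predecessor until reaching the join's immediate dominator. The input encoding (predecessor lists plus an immediate-dominator map with the entry mapped to itself) is assumed for the example.

```python
def dominance_frontiers(preds, idom):
    """preds: block -> list of predecessors; idom: block -> immediate dominator."""
    df = {b: set() for b in preds}
    for b, ps in preds.items():
        if len(ps) < 2:
            continue  # as in the paper, only join points are considered
        for p in ps:
            runner = p
            # Every block from p up to (but excluding) idom(b) has b in its frontier.
            while runner != idom[b]:
                df[runner].add(b)
                runner = idom[runner]
    return df
```

On a loop containing a diamond (`a` branches to `b` and `c`, which join at `d`, which branches back to `a`), a variable stored only in `c` needs a phi in `d`, and iterating the frontier then adds one in `a`.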

contificate · on Feb 21, 2025

Nice.

I'm always happy to see more accessible resources for compiler writers.

---

As an aside: for displaying CFGs on the page, it would be very interesting to emit something somewhat dynamic. SVGs are always a good start, but there is a neat library for doing hierarchical graph layout (dagre, with dagre-d3 handling rendering as well). In my own efforts at pedagogy, I've been interested in producing CFGs in-browser whose basic blocks comprise a "unified diff" view of the block (this being achieved in realtime by maintaining a subset of LLVM whose CFGs are persistent). Then it is more obvious what has changed: at least in the case of mem2reg, which shouldn't introduce new blocks or move too much around (I forget if it hoists allocas to the entry block or not).
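The per-block "unified diff" view needs no special machinery on the text side; as a sketch, Python's standard `difflib.unified_diff` already produces it from a block's instruction lists before and after a pass. The LLVM-flavoured instructions below are made up for the example and show a promoted `alloca`.

```python
import difflib


def block_diff(label, before, after):
    """Unified diff of one basic block's instructions before/after a pass."""
    return list(difflib.unified_diff(
        before, after,
        fromfile=f"{label} (before)", tofile=f"{label} (after)",
        lineterm="",  # the instruction strings carry no trailing newlines
    ))


before = [
    "%x = alloca i32",
    "store i32 1, ptr %x",
    "%v = load i32, ptr %x",
    "ret i32 %v",
]
after = ["ret i32 1"]
print("\n".join(block_diff("entry", before, after)))
```

Rendering each block's diff lines inside its node of the laid-out graph is then purely a front-end concern.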

It'd also be cool to distil what underlying ideas you have found to be most useful in your efforts. The constrained scope of them may be useful to me, as I've wanted to create a kind of "advent of compilers" for years (advent of code but with a heavy slant towards compiler tasks/algorithms).

haqreu · on Feb 21, 2025

Thank you!

contificate · on Sept 12, 2024

I have a rather niche theory that many Hindley-Milner type inference tutorials written by Haskellers insist on teaching the error-prone, slow details of algorithm W because otherwise the authors would need to commit to a way of doing destructive unification (as implied by algorithm J) that doesn't attract pedantic criticism from other Haskellers.

For me, I stopped trying to learn Haskell because I couldn't quite make the jump from writing trivial (but neat) little self-contained programs to writing larger, more involved programs. You seem to need to buy into a contorted way of mentally modelling the problem domain that doesn't quite pay off in the ways advertised by Haskell's proponents (as arguments against contrary approaches tend to be hyperbolic). I'm all for persistent data structures, avoiding global state, monadic style, etc., but I find that OCaml is a simpler, more pragmatic vehicle for these ideas, without being forced to bend over backwards at every hurdle for limited benefit.

contificate · on July 27, 2024

The author has mentioned ANF a few times but, from what I can tell, the likeness that they emphasise is really just the usual property of operands being atomic. This is a property used in many IRs, but I don't feel it's enough to describe Bril as being "an ANF language" - especially when you think about how tied ANF is to the functional compiler space.
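As an illustration of the "operands are atomic" property discussed here (and of why, per this comment, it is not the whole story), a tiny continuation-based normaliser in the style of Flanagan et al's A-normalisation, for a made-up language of primitives and `let` only. The output is the tree-shaped nested-`let` form; there are no lambdas, conditionals or join points, which are exactly the parts that give functional-compiler ANF its character.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Var:
    name: str


@dataclass
class Lit:
    value: int


@dataclass
class Prim:
    op: str
    args: list


@dataclass
class Let:
    name: str
    rhs: object
    body: object


def is_atomic(e):
    return isinstance(e, (Var, Lit))


def normalize(e, k, fresh):
    """ANF-convert e; k receives an expression whose operands are all atomic."""
    if is_atomic(e):
        return k(e)
    if isinstance(e, Prim):
        return normalize_args(e.args, lambda atoms: k(Prim(e.op, atoms)), fresh)
    if isinstance(e, Let):
        return normalize(e.rhs, lambda rhs: Let(e.name, rhs, normalize(e.body, k, fresh)), fresh)
    raise TypeError(f"unknown expression {e!r}")


def normalize_atom(e, k, fresh):
    """Like normalize, but names a non-atomic result with a fresh let-binding."""
    if is_atomic(e):
        return k(e)

    def bind(rhs):
        name = f"t{next(fresh)}"
        return Let(name, rhs, k(Var(name)))

    return normalize(e, bind, fresh)


def normalize_args(args, k, fresh):
    if not args:
        return k([])
    return normalize_atom(
        args[0], lambda a: normalize_args(args[1:], lambda rest: k([a] + rest), fresh), fresh)


def to_anf(e):
    return normalize(e, lambda x: x, count())


def show(e):
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lit):
        return str(e.value)
    if isinstance(e, Prim):
        return f"{e.op}({', '.join(map(show, e.args))})"
    return f"let {e.name} = {show(e.rhs)} in {show(e.body)}"
```

For instance, `(1 + 2) * (3 - x)` becomes `let t0 = +(1, 2) in let t1 = -(3, x) in *(t0, t1)`. Many CFG-based IRs have the same atomic-operand property after lowering, which is the comment's point about the label being too loose.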

The original ANF is actually looser than this in that it permits anonymous functions as arguments. In practice, there is no canonical variant of ANF that people really refer to, but most people usually mean a variant of ANF that doesn't permit this (which, to my knowledge, was first published in David Tarditi's PhD thesis). See this table from Appel's "Modern Compiler Implementation in ML" for the comparisons made in the functional IR space: https://i.imgur.com/17nfGMI.png.

Usually what people in the functional compiler space mean when they mention ANF is some variant of ANF (with Tarditi's restriction) that retains nested functions in a tree-like structure. The tree structure is important because it practically necessitates extending it with "join points" (to act as local, second-class continuations: to avoid duplicating evaluation contexts for multi-way branching constructs, without using functions for this purpose). It just so happens that, if you hoist ANF into a bunch of function bodies (which were once nested) and blocks (which were once join points), you can easily construct a control flow graph. However, you could also say that lambda calculus would be "in SSA" throughout all of this (as it is originally, then normalised into ANF, and then hoisted into a control flow graph) - it just isn't usually what people mean when they talk about an SSA IR (they tend to imply the existence of specific provisions for the splitting of live ranges at join points, be it a phi-like pseudo-instruction or block arguments).

All this is to say that ANF is very tied to literature about the compilation of functional languages and its related analysis and comparison with CPS (such as whether it's closed under beta-reduction, for example), such that I think we need to be a bit more precise about the expected shape and properties of IRs to differentiate them, rather than just expecting compiler engineers to know what you're talking about - and, indeed, agree with your description - when you describe something as "an ANF language".

michaelmior · on July 27, 2024

The author does acknowledge in the article that Bril is stricter than ANF.

contificate · on July 27, 2024

My reading of the article is that the author has chosen to use "ANF" to describe a specific property of their IR that is not unique to ANF, whilst ignoring the fact that ANF (and variants of it) is strongly tied to the functional compiler realm where a specific tree-shaped structure - with nested and first-class functions etc. - is expected. The article says "It's an instruction-based, assembly-like, typed, ANF language". I think the usage of "ANF" is a misnomer here: just because it has this atomic arguments property does not make it an "ANF language" (whatever that means).

I normally wouldn't leave such a long - pedantic - comment, but the first comment on this thread was a question about ANF; I don't think much of the article has any relevance to ANF. It's mentioned off-hand and used as an adjective ("Bril is extremely A-normal form"), which suggests we need better terminology here. Most practical CPS IRs share the same atomic argument property, but you wouldn't suggest "Bril is extremely CPS".

contificate · on July 1, 2024

It is worth noting that lots of applications of unification do not reify explicit substitutions in their implementations. You often see introductory type inference articles (usually focused on Hindley-Milner) use algorithm W, which uses a unification procedure that returns substitutions (which must then be composed with other substitutions). All of this is less efficient and arguably more error-prone and more complicated than the destructive unification implied by algorithm J (using the union-find data structure, often implemented invasively into the type representation to imply a union-find forest).

To this end, good coverage of decent (destructive) unification algorithms can be found in simple resources on type inference (e.g. https://okmij.org/ftp/ML/generalization/sound_eager.ml) or on the Warren Abstract Machine (https://github.com/a-yiorgos/wambook/blob/master/wambook.pdf).

Of course, there are times where you would want to reify substitutions as a data structure to compose and apply, but most of the time you just want to immediately apply them in a pervasive way.

Despite what another comment says, unification is a valid - and rather convenient - way to implement pattern matching (as in the style of ML) in an interpreter: much like how you rewrite type variables with types you are unifying them with, you can rewrite the pattern variables to refer to the parts of the (evaluated) value you are matching against (which you then use to extend the environment with when evaluating the right hand side).
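A minimal sketch of the interpreter-side idea, assuming values are Python ints or constructor tuples such as `("Cons", head, tail)` and using hypothetical `PVar`/`PWild`/`PCon` pattern classes: matching walks the pattern and the value together and binds pattern variables to sub-values, which then extend the environment for the arm's right-hand side. Because the value contains no variables, this is the one-sided special case of unification, and it assumes linear patterns, as ML requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PVar:
    name: str


@dataclass(frozen=True)
class PWild:
    pass


@dataclass(frozen=True)
class PCon:
    tag: str
    args: tuple = ()


def match_pattern(pat, value, env):
    """Return env extended with the pattern's bindings, or None on mismatch."""
    if isinstance(pat, PWild):
        return env
    if isinstance(pat, PVar):
        return {**env, pat.name: value}
    if isinstance(pat, PCon):
        if not (isinstance(value, tuple) and value[0] == pat.tag
                and len(value) - 1 == len(pat.args)):
            return None
        for sub_pat, sub_val in zip(pat.args, value[1:]):
            env = match_pattern(sub_pat, sub_val, env)
            if env is None:
                return None
        return env
    return env if pat == value else None   # literal pattern, e.g. an int


def eval_case(value, arms, env):
    """Evaluate `case value of arms`: the first arm whose pattern matches wins.

    Each arm is (pattern, body), where body is a Python function of the new env.
    """
    for pat, body in arms:
        new_env = match_pattern(pat, value, env)
        if new_env is not None:
            return body(new_env)
    raise RuntimeError("match failure")
```

Matching `Cons(x, Cons(y, _))` against the list `[1, 2, 3]` yields the environment `{"x": 1, "y": 2}`, exactly as unifying the pattern's variables against the value would.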

ReleaseCandidat · on July 1, 2024

> Despite what another comment says, unification is a valid - and rather convenient - way to implement pattern matching

The article shows that it is a valid and rather convenient way. That's why it is important to tell people that it is not necessary to use it for the "usual" pattern matching. You do (again: for the "usual" pattern matching without "variables", as in free variables, on the right side) _not_ want to use unification but "compile" the pattern matching.

contificate · on July 1, 2024

If we are talking about the context of a compiler, I agree: you should compile pattern matching to decision trees (DAGs), by way of something like Pettersson's algorithm. In my comment, I specified "in an interpreter", as I think it's the natural way to implement an SML-like match construct in that context.

EDIT: Perhaps I am missing Elixir-specific context and I apologise if this is the case. It was the unification that made me click this post.

ReleaseCandidat · on July 1, 2024

Well, I'd do that with "real" interpreters (using bytecode) too. But, on the other hand, also _especially_ for "toy" tree-walking interpreters, to learn how to build the pattern matching tree -- as unification is needed for the type checker anyway ;). That said, I would start by reading Wadler's version (which I linked in the other post); it's IMHO a bit more accessible than Pettersson's https://www.classes.cs.uchicago.edu/archive/2011/spring/2262...

But please don't get me wrong: I'm just saying that there exist other, specialised and (well, generally) more performant algorithms for pattern matching and people should have at least heard about them. If somebody decides to use unification for pattern matching, that still could be changed if performance or better error messages or ... matters.

contificate · on July 1, 2024

Ah, I thought your username was familiar: you recommended Wadler's approach on a previous HN thread concerning my blog post about Pettersson/Maranget's algorithm (https://news.ycombinator.com/item?id=39241776). Yeah, I admit that Pettersson's telling of the algorithm is a bit turgid (probably why Maranget's paper - which is the same technique at its core, at least to my understanding - is cited more often).

contificate · on Feb 4, 2024

This is my blog, thanks for posting - never expected to see such engagement.

autumn-antlers · on Feb 4, 2024

Thanks for writing! Love the composition of your site pages, btw. I've got the same "|"-separated nav bar on my WIP branch. Cleaner than my current branch, but I'm trying to compromise between sparseness and something with more "vibes". If only I were in the habit of actually writing anything -- congrats!

contificate · on Feb 4, 2024

Thank you very much! Your website's landing page is very clean.

contificate · on July 22, 2022

Agreed, union-find is great.

My favourite usage of it in practice is for solving first-order (syntactic) unification problems. Destructive unification by means of rewriting unbound variables to be links to other elements is a very pretty solution - especially when you consider the earlier solutions such as that of Robinson's unification which, when applied, often involves somewhat error-prone compositions of substitutions (and is much slower).

The fact that the forest of disjoint sets can be encoded in the program values you're manipulating is great (as opposed to, say, the array-based encoding often taught first): makes it very natural to union two elements, chase up the tree to the set representative, and only requires minor adjustment(s) to the representation of program values.

Destructive unification by means of union-find for solving equality constraints (by rewriting unbound variables into links and, thus, implicitly rewriting terms - without explicit substitution) makes for very easy little unification algorithms that require minimal extension to the datatypes you intend to use and removing evidence of the union-find artefacts can be achieved in the most natural way: replace each link with its representative by using find(l) (a process known as "zonking" in the context of type inference algorithms).
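A minimal sketch of this style of destructive unification, with the union-find forest stored in the type representation itself: an unbound `TVar` becomes a link to whatever it is unified with, `find` chases (and path-compresses) links, and `zonk` strips the links at the end. The `TVar`/`TCon` representation is hypothetical, and there is no union-by-rank: links always point from a variable toward the other side, since a constructor has to end up as the representative.

```python
from dataclasses import dataclass, field
from itertools import count

_fresh = count()


@dataclass(eq=False)
class TVar:
    uid: int = field(default_factory=lambda: next(_fresh))
    link: object = None  # None while unbound; otherwise a pointer up the forest


@dataclass(eq=False)
class TCon:
    name: str
    args: list = field(default_factory=list)


def find(t):
    """Follow links to the representative, compressing the path on the way."""
    if isinstance(t, TVar) and t.link is not None:
        t.link = find(t.link)
        return t.link
    return t


def occurs(v, t):
    t = find(t)
    return t is v or (isinstance(t, TCon) and any(occurs(v, a) for a in t.args))


def unify(a, b):
    a, b = find(a), find(b)
    if a is b:
        return
    if isinstance(b, TVar):
        a, b = b, a                      # make sure a is the variable, if any
    if isinstance(a, TVar):
        if occurs(a, b):
            raise TypeError("occurs check failed: infinite type")
        a.link = b                       # the "union": a now points at b
    elif a.name == b.name and len(a.args) == len(b.args):
        for x, y in zip(a.args, b.args):
            unify(x, y)
    else:
        raise TypeError(f"cannot unify {a.name} with {b.name}")


def zonk(t):
    """Rebuild t with every link replaced by its representative."""
    t = find(t)
    if isinstance(t, TCon):
        return TCon(t.name, [zonk(a) for a in t.args])
    return t


def show(t):
    t = zonk(t)
    if isinstance(t, TVar):
        return f"'t{t.uid}"
    if not t.args:
        return t.name
    return f"{t.name}({', '.join(map(show, t.args))})"
```

Unifying `->(a, b)` with `->(int, a)` links `a` to `int` and then `b` to `int`; no substitution is ever built or composed, and `show` (via `zonk`) prints `b` as `int`.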

Basic demonstration of what I'm talking about (the incremental refinement of the inferred types is just basic union-find usage): https://streamable.com/1jyzg8