Integrating with LLVM

bjourne · on Nov 6, 2014

One big downside with LLVM as a compiler backend is that it does not support accurate relocating garbage collection, see: http://www.philipreames.com/Blog/2014/02/21/why-not-use-gcro.... I believe both Dylan and Julia use conservative GC, which is not nearly as efficient as relocating GC.

BruceM · on Nov 6, 2014

Well, if you're aware of that blog post from February, then you should also be aware of all of the work that he and his team have done since then. They're working on solving the problems with LLVM and more advanced GC.

As for Dylan, Dylan can use Boehm (and does for now in the C and LLVM backends), which is conservative.

However, Dylan can also use MPS (http://ravenbrook.com/project/mps) which is a much more advanced GC that does copying/compacting and all that. Dylan uses MPS with the compiler backend that generates native code.

The plan is that once the LLVM backend is up and running using Boehm, efforts will be made to get it working with MPS.

Julia uses a home-grown GC, but I'm not familiar with it or its characteristics at all.

bjourne · on Nov 7, 2014

Yes I know they are working on solving the issues I've mentioned, but I don't think they are there yet? I wasn't aware of MPS, it looks like a very interesting garbage collector. Do you know if it supports inline bump allocation (no function calls)? I skimmed the docs but couldn't find the answer. The language I'm contributing to (Factor) does that by keeping the nursery pointer in a register so you just ADD to it to allocate memory.
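
To make the idea concrete, here is a minimal sketch of bump allocation. It is a toy model, not Factor's or MPS's actual allocator: the `Nursery` class, its fields, and the 8-byte alignment are all assumptions chosen for illustration, and a plain integer offset stands in for the register-resident nursery pointer.

```python
from typing import Optional


class Nursery:
    """Toy nursery: a fixed-size region with a bump pointer (all names hypothetical)."""

    def __init__(self, size: int, align: int = 8) -> None:
        self.size = size
        self.align = align
        self.here = 0  # the "nursery pointer"; a real runtime would keep it in a register

    def allocate(self, nbytes: int) -> Optional[int]:
        # Round the request up to the alignment, then bump the pointer.
        nbytes = (nbytes + self.align - 1) & ~(self.align - 1)
        start = self.here
        if start + nbytes > self.size:
            return None  # slow path: a real runtime would trigger a minor GC here
        self.here = start + nbytes
        return start


nursery = Nursery(size=64)
a = nursery.allocate(10)  # rounded up to 16 bytes
b = nursery.allocate(24)
c = nursery.allocate(32)  # does not fit in the remaining 24 bytes
```

The fast path is one add and one bounds check, which is why it can be inlined at every allocation site without a function call.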

Too bad about the restrictive license. Is that why there is the Open Dylan exception in it?

BruceM · on Nov 7, 2014

MPS is designed so that allocation can be very fast in the common case as long as the interface to the collector in the compiler and run-time is done well.

They are willing to talk to people about the licensing. They're nice folks, although often pretty busy. CLASP supports using both MPS and Boehm as well.

As for the Open Dylan exception ... yes. But the other thing to understand there is that both MPS and what is now Open Dylan were developed at Harlequin in the 1990s and MPS's original "client" was Open Dylan.

bjourne · on Nov 7, 2014

So if another compiler wanted to use MPS in their runtime, they couldn't, unless binary distribution of programs compiled with that compiler was forbidden? I'm not a license wonk by any means, but it seems to me that programs compiled with CLASP themselves need to be gpl:ed since CLASP itself is.

klipt · on Nov 7, 2014

> it seems to me that programs compiled with CLASP themselves need to be gpl:ed since CLASP itself is.

Why? GCC is GPL'd but plenty of people use GCC to write non-GPL code...

bjourne · on Nov 7, 2014

https://www.gnu.org/licenses/gcc-exception-3.1.html

simonster · on Nov 6, 2014

I don't know about Dylan's implementation, but (at present) Julia uses a non-moving, non-generational accurate GC with a shadow stack, which works but doesn't perform all that well. There is some work underway to make it generational (https://github.com/JuliaLang/julia/pull/8699). However, JSC (which uses LLVM for the FTL JIT) implements a moving generational GC with a Bartlett collector, which is not fully accurate (it is accurate for heap objects but uses a conservative stack scanner) but evidently yields good performance. See https://www.webkit.org/blog/3362/introducing-the-webkit-ftl-... for further details. I don't think either of these approaches needs explicit compiler support.
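
As a rough illustration of the shadow stack idea (not Julia's actual implementation), the sketch below keeps GC roots in an explicit list that the generated code pushes to and pops from, so the collector never has to interpret the machine stack. `Obj`, `shadow_stack`, and `collect` are hypothetical names, and the collector is a toy non-moving mark phase.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)


shadow_stack = []  # explicit root list maintained by the "compiled" code


def collect(heap):
    """Toy non-moving mark phase: return the objects reachable from the shadow stack."""
    marked = set()
    work = list(shadow_stack)
    while work:
        obj = work.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        work.extend(obj.refs)
    return [o for o in heap if id(o) in marked]


leaf = Obj("leaf")
root = Obj("root", [leaf])
garbage = Obj("garbage")
heap = [leaf, root, garbage]

shadow_stack.append(root)  # function entry: publish the local as a GC root
live = collect(heap)
shadow_stack.pop()         # function exit: unpublish it
after_return = collect(heap)
```

The push and pop on every function entry and exit are the overhead that makes this approach simple but comparatively slow.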

pygy_ · on Nov 6, 2014

For Julia, I believe a moving GC would clash with the FFI.

BruceM · on Nov 7, 2014

This is something that can be readily solved. Dylan has no issues in this area when using the MPS (http://www.ravenbrook.com/project/mps/) GC.

There are a few things that are worth taking note of though ...

One is that anything that uses memory addresses needs to be able to deal with those addresses changing. A common example is hash tables that hash on an object's address. MPS provides location dependencies (http://www.ravenbrook.com/project/mps/master/manual/html/top...) to deal with this.
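
A much-simplified sketch of the underlying idea, assuming a toy heap with a single global "something moved" counter: the table remembers the counter value it last hashed under and rehashes when it has changed. This is not the MPS API (real location dependencies are finer-grained), and `Heap`, `Thing`, and `AddressTable` are hypothetical names.

```python
class Heap:
    """Toy moving heap: object addresses change when the collector 'moves' them."""

    def __init__(self):
        self.epoch = 0  # bumped every time any object moves
        self.next_addr = 0x1000

    def alloc(self, obj):
        obj.addr = self.next_addr
        self.next_addr += 16
        return obj

    def move(self, obj):
        obj.addr = self.next_addr  # relocate to a fresh address
        self.next_addr += 16
        self.epoch += 1


class Thing:
    def __init__(self, name):
        self.name = name
        self.addr = None


class AddressTable:
    """Hash table keyed on object address, rehashed when addresses may be stale."""

    def __init__(self, heap):
        self.heap = heap
        self.seen_epoch = heap.epoch
        self.slots = {}  # address -> (key object, value)
        self.rehashes = 0

    def _refresh(self):
        if self.seen_epoch != self.heap.epoch:  # the dependency is stale
            self.slots = {k.addr: (k, v) for k, v in self.slots.values()}
            self.seen_epoch = self.heap.epoch
            self.rehashes += 1

    def put(self, key, value):
        self._refresh()
        self.slots[key.addr] = (key, value)

    def get(self, key):
        self._refresh()
        entry = self.slots.get(key.addr)
        return entry[1] if entry else None


heap = Heap()
key = heap.alloc(Thing("k"))
table = AddressTable(heap)
table.put(key, "value")
heap.move(key)  # the collector relocates the key
found = table.get(key)
```

The lookup still succeeds after the move because the table notices the staleness and rehashes once before probing.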

Another is that when you call into C / foreign code, you want to be able to pin your object down so that it won't move. This is commonly an issue with byte vectors / strings. For that, the language / libraries / compiler should support pinning and unpinning objects. The way that Dylan does this in the native code generator is that it makes sure there's a stack reference to the object which is enough for the GC to not move it. This is only good for short-lived things though as you don't want to disrupt GC for too long.
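
A small sketch of scoped pinning, under the assumption of a toy collector that moves every unpinned object. It models an explicit pin set rather than the stack-reference trick described above, and `pin`, `Buffer`, and `collect` are hypothetical names.

```python
from contextlib import contextmanager

pinned = set()  # ids of objects the collector must not move


@contextmanager
def pin(obj):
    """Pin obj for the duration of the block, then unpin it even on error."""
    pinned.add(id(obj))
    try:
        yield obj
    finally:
        pinned.discard(id(obj))


class Buffer:
    def __init__(self, addr):
        self.addr = addr


def collect(objects):
    """Toy moving collection: every unpinned object gets a new address."""
    for obj in objects:
        if id(obj) not in pinned:
            obj.addr += 0x1000


buf, other = Buffer(0x10), Buffer(0x20)
with pin(buf):
    collect([buf, other])  # a GC happens during the foreign call
    addr_during_call = buf.addr
collect([buf, other])      # after unpinning, buf is free to move again
```

Tying the pin to a lexical scope keeps it short-lived, which matches the advice not to hold pins across long-running work.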

Another thing to take note of is that you sometimes want to store an object reference in native code, out of reach of the GC. A common situation where this happens is storing user data or callbacks. In this situation, we can register a Dylan object to have a handle which we can pass to native code: we register the object, export it to get a handle, and pass the handle to native code. When we get a handle back from native code, we import it to get back to the original Dylan object (which may have moved). We can unregister it when we're all done.

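    // Register the object, then store its exported handle in native code.
    register-c-dylan-object(handle);
    %uv-handle-data(handle.raw-handle) := export-c-dylan-object(handle);

    ...

    // Later: import the handle to recover the (possibly moved) Dylan object.
    let handle = import-c-dylan-object(%uv-handle-data(raw-handle));
    apply(handle.callback, args)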

These are all solved problems. They can be a bit tedious and sometimes error prone, but nothing that can't be solved. It might be adventurous to migrate an existing community that wasn't prepared though.

hajile · on Nov 7, 2014

Was there ever any consideration of the Rubinius JIT? It seems that for a somewhat small community like Dylan has, pairing up with other small groups to make and improve a shared JIT would be a better use of resources.

BruceM · on Nov 7, 2014

Interesting question. I would say that the answer is "no, the Rubinius JIT was never considered" ... One thing is that we don't just JIT compile code. In fact, we only do so on Windows when under the debugger / REPL. We do AOT compilation, for better or worse. LLVM is a pretty nice match and gives us a lot of good things, including DWARF debug info.

qznc · on Nov 6, 2014

What approach do others use? API or Bitcode? The pro and contra discussion is quite short in the article.

BruceM · on Nov 6, 2014

Well, I'd mainly written it for people who were (already) asking me about how we differed from other integrations or why we did what we did ...

LDC (D with LLVM) is written in C++, so they invoke LLVM APIs, but I don't know what they generate (bitcode or machine code).

Julia is written in C++, they invoke LLVM APIs, they call the JIT.

CLASP is written in C++ and Common Lisp, they invoke LLVM APIs, but I'm not sure if they only JIT or if it stores anything to disk yet.

And so on ...

jevinskie · on Nov 7, 2014

I like the C++ API though it is in constant flux. The C API is stable but just too limited. Generating bitcode yourself is probably a pretty good strategy too, if your library is good.

thope · on Nov 6, 2014

"You are tied to a particular version of the LLVM API at compile time."

In what pretty terms these thoughts are put!