
Large heaps and large pause times do not necessarily go hand in hand. That's a question of GC technology. For example, the Azul GC does <10 ms pauses for heaps that are hundreds of GB in size. Granted, Azul/Zing pricing doesn't exactly make it the most accessible technology (and it relies on kernel hackery for its compaction), but it demonstrates that multi-second pauses are hardly a necessity.

Incremental garbage collection isn't a new technology [1] or one that's particularly difficult to implement by itself (a grad student could probably implement Baker's treadmill in a couple of days); what makes it hard is primarily compaction and multiple threads [1]. You can't effectively use (naive) RC with multiple threads, and you don't get compaction with RC, either. Multi-threading is in practice a major driver of the need for GC, since (naive) RC isn't thread-safe and unpredictable object lifetimes don't go well with manual memory management.

Also, deferred RC strategies can under certain circumstances contain the cost for cycle collection. Trial deletion is already limited to nodes reachable from other nodes whose reference count has been decremented; type information can be leveraged further to exclude nodes that cannot possibly be parts of cycles (this is particularly easy in ML-style languages, which make mutually recursive types explicit in the source code, but is not limited to those [2]).

Finally, you can also use multiple disjoint heaps to simplify implementation and cap GC cost (one per thread and zero or more shared heaps). This can also bring naive RC back into play as a viable strategy, though you'd still lose compaction. Multiple heaps are particularly attractive for NUMA architectures.

[1] I note that hard realtime guarantees are difficult to make with a basic incremental collector due to the potential of pathological behavior, but we are not talking about hard realtime guarantees here.

[2] Obviously, JITs and dynamically or weakly typed languages have a harder time with this approach.



>Large heaps and large pause times do not necessarily go hand in hand

Not necessarily, but in practice they do go hand in hand. I didn't say the problem was impossible to solve, just that it is an important problem that needs solving. Azul solved the technology issues (or so they claim) but not the economics of it, and they solved it for a language that isn't a good fit for in-memory computing in the first place (to put it politely).

If I have to write software today that keeps a lot of data in-memory and requires reasonable latency (and I do) my only realistic option is C++.

I know all the drawbacks of naive reference counting. C++ shared pointers are a horrible kludge. Fortunately they are only needed in very few places, thanks to RAII and unique_ptr. The problem is that C++ has tons of other issues that will never be solved (antiquated modularity, header files, crazy compile times, excessive complexity and generally lots of baggage from the past).


>If I have to write software today that keeps a lot of data in-memory and requires reasonable latency (and I do) my only realistic option is C++.

I don't necessarily have a fundamental disagreement here, but I offer two caveats (one of which you may consider disagreement at a certain level).

One is that a lot of the discussion in this thread is not about what is possible today, but about where language implementations can realistically go.

The second is that there are language implementations that do allow you to keep lots of data in memory and still give you low latency; the catch is that most of them are functional programming languages and do not tie into the ecosystems of C++, Java, etc., which limits their applicability. But consider, for example, that Jane Street Capital, which has pretty significant latency requirements, is using OCaml. This is in part because OCaml has an incremental garbage collector for major collections (in addition to generational garbage collection). As I said, it's not rocket science.

The same goes for Erlang, which uses a model of thread-local heaps. Erlang heaps tend to be small (lots of threads [1] with little data each, even though the total memory footprint may be in the gigabytes), so Erlang can use a straightforward garbage collector that can run concurrently with other threads, does a complete collection quickly (because the entire heap fits inside the L3 or even L2 cache), or can avoid collection entirely (if you specify the heap size explicitly with spawn_opt and no collection is needed before thread termination). As a result, Erlang can easily satisfy the low latency requirements of telecommunications (where it is primarily used).

Functional programming languages have had an unavoidable need for garbage collection for half a century now, and so implementations of functional languages have seen a lot more GC tuning than imperative languages. Thus, you do at least in theory have other options, but I expect "realistic" in your case also implies being able to tap into certain existing software ecosystems.

Let me finally note that a big problem has arguably been too much focus on the JVM; not that there's anything wrong with supporting the JVM, but it has too often come at the cost of alternative execution models. JIT compilation, lack of value types, a very general threading model, loss of static type information, etc. all can very much get in the way of tuning automatic memory management. Luckily, some of the newer emergent languages target alternative backends, in part to avoid these problems.

[1] Technically, Erlang uses the term "process" in lieu of "thread"; I'm using "thread" to avoid confusion with OS-level processes.



