With the amount of work needed to even attempt such a study, and to get to a point where one can precisely define which variable is actually being measured, compromises are indeed necessary.
But please feel free to replicate the study for your preferred language. I am happy to discuss more about why we made certain choices.
We report results both with and without just-in-time compilation.
The specific focus of this work was pure interpreter performance in the context of metacompilation systems, i.e., before compilation has had a chance to kick in.
For both RPython and Truffle/Graal, it's possible to disable the JIT compilers and measure pure interpreter speed.
So the "baseline" is Java - is that Java compiled or interpreted? And if the latter, is the non-JIT-ted Graal interpreter compiled (as Java) and interpreting the script, or is it interpreted itself?
The figure with the JIT-compiled numbers uses a standard HotSpot JVM with JIT compilation enabled.
The figure with the interpreter numbers uses a standard HotSpot JVM with the -Xint flag, i.e., only the Java bytecode interpreter.
The TruffleSOM interpreter is AOT-compiled, so it is a native binary, which then interprets the SOM code.
You are free to disagree with the specific design choices of the AST and bytecode interpreters for SOM, but we put quite some effort into making the comparison as fair as possible.
But the bigger ones are not yet (so, no results yet).
One of the important questions to answer first is how the mapping should be done. A naive version using new/delete/smart pointers is going to have performance issues. Another option would be to use arena allocators and remove memory management overhead from the equation entirely. Depending on which comparison/C++ usage scenario is desired, both options would be useful.
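To make that trade-off concrete, here is a minimal C++ sketch (not code from any existing SOM port; `Arena`, `AddNode`, and all other names are invented for illustration) contrasting per-node heap allocation with a simple bump-pointer arena for AST nodes:

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// A minimal bump-pointer arena: nodes are carved out of large blocks and
// all freed at once when the arena is destroyed (no per-node delete).
class Arena {
  static constexpr std::size_t kBlockSize = 1 << 20;   // 1 MiB per block
  std::vector<std::unique_ptr<std::byte[]>> blocks_;
  std::size_t offset_ = kBlockSize;                     // forces allocation of the first block

public:
  void* allocate(std::size_t size, std::size_t align) {
    offset_ = (offset_ + align - 1) & ~(align - 1);     // bump to the required alignment
    if (offset_ + size > kBlockSize) {
      blocks_.push_back(std::make_unique<std::byte[]>(kBlockSize));
      offset_ = 0;
    }
    void* p = blocks_.back().get() + offset_;
    offset_ += size;
    return p;
  }

  // Placement-new a node into the arena; only suitable for trivially
  // destructible node types, since destructors are never run.
  template <typename T, typename... Args>
  T* make(Args&&... args) {
    return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
  }
};

// Hypothetical AST node type, purely for illustration.
struct AddNode {
  AddNode* left;
  AddNode* right;
  AddNode(AddNode* l, AddNode* r) : left(l), right(r) {}
};

int main() {
  // Naive version: each node is an individual heap allocation with
  // ownership bookkeeping (malloc/free traffic, pointer-chasing destructors).
  auto heap_node = std::make_unique<AddNode>(nullptr, nullptr);

  // Arena version: allocation is a pointer bump, deallocation is wholesale.
  Arena arena;
  AddNode* arena_node = arena.make<AddNode>(nullptr, nullptr);
  (void)heap_node;
  (void)arena_node;
}
```

The arena variant removes allocator calls and destructor walks from the measured interpreter loop, which is exactly why the choice matters for a fairness-focused comparison.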
I tried Pandoc before reverting back to tex4ht. Unfortunately, it models a rather small subset of the things I was interested in, specifically around the typesetting of citations and listings, as far as I remember. So, tex4ht plus HTML post-processing it was.
It is built on top of tex4ht. It merely provides a few settings for tex4ht and post-processing scripts that beautify the generated HTML. You might ask: why post-processing? Well, because it was simpler for me than figuring out how to get tex4ht to do the desired thing. I just don't find TeX/LaTeX pleasant to use as a programming language, but that's personal taste.
If the post-processing stage is useful to others, perhaps it could be upstreamed into tex4ht?
(I sometimes think that GitHub's user interface puts too much emphasis on cloning and not enough on cooperation. Many useful tools end up in a dozen forks, all with slightly different features, all equally inactive.)
Yes, that's indeed a non-obvious issue, and it looks rather strange on the graph.
That's not the 'partial evaluation' per se. Instead, it is the difference between RPython and HotSpot that surfaces here. RPython currently generates single-threaded VMs, where neither the GC nor the compilation is done in a separate thread. The HotSpot JVM, however, can do those things in parallel and additionally has other infrastructure threads running. In the end, this increases the likelihood that the OS reschedules the application thread, which becomes visible as jitter.
Oh, makes sense :) Is that mentioned in the paper and I missed it? If not, it might deserve a footnote, as the difference is glaring.
BTW, would Graal really become available as a stock-HotSpot plugin in Java 9 thanks to JEP 243? I see things are ready on Graal's end[1], but are they on HotSpot's?
You are right, with a tracer you get most of that for free. For a meta-object protocol, however, you'll still need a little bit of help to avoid creating too many guards. And the PICs, or `dispatch chains', are very useful in the interpreter, where they avoid enormous warmup penalties.
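For readers who haven't come across the term: a `dispatch chain' is essentially a polymorphic inline cache held directly at the send site in the interpreter. Below is a rough C++ sketch of the idea (all names are invented; this is not code from TruffleSOM or the paper): each link caches one receiver class and its looked-up method, so repeated sends only need a class comparison instead of a full lookup, even before any JIT compilation happens.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Illustrative stand-ins for interpreter-level classes; hypothetical names.
struct Method { const char* name; };
struct ObjClass {
  std::unordered_map<std::string, Method*> methods;  // slow-path lookup table
};
struct Object { ObjClass* cls; };

// One link in a dispatch chain: caches a receiver class observed at this
// call site together with the method that the lookup produced for it.
struct DispatchNode {
  ObjClass* cached_class = nullptr;
  Method* cached_method = nullptr;
  std::unique_ptr<DispatchNode> next;
};

// A message-send site in the AST. A cache miss performs the full lookup once
// and prepends a new link, so later sends with the same receiver class are
// resolved by a pointer comparison, even while running purely interpreted.
struct SendSite {
  std::string selector;
  std::unique_ptr<DispatchNode> chain;
  static constexpr int kMaxChainLength = 6;  // beyond this, stay megamorphic

  Method* dispatch(Object* receiver) {
    int length = 0;
    for (DispatchNode* n = chain.get(); n != nullptr; n = n->next.get()) {
      if (n->cached_class == receiver->cls) {
        return n->cached_method;                          // fast path: class check only
      }
      ++length;
    }
    Method* target = receiver->cls->methods.at(selector); // slow path: hash lookup
    if (length < kMaxChainLength) {
      auto node = std::make_unique<DispatchNode>();
      node->cached_class = receiver->cls;
      node->cached_method = target;
      node->next = std::move(chain);                      // prepend new cache entry
      chain = std::move(node);
    }
    return target;
  }
};

int main() {
  Method print{"print"};
  ObjClass string_class{{{"print", &print}}};
  Object obj{&string_class};

  SendSite site{"print", nullptr};
  site.dispatch(&obj);  // miss: full lookup, chain grows to length 1
  site.dispatch(&obj);  // hit: resolved by a single class comparison
}
```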