Rust Faster

geofft · on Oct 4, 2015

Fascinating things I learned from this article:

1. The benchmarks game is mostly a game of who can solve the problem fastest with the constraint of "all the code has to be in this language." It's not about how anyone would realistically write the code; if ludicrous optimizations were in-scope for any real-world project, so would be calling out to a different language. It's worse than the usual problem of benchmarks being unrepresentative of real code; the implementations are also unrepresentative.

2. Apparently performant Haskell involves unchecked raw memory access? What's the story there?

jacobolus · on Oct 4, 2015

All of these cross-language benchmarks inevitably end up like this.

They tend to tell you as much or more about the cleverness/knowledge/available free time of the people writing the specific implementation in each language as they tell you about the languages’ inherent differences.

I don’t think it’s a solvable problem though, as there’s no perfect definition of “how anyone would realistically write the code”. Realistically, people are going to write the first naïve implementation they can think of until it becomes a performance problem and someone has time/resources to throw at speeding things up. In the case that a piece of code is a bottleneck and performance matters enough to devote engineering effort, folks will start doing all kinds of crazy “unrealistic” optimizations to real-world code.

fauigerzigerk · on Oct 4, 2015

I think the benchmarks would tell us much more if they compared implementations of text book algorithms (and data structures) instead of comparing solutions to text book problems.

I realize that this would not necessarily make the code any more realistic or idiomatic. On the contrary, it could make the code extremely contrived for some algorithms in some languages. Some combinations might even be unimplementable.

But I don't care about that, because, as you say, what is and isn't idiomatic is very subjective and it gets thrown out anyway once we decide that some piece of code does need to run fast.

What makes these benchmarks pretty useless is that you don't know whether the choice of algorithm or the choice of programming language dominates a particular result.

titanomachy · on Oct 4, 2015

Well, you can learn something if you read the code at least.

igouy · on Oct 4, 2015

>>implementations of text book algorithms<<

You can do that comparison and publish the results.

>> choice of algorithm or the choice of programming language<<

Do the n-body programs implement different algorithms?

http://benchmarksgame.alioth.debian.org/u64q/performance.php...

fauigerzigerk · on Oct 4, 2015

Sorry, my mistake. I thought that they did not mandate one particular algorithm, but apparently I was wrong. They say:

"We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result."

Thanks for the correction.

igouy · on Oct 4, 2015

Well, "we ask" is not the same as "we receive" :-)

Also, because of the differences between such a wide range of programming languages, there has to be some slop and flexibility in the way the programs are allowed to be written -- otherwise it really would be C written in Haskell.

http://benchmarksgame.alioth.debian.org/dont-jump-to-conclus...

kbenson · on Oct 4, 2015

Well, yes. Another way to look at the benchmark game is "if I'm not using a fast low-level language, theoretically what speed gain might I expect if I wrote a portion of my project in X language and called out to it (ignoring data marshaling)?"

It would be interesting if entries were categorized into idiomatic and non-idiomatic representations of that language, but that would also require people agree on what is idiomatic for that language, which is easier said than done in a lot of cases.

jamii · on Oct 4, 2015

If you just look at the 'best' numbers for each language and treat it as a fight, it's pretty useless. But if you are evaluating a new language it can be very educational to look at all the solutions for that language and see the range of tradeoffs between speed, space and readability. The short and simple programs show you what kind of performance you can expect from straightforward, idiomatic code and the crazy optimised versions show you how far you can push the performance bracket and what you have to sacrifice to get there (performance, safety etc).

igouy · on Oct 4, 2015

If we just look at the 'best' numbers for each language and treat it as a fight -- we've chosen to do something pretty useless ;-)

qznc · on Oct 4, 2015

> the constraint of "all the code has to be in this language."

No that is not a usual constraint, many implementations use third party libraries. E.g., regexdna implementations usually rely on some regex library and pidigits just measures libgmp.

steveklabnik · on Oct 4, 2015

The "third party" bit is hard to define as well. For example, Rust doesn't include regular expressions in the standard library, because we have made an explicit choice to keep the standard library small. But the regex package is maintained by the Rust team, so it's not _exactly_ third party. But some languages with regexes in the language just have bindings to PCRE or something similar... so it's "first party" in the sense that it's built-in, but "third party" in the sense that the code wasn't really written and isn't maintained by them.

mcguire · on Oct 4, 2015

It is a valid constraint, though. Many long-standing complaints from the Java side of things were along the lines of "We can't use libgmp."

igouy · on Oct 4, 2015

http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...

igouy · on Oct 4, 2015

>>ludicrous optimizations<<

If only you'd look at the code.

There can be several quite different programs for the same language implementation:

http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...

Everlag · on Oct 4, 2015

I've been working with rust for about a week now and I've found it a mirror of go in several ways.

Go aims to be simple with the aim of being 'easy' to start writing idiomatically within a few days of jumping into the language. Comparatively, rust is a behemoth in terms of complexity.

There's the completely relevant compilation speed difference: if rust could compile within 5-10x the time go does, it would be much more pleasant to work with. Being able to compile within 2 seconds on large projects is crazy for iteration speed; having to wait upwards of 10 seconds on toy projects is not.

However, rust is stupid fast while also being safer than go. Also, generics; that's a flamewar for another day.

Context: I've been working with go for around 2-3 years now. I recently decided to pick up rust because of the guarantees it provides along with the crazy performance.

kibwen · on Oct 4, 2015

You'll be happy to hear that compilation speed is the foremost priority of the Rust developers at the moment, and is the result of a technical-debt-laden codebase (having survived four years of traumatic language evolution) rather than any intrinsic property of the language. :) Huge improvements are in the pipe.

IshKebab · on Oct 4, 2015

Do you have any evidence that it isn't an intrinsic property of the language? I would have thought all that type inference is non-trivial to calculate.

Manishearth · on Oct 4, 2015

The slowest part is the LLVM passes. typecheck isn't very fast either, but it can be parallelized and it's much faster than the LLVM part.

We don't really do any optimizations on the Rust side and just hand over IR to LLVM. This IR isn't of the type LLVM was built to deal with (pervasive noalias, lots of iterators, etc). It can optimize it, just that it's slow. We can work on giving better IR there.

So type check is slow, but it isn't the reason we're slow. (Still, improvements to type check will help)

kibwen · on Oct 4, 2015

You don't even need to trust me, use the compiler's own "time-passes" flag to see a breakdown of where the compiler spends its time on any code you wish to feed it.

Everlag · on Oct 4, 2015

Fantastic to hear! I don't even want to think about the horrors that 4 years of heavy core changes would do to a language.

kibwen · on Oct 4, 2015

The bottom line is that self-hosting is a great way to shake down your language on a large-ish codebase and a fine way of encouraging outsider contributions, but should really only be done once you actually have a relatively solid idea of what your language is going to be. :)

grayrest · on Oct 4, 2015

> if rust could compile within 5-10x the time go does, it would be much more pleasant to work with

As someone who's been around Rust for a long time but writes very little code in the language, I'm looking towards the MIR work and language services effort to improve this. The language is reasonable to work with once you're over the learning curve and you can mentally visualize the allocations/pointers/borrows moving around but the learning curve is really hard. Having a faster feedback loop on a higher information density interface than console text (I have a dream of dynamic borrow/lifetime overlays) to act as an extended tutorial would go a very long way.

kibwen · on Oct 4, 2015

For those playing along at home, the "MIR" referenced here is the effort to entirely overhaul the compiler middle-end to enable more reliable code transformations and trickier analysis passes, as well as laying the groundwork for goodies like incremental compilation (https://github.com/rust-lang/rfcs/pull/1298).

steveklabnik · on Oct 4, 2015

To elaborate on the 'tricky' here, it's not even that we want to add more _complex_ passes, exactly. It's that many of the current passes are tricky to implement when you operate on an AST, and so we'll be able to make them more solid, eliminating bugs in the process.

There is one pass that is certainly "more tricky" from an algorithms perspective, but will result in the rules being more intuitive from a user's perspective.

CyberDildonics · on Oct 4, 2015

> (I have a dream of dynamic borrow/lifetime overlays)

This is really what needs to happen for wide Rust uptake. If you look at IRC most people starting with Rust immediately run into compile problems that they don't know how to solve.

I did and I write C++11 with rval references all day every day.

markus2012 · on Oct 4, 2015

What helps today is skipping the translation phase (syntax checking only) using rustc -Z no-trans.

This makes the edit/fix syntax cycle 20x faster.

A way to configure this in vim: https://www.reddit.com/r/rust/comments/3kc6s7/fastest_editco...

MichaelGG · on Oct 4, 2015

Rust's complexity is pretty much directly related to the problem it solves (no GC, memory safe, high perf). And even then it's sorta straightforward given those constraints.

I'd bet that if they fixed up type inference to work everywhere (like Haskell or even like an ML) that'd reduce the perceived complexity as beginners wouldn't need to figure out type annotations by hand all the time.

IshKebab · on Oct 4, 2015

I don't think that would help very much. You still need to know the types, even with type inference. Type inference just helps avoid excessive typing (of the keyboard kind).

Chattered · on Oct 4, 2015

Not generally true. I'd have given up on Haskell yonks ago if that's all type inference was good for. In many cases, I just can't figure out the types of arguments without stopping completely and thinking really hard, and I usually can't be bothered to do that when I have a compiler that's happy to do it for me. Haskell gets bonus points for allowing me to insert holes arbitrarily in my code and ask the type-inferencer "what sort of thing do I need to put there?" Inference of the types of function arguments is a special case of that.

It might not be such a big-deal in Rust. I don't have enough experience with it. But if it ever gets HKTs, it wouldn't surprise me if their interaction with typeclasses becomes a pain-point without (nearly) global type inference.

MichaelGG · on Oct 5, 2015

You don't need to know the type. If you take a parameter, then pass it to another function, you don't need to know how that function specified it (if it was a mutable borrow, or full ownership or whatever). It'd just flow through. Or when using an argument and unsure which constraints I need, it'll fill those in for me.

I write some in F# and I never specify types unless there's an obvious clarity issue or a type inference limitation. Most of the time that's because type inference in F# flows only one way and the whole .NET instance methods don't offer alternative syntax (i.e., you must write x.Foo, and you can't do that without knowing x's type - it won't get inferred).

When I don't know the type, then I ask the IDE or REPL what it is. Problem solved; everyone happy.

This is more of a problem when writing short functions (it increases the overhead both timewise and resulting noise) or when dealing with complicated types. I've dealt with types requiring 300+ character annotations. That's no fun. Or particularly when there are certain traits or other constraints that you haven't fully worked out. Making you figure them out on paper and write them down doesn't benefit anyone - they're required for a real reason and the compiler will always check and let you know.

Manishearth · on Oct 4, 2015

> I'd bet that if they fixed up type inference to work everywhere

You mean on function parameters and stuff? Never going to happen, and most of us would not count that as a "fix". If anything that makes programs harder to read, since tracking a type becomes much, much harder.

MichaelGG · on Oct 5, 2015

I know the Rust team is ideologically opposed to it. It would ease the barrier to entry, as well as making Rust more amenable to shorter programs. It also provides more of an elegance, making items no different than expression bindings. Most importantly, it doesn't oppose the opinion that humans should do a computer's work. After all, Rust could always provide a mode to infer then write the type annotation out. Hopefully an IDE will do this.

kibwen · on Oct 5, 2015

Explicit function signatures are a feature that favors correctness over ergonomics. This isn't the first place that Rust makes this tradeoff: immutability itself is entirely useless in Rust except as a means to to aid programmer correctness (to wit: as a result of tracking ownership, Rust could trivially infer immutability based on usage patterns, but when the devs hinted at the idea of removing immutability the community rallied in opposition (the legendary "mutpocalypse")).

Furthermore, the forthcoming work on incremental recompilation intends to leverage the fact that function signatures are infallible in order to enable drastically improved incrementalism (no need to even consider recompiling code that calls a function if only the body of the called function changes), and the same fact will be leveraged to enable huge parallelism in incrementally compiled units.

Furtherfurthermore, Rust deliberately doesn't employ Hindley-Milner and so I'm not even 100% certain that whole-program type inference is feasible.

igouy · on Oct 4, 2015

>>having to wait upwards of 10 seconds on toy projects<<

meteor-contest Rust #2, 545 lines of code, 4.75s to build

http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...

I do see what you mean.

kibwen · on Oct 4, 2015

Grossly misleading. You're measuring the build time of an optimized build, which is not what developers are building during routine development. Optimized builds deliberately trade compilation speed for superior performance.

Jweb_Guru · on Oct 5, 2015

It's not misleading, because Rust focuses so little on -O0 performance that it's often not plausible to use a Rust build with that optimization level. We can't have it both ways.

igouy · on Oct 4, 2015

fwiw

    $ time /usr/local/src/rust-1.3.0-x86_64-unknown-linux-gnu/rustc/bin/rustc meteor.rs -o meteor.rust-2.rust_run

    real	0m2.042s
    user	0m1.940s
    sys		0m0.096s

kibwen · on Oct 4, 2015

And most of that is constant time overhead of linking and trans, rather than linearly scaling overhead. :P

igouy · on Oct 4, 2015

fwiw thread-ring Rust, 38 lines of code

    $ time /usr/local/src/rust-1.3.0-x86_64-unknown-linux-gnu/rustc/bin/rustc threadring.rs -o threadring.rust_run

    real	0m1.466s
    user	0m1.340s
    sys		0m0.112s

kibwen · on Oct 4, 2015

Yes, thank you for proving my point while I'm away from my computer. :)

igouy · on Oct 4, 2015

I don't have an-axe-to-grind :-)

acconsta · on Oct 4, 2015

The author also points out that some of the benchmarks poorly represent real workloads:

"Bottom up (since the worst offenders are now first),

binary-trees is silly since it measures allocation speed for a case that simply doesn't exist in real code;

thread-ring is basically insane, since nobody ever bottlenecks like that;

chameneos-redux's C++ implementation is ridiculous. The C is not so ridiculous, but you still have the problem that basically every language in the top few spots does something completely different;

pidigits tests whether you have bindings to GMP;

regex-dna tests a regex engine on a small subset of cases (arguably the first half-acceptable benchmark);

k-nucleotide tests who has the best hash table for this particular silly scheme, and they don't all even do the same thing (eg. Scala precompacts, like my new Rust version);

mandelbrot is kind'a OK;

reverse-complement would be kind'a OK if not for a few hacky implementations (like the Rust);

spectral-norm is kind'a OK;

Haskell basically cheats fasta (which is why I copied it);

meteor-contest is too short to mean anything at all;

fannkuch-redux is probably kind'a OK,

n-body is kind'a OK.

So maybe 5/13 are acceptable, and I'd still only use 4 of those. I think if looking at mandelbrot, spectral-norm, fannkuch-redux and n-body you can argue the benches are a reasonable measure of peak performance. However, these cases are also all too small and simple to really be convincing either, nor is it particularly fair (where's NumPy for Python?)."

https://users.rust-lang.org/t/blog-rust-faster/3117/12?u=acc...

igouy · on Oct 4, 2015

>>bad at simulating real workloads<<

Have you looked at the benchmarks game website?

Please show where the benchmarks game website claims that those tasks simulate "real workloads" (whatever that means).

You will see "Your application is the ultimate benchmark" and "These are just 10 tiny examples" and …

http://benchmarksgame.alioth.debian.org/dont-jump-to-conclus...

acconsta · on Oct 4, 2015

I apologize if I gave the impression I was claiming otherwise. It's just context for the discussion.

igouy · on Oct 4, 2015

Veedrac "points out" his likes and dislikes, that doesn't mean his likes and dislikes are "The Truth".

When Veedrac dismissively tells you - "pidigits tests whether you have bindings to GMP" - you should ask why he hasn't told you that the measurements can be different even when all the programs use GMP; you should ask why he hasn't told you that the measurements also show the difference for the same language implementation when programs do and don't use GMP.

etc etc

techdragon · on Oct 4, 2015

Every time I look at rust and posts about rust like this one. It occurs to me that "I can use this to write 'C' libraries" and a variable number of moments later, I think to myself that I'll keep Go at the bottom of my list of languages to learn to program in, after Ruby, Lisp, Fortran, COBOL, and Intercal, I'll get to Go one day, if it's still relevant to me, for now Rust + Erlang / Elixir on FreeBSD is like having my own personal unicorn.

I've been meaning to work on a proper "Erlang/OTP-ish" framework for Python for a long time, Pulsar[1] is a good start but needs more developers, and more documentation in order to grow. It has an example that does web sockets with normal Django, no massive hacks. (Which someone more familiar with Django should really check out and talk about more widely) But no Flask example?

1 - https://github.com/quantmind/pulsar

mcguire · on Oct 4, 2015

"a proper "Erlang/OTP-ish" framework"

Rust's original M:N threading almost begged for an OTP-style approach (as in, it looked like Erlang with a non-Prolog syntax :-)). But I think the current threading interfaces are pretty welcoming for an OTP style. Anyone working on that?

fijal · on Oct 4, 2015

"A staple of all performance discussion is the great Computer Language Benchmarks Game." this is so sad, but true

anp · on Oct 4, 2015

Very cool, and definitely gives me some insight into those benchmarks. Which makes me wonder -- are there benchmarks for "boring" programs in a variety of languages? I'm generally more interested in the execution speed of implementations that I would actually have time to write when on a deadline.

kibwen · on Oct 4, 2015

The best thing for your purposes would be a site that attempted to optimize for the appearance of idiomatic code, which you could then secretly use to gauge the performance of code that's idiomatic to each language. In other words, entirely remove performance showboating as an incentive. Rosetta Code (http://rosettacode.org/wiki/Rosetta_Code) may be the closest to what you want, though I can guess that most of the Rust implementations were written prior to 1.0 and won't even compile anymore, let alone represent idiomatic code. :P

anp · on Oct 4, 2015

I like that idea -- although I would agree that Rosetta Code often does not represent idiomatic code, even for longer-established languages. The wikibooks algorithms book might also be a good candidate, although I've not yet seen Rust on there.

gus_massa · on Oct 4, 2015

Many languages have a few programs for each benchmark. You can try looking at the shortest one.

Most submitters try to optimize for speed, and to use smart tricks you need more LOC, so usually the shortest program is the more straightforward.

Actually, in this site they don't measure the size of a program in LOC, they use the size of the compressed program that is a better metric.

(It may be different in other sites specialized in code golf, because in the those sites the submitters are specializing for size specifically.)

Klasiaster · on Oct 4, 2015

As you can see here, code amount is also measured: http://benchmarksgame.alioth.debian.org/u64q/compare.php?lan...

igouy · on Oct 4, 2015

If only you'd look at the code, you might get some insight.

You might see some boring programs.

You could always contribute some boring programs ;-)

mcguire · on Oct 4, 2015

"k_nucleotide is a test of how fast your Hash map is."

Speaking of Rusts' HashMap, the Robin Hood map is pretty darn sweet (and I say that as someone who translated a Pythonish map to Rust), but the last time I looked it was still throttled by SipHash.[1] Is there any progress on SIMD-ing that?

[1] "Safety vs. performance", etc., etc.

pcwalton · on Oct 4, 2015

SipHash as implemented in Rust is actually pretty fast for long keys; it's only for short keys that it falls down (and SIMDification won't help too much on short keys).

mcguire · on Oct 5, 2015

What's a "long key" in this case? I've only looked at it for modest-length strings.

imaginenore · on Oct 4, 2015

The requirement to read line by line is ridiculous.