Hacker News
Part 2 Dart vs Go vs Python (and PyPy) Performance (hackingthought.com)
49 points by cd34 on April 11, 2012 | hide | past | favorite | 38 comments


I actually did a very similar test to this recently and got similar results (though mine was just between Go and Python)... in retrospect the reason is obvious: in Python I was using python-cjson, which is a heavily optimized C implementation of a JSON parser, while in Go I was using a 100% Go implementation.

In a certain way this shows nothing, since we all know C is fast. But in another way it convinced me that Python is actually fine for most of what I'm doing, since it's easy to drop into C for the parts that need performance. And even more importantly, there are often already C libraries with Python bindings that do the thing I'm looking for (i.e. parsing, numpy, etc.).


That has always been what fuels my love of Python. I'm a C/C++ programmer, so being able to replace Python with C++ so easily is a real joy.


The usual benchmark complaints: You're not really comparing language performance, because the Python version is using an optimized C library.

But isn't that the deeper point?

Program in something that lets you optimize easily (or use optimized libraries easily). That's a significant, real-world performance advantage.

While it is a virtual certainty that Go will get a fast JSON library, Python's broad, mature library support is very helpful.


It's interesting that Python has a C library for this, which indicates that someone thought Python couldn't cut it for this task. Yet the PyPy guy (in the comments) claims that with a bit of tweaking they got the pure python JSON encoder working faster than the C library (plus python/c interface overhead I assume) and that the same could probably be done for decoding too.

Python's USP seems to be the ability to drop down to C or C++, PyPy's is that it's fast enough to keep everything in high-level Python (at a cost in memory) and Go's is that it's low-level enough that you can do the C level optimisation in Go (though as yet they haven't for this particular case).

I think the time for the traditional Python approach has passed as newer technology allows you to be fast enough for many tasks without leaving the language and even Go is aimed at a fairly low level. PyPy is cool but is somewhat chained to assumptions in Python (though this lets them build on that wide legacy). So it makes me wonder what the new Python/Ruby/Perl is going to be? Possibly something written with PyPy's toolchain to get a JIT? Do they have a here's what we can do if we get to rewrite the rules to suit our tools language in the PyPy family? What are the other contenders? All the ones that spring to mind are rewrites of existing languages.


Something like Julia, Rust or Clay, high-level languages with near-Python simplicity that generate LLVM code, seems powerful. This general model is the closest thing to a viable C replacement I have seen yet.

In addition, personally, I'd much rather invest my time learning on top of the flexible and fully open LLVM technologies than on top of Google's proprietary, little-better-than-Java constrained language.

For current practical purposes though, Python/C combo is more than sufficient for many needs.


Does Python let you optimize easily? Not in my experience, unless you define "easily" as pushing the critical path into C libraries. (Not an unreasonable approach, just not one I'd call "easy," and at that point you're not using Python anymore.)

Go does give you the tools you need to profile, understand, and then fix performance issues without switching languages.

More detail: http://blog.golang.org/2011/06/profiling-go-programs.html


Profiling in Python isn't hard either. But yeah, you really can't get the performance of a statically typed language in pure Python (well, maybe with PyPy). However, pushing the critical path into a C library doesn't have to be as horrible as it sounds (for a Python developer at least). The Cython[1] project can be very helpful: you just annotate critical variables/functions with types and compile the now-Cython code to C. That C code will be pure C where you wrote typed parts, and a bunch of ugly (but commented) calls into Python libraries for the pure-Python parts. Then you just throw it at gcc and you're done. The best thing is that you can mix typed/untyped code and pass variables around as if it were pure Python. Example from the docs: http://docs.cython.org/src/userguide/tutorial.html#primes

1. http://cython.org/


This isn't a benchmark of the languages, but of the libraries that ship with them. That's fine, but be careful not to confuse that with a language benchmark.


Note that where you mean Python you pretty much mean this optimized C implementation: https://bitbucket.org/cmoyer/python-twitter/src/5bba928d8c12...


Yeah, profiling the Go version quickly shows that the part that takes the most time is the JSON parsing.


Given Go is a compiled language I am rather surprised at its low speed in this benchmark. I am not familiar enough with the language to take a look at the benchmark code, has anyone else done so?

I can't help wondering if the benchmark code is falling foul of Go's garbage collection which I understand from discussions here is somewhat less intelligent than Python's GC.


It's actually the JSON parsing. Python is using a highly optimized C parser, whereas Go's json package is in pure Go.


Ah, thanks! So nothing to do with GC or poor IO, but rather a poorly selected operation for a benchmark.


As always seems to be the case with these fairly naive benchmarks. (Not specifically about Go here; it seems people often cite Python when they're really using some optimized C code.)


>>poorly selected operation to do in a benchmark.

Considering the history of benchmarks, it is quite likely the opposite: The library was selected very carefully, to argue a "point". :-(


PyPy is using a pure-python JSON parser as well.


I am the author. Is there a better JSON parser that I should have used for Go?


You could use one of the C JSON implementations, such as json-c. However, as has been pointed out, you would still just be comparing libraries rather than the languages themselves.


I don't think so; I think Golang could use a more well-optimized parser implementation, even if it remains written in Go.


It certainly could be more optimized. For example, all numeric literals are converted to strings before being parsed; that's a mostly unnecessary copy/allocation per number. Similarly, it doesn't make use of the fact that a []T can only legally contain Ts. It's all very clean and generic, which is great, because when working on the compiler they can focus on making good code faster without being misled by code that tries to work around a lack of optimizations. As an experiment, I replaced the []int in the Go code in the link with an IntList that implements json.Unmarshaler and does a single-pass integer-list parse there (without overflow checking, but eh). With that quick 40-line change, the Go code was faster than the CPython, at least on my machine.

That and similar type-specific handling could be built into the json package fairly easily, and would make those cases quite a bit faster. Then again, I bet the Go team would have a better and more general idea.


I spun the JSON decoding off into a few goroutines and managed to get it down to ~1 second (but the results came out of order, so it didn't quite solve the same problem, though this is fixable). This was ~10 lines of Go (and only using a single core).


This was my bad; I have updated the post. I should have been marshaling to a more comparable datatype (map[string]interface{}), and now the performance is more in line with what I would expect. I need to look into the Unmarshal code to see why it is so slow.


I appreciate why you use "golang" instead of the general "Go" or current release "Go1", but the former is the name of the open source organization, and the latter is the name of the language.


You are mistaken. There is no organization that maintains the Go project. The project and language are both named Go.

"golang" is just the name of the domain and a more searchable term than "go." There is nothing more to it than that.


Fair. Then what's the name of the implementation? The same as the language? How do I refer to:

* Go, the specification

* gccgo

* 6g and friends: the compilers


Go, gccgo, and gc respectively.


The Go program allocates a new item on every line instead of allocating a single struct and just passing a pointer into Unmarshal, which would be way faster. There's some claim of "doing async style programming," but the Go code doesn't parallelize anything, so... not really a fair Go entry there. The input file is also not supplied.

edit: not to mention there are things like adding individual sums multiple times, which suggests the author didn't check the output to verify that all the programs arrived at the same result...


I wrote the article because I write a lot of web applications in which the vast majority of requests are JSON. I am about to do a lot of post-analysis on JSON sent from the web browser, so it is valuable for me to know how fast a language's (library's) JSON parser is. For a highly concurrent web server it is good to know how fast and efficient the JSON parser is, since it will be doing a lot of concurrent parsing of JSON.


When you're working with streaming JSON like this (or in a web app), it might be easier to use a decoder/encoder. See http://play.golang.org/p/TLNORK2WK9 for an example.

Also, fwiw, it has never been my experience that JSON decoding/encoding becomes the bottleneck in a web app. I/O (to a database, or the filesystem) is by far the largest bottleneck in any app I've profiled. A good benchmark for a web-app language is hard to write, because it tends to depend on (unreliable) I/O-bound systems.


For just a 1.0 language Go is really looking good. Compared to Python at 2.x it's very impressive. I'm looking into using it for my project.


I understand the reasoning behind this benchmark, but the Python results aren't super relevant because the library is written in C.

What is interesting is the Dart/Go comparison. Dart's VM still doesn't have a lot of the optimizations that will ultimately be in there, and it's keeping pace pretty well with Go's much more mature compilers (which are based on the Plan 9 compilers).


I think you have it backwards. The Go compiler has few optimizations, while the V8 project is much more mature.


The PyPy JSON parser is pure python and gets sped up by the JIT.


Can anyone talk about the state of the compiler in Go? How good/bad is the optimizer? That's not something you usually address early in language design.


Go's gc compiler is relatively young and makes few optimizations (the compiled code still performs well; this particular benchmark compares Go's unoptimized json package with a mature C library). The gccgo compiler can take advantage of all of gcc's code-generation optimizations, and its performance reflects this.


Would the comparison be more fair if the Python/Dart code were mapping the JSON to an existing type?


How does Ruby compare?


I actually like Dart and Go more than JavaScript, but JavaScript still has a leg up when it comes to libraries. If Go or Dart get a following I would be happy to look into them.



