This is not specific to just Python, but I have often wondered if it would be a productive exercise to write a static compiler for a dynamic language which also supported optional static typing. Presumably this compiler could generate code that is faster than an interpreter like CPython, but slower than what you can get out of a high performance JIT, since a JIT can do dynamic optimizations and function inlining. However, once you add static type notations, the static compiler can (for some cases) generate code that's actually faster than a JIT. Typically you'd only need to add type notations to hot spot code, and you could use profiling from previous runs to figure out what type annotations to use and where to put them.
Using a JIT to generate very fast code for dynamic (even highly dynamic) languages is a really interesting topic. But it seems to involve jumping through a lot of hoops to get performance that is a fraction of what you can get out of an early 1990's era C compiler (or Pascal compiler, for that matter).
The short answer is that performance is as much a property of the language itself as of the implementation. You can make a subset of Python really fast, but then you can't really call it Python anymore. Think of performance in terms of how much work has to be done for a function call, for instance. If the language specifies that all function calls are dynamic, all function arguments can be anything, all classes can be redefined, and so on, then those are the constraints that will hold you back when trying to make it run fast. Static constraints don't really help much, because even if you statically assert that you're always passing a Person instance to a function, the class definition of Person can still change over time, and the instance itself can swap its parent classes, properties, methods, and so forth.
I agree that making simple and dynamic languages with hugely complicated JIT-compilers is pretty absurd. We need new and fundamentally better languages, that combine performance, expressiveness, and much better compile-time guarantees.
If you haven't already, you might be interested in 'The Art of the Metaobject Protocol' [1]. It takes you step by step through the design of a highly dynamic yet still efficient object system for Common Lisp, along with its metaobject protocol.
The key is designing the extension points and the API in a way that allows an efficient implementation. In Python that never happened. It seems like they basically documented the implementation and said: 'here are the dicts containing the internals, feel free to do anything you want'. That's an approach that will never result in an efficient implementation without huge effort.
I'm familiar with CLOS, but I have never looked seriously into implementation strategies. I have read many research papers on compiler design, language design, and (multiple) dispatch implementations, and I have concluded that the languages we use day-to-day are decades behind what we would be using if we took advantage of this research.
This is roughly what SBCL does for Lisp, compilation to native code that can sometimes be improved with optional type annotations. Coincidentally, the native-code compiler itself is named Python (it predates Python the language).
> I have often wondered if it would be a productive exercise to write a static compiler for a dynamic language which also supported optional static typing.
I'm on the Dart team, so this is a topic that's close to my heart.
> Presumably this compiler could generate code that is faster than an interpreter like CPython,
People have been writing static compilers for dynamic languages for ages. Lisp and Scheme have been compiled for a long time.
My understanding is that even simple static compilation works tolerably well for something like Scheme, but object-oriented languages like Python are a different story.
In object-oriented languages, the majority of "calls" are potentially-polymorphic method invocations. In Python, even `a + b` may call some user-defined "+" method which must be looked up at runtime.
If you statically compile that in a straightforward way, almost all of the resulting "static" code is just going to be calls to dynamic method lookup functions. Something like:
You won't really do anything useful at the level of native code, so you don't gain much from eliminating the interpreter's bytecode loop. In fact, you'll likely lose: that chunk of C above is a pretty verbose encoding of `a + b`.
You can look at bytecode as a compression format for dynamic code of that form. The runtime cost of "decompressing" it in the bytecode interpret-and-decode loop may be a net win over the memory wasted by the larger native code size and the cache misses that bloat causes.
In order to statically compile a very highly dynamic language, you really do need to do some static analysis to eliminate some of the dynamism. That analysis is either very costly (slow, giant monolithic whole program compiles that take ages as in MLton) or doesn't work very well (failing to infer good static types and resorting back to lots of slow dynamic code), or (most often) both.
With Dart, we have a very mature whole program compiler that does concrete type inference, and it's still both too slow to run and not effective enough at inferring good types.
> However, once you add static type notations, the static compiler can (for some cases) generate code that's actually faster than a JIT.
You can do that, yes, but it requires a few things:
1. Your core libraries and idiomatic user code need to actually be fairly "static" in terms of the types they use at runtime. If the code really does use a lot of polymorphism and duck typing, all static analysis can do is tell you, "Yup. You gotta do this at runtime."
Python in particular is well-known for relying heavily on duck typing in its core libraries.
2. Your type system needs to be sound. Once you are within the bounds of static types, the compiler needs to be able to trust that those types are correct, with no soundness holes. Otherwise, it all falls apart.
3. You need to deal with the boundary between dynamic and typed code. Because of 1, you need to do validation when an object flows from dynamic-land into the typed area. That validation occurs at runtime and has its own cost. If you aren't careful, that runtime cost outweighs the perf benefits you get once you are in static code.
The "gradual typing" folks have been beating on this for years and it turns out to be really hard. Google "is gradual typing dead" to see some of the latest soul-searching.
With Dart, we had 1 since the language has always had optional static types, so typical Dart code is fairly Java-esque. We did not have 2, and that turned out to be a big problem. We have since come up with a new sound type system [1] to address that.
3 is hard, but our strategy has been to be more restrictive on what kinds of dynamic code we allow to flow into typed code. For example, you simply can't pass a List<dynamic> into a function that expects a List<int>.
By the time you do all of these things, I think you discover that you don't really have much dynamism left in your system. It is handy having a dynamic type for some places, like deserialization, JSON, etc. But if you care about static compilation and perf, actually being statically typed is just so much easier, and dynamic typing doesn't really offer that much value as an alternative.
> In order to statically compile a very highly dynamic language, you really do need to do some static analysis to eliminate some of the dynamism.
Or you can require the user to add type annotations, if and when they want fast code.
In SBCL (a compiler for Common Lisp):
(defun count-even (array)
  (declare (optimize speed (safety 0)))    ; <- the optimisation settings for this function
  (declare ((simple-array fixnum) array))  ; <- the argument is an array of fixed-size integers
  (loop for x across array count (evenp x)))
If you don't add the type annotation (the second declare form), the compiler will tell you that it can't optimise the code because of type uncertainty. Optimising a program becomes a dialogue between the user and the compiler.
As far as I know, they don't compile ahead of time. They JIT based on the types that are seen at runtime, and from what I've heard, they generate a lot of code for various specializations.
That's a smart trade-off for a mathematical language designed for running on a desktop, but probably not super relevant here.
As far as I know there are no plans to directly compile (JIT or otherwise) TypeScript. TypeScript is always (again, as far as I know) transpiled into JavaScript, whereupon it gets JIT'd in the browser just like "normal" JavaScript.
The gains that come from using TypeScript are productivity gains: in big projects, static typing, generics, decorators, and interfaces are supposed to be a boost to productivity, not a boost to performance.
No. TypeScript's type system is intentionally unsound (it makes it easier to interop with JS and untyped code), which means they can't rely on types for efficiency.
It was my understanding that most/all of this unsoundness can be removed by telling the compiler to disallow a bunch of stuff using command line args, is this still not the case?
Also, even if you can't statically determine the types of some variables, TypeScript seems to be able to determine them for most. If most of the objects' types are statically known, would that give you a large part of the speed-up of a totally static system? In Java and C# there are still some intentional holes in the type system, like the dynamic keyword or casting to object.
> It was my understanding that most/all of this unsoundness can be removed by telling the compiler to disallow a bunch of stuff using command line args, is this still not the case?
Nope, it's never been the case, even with all the flags on. It's not about implicit casts, it's about soundness in the actual type rules. Here's an example:
function append(array: Array<Object>) {
  array.push("not a number");
}

var nums: Array<number> = [];
append(nums);
console.log(nums[0].toFixed());
If you run this, it throws an exception because String does not have a "toFixed()" method.
> If most of the objects are statically known would it give you a large part of the speed up of a totally static system?
Maybe, but it's hard to say. Once you have a single hole, if you can't check it at runtime right at the hole, then values of the wrong type can spread anywhere in the program.
> In Java and C# there's still some intentional holes in the type systems like the dynamic keyword or casting to object
Right. In the above example, Java and C# would throw an exception when you added the string to the array. This check has some runtime cost, but works.
I believe TypeScript is also unsound around function types, which is harder to address because there's no easy place to know where to add the check.
All of this is not really relevant for TypeScript, as it just transpiles to JavaScript and ends up running on whatever JavaScript VM is in use.
Sure, sometimes you might be able to generate more efficient code by tailoring the emitted JavaScript to the VM's behaviour or by using something like typed arrays for low-level operations, but if the target language doesn't support things like type hinting, there is not much to be gained.