Threading model overview of Python, Ruby, Perl, et al.

davidw · on Sept 13, 2008

If I'm not mistaken, it was Tcl that did the 'Perl' thread model first. It does it quite well, too, IMO.

thwarted · on Sept 14, 2008

"The second catch is that every new thread is very expensive to create. The interpreter is not small, and duplicating it with every thread makes for a lot of overhead."

There have been a lot of misconceptions about threading in Perl, most likely because of the two different implementations.

The modern implementation, which has been around literally for years, appears to be pthread-based. According to strace, the clone(2) system call is used on Linux to create a thread, and all memory is shared between the threads (CLONE_VM is specified). "Duplicating the interpreter" is roughly the same as for any other thread-based program, where what needs to be allocated is a stack for the thread and some other TLS areas.

However, the time it takes to start up a thread is often why a "pre-fork" thread model is used. When designing an application, one should take into account the limits of the platform in use, and work around or avoid any perceived weaknesses.
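To make the "pre-fork" idea concrete, here is a minimal sketch in Python rather than Perl, assuming the standard-library ThreadPoolExecutor is an acceptable stand-in for a hand-rolled pool. The names handle and run_jobs are hypothetical, chosen only for illustration. The point is that the thread start-up cost is paid at most once per worker, not once per job.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle(job):
    # Hypothetical unit of work: square the input and report which
    # worker thread ran it.
    return job * job, threading.current_thread().name


def run_jobs(jobs, workers=4):
    # At most `workers` threads are ever started; each one is reused
    # for many jobs instead of a new thread being created per job.
    with ThreadPoolExecutor(max_workers=workers,
                            thread_name_prefix="worker") as pool:
        return list(pool.map(handle, jobs))


results = run_jobs(range(20))
squares = [value for value, _ in results]
thread_names = {name for _, name in results}
```

Twenty jobs run here on no more than four threads, so thread_names never holds more than four entries. Whether this pays off depends on how expensive thread creation really is on the platform in question.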

ajross · on Sept 13, 2008

Here's a language (mine) in the same space as perl/python/ruby/javascript that does native threads just fine: http://plausible.org/nasal

The problem isn't that hard if you think about it from the start. Nasal will grab a global lock for garbage collection, but everything else is parallel.

fauigerzigerk · on Sept 13, 2008

It seems there are two conflicting trends that affect the use of threads for parallelism: one is that the amount of RAM available to a typical machine grows fast, and thus in-memory data processing becomes more important. The other trend is that some smart people prefer a process model over threads, which excludes working with a lot of in-memory data or at least makes it hugely more complex (working with shared memory blobs, no pointers, none of the usual data structures, etc).

I, for one, feel very restricted by languages like Python, Ruby or PHP because they effectively force me to use the process model for parallelism and prevent easy sharing of in-memory data structures between threads. So even though I like these languages, I use C#, Java or C++ for most projects. Others may of course have different scenarios and different requirements.
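A rough Python sketch of that trade-off, under some assumptions: the dictionary table and the function bump_shared are hypothetical, and the pickle round trip only stands in for what crossing a process boundary typically involves (it does not start a real second process). Threads mutate the one structure in place. Anything handed to another process is serialized, and the receiver works on a copy.

```python
import pickle
import threading

table = {"hits": 0}          # one in-memory structure
lock = threading.Lock()      # guards the read-modify-write below


def bump_shared(n):
    # Every thread updates the same dict object directly.
    for _ in range(n):
        with lock:
            table["hits"] += 1


threads = [threading.Thread(target=bump_shared, args=(1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
shared_hits = table["hits"]

# Stand-in for the process model: the data is serialized and the
# other side gets an independent copy, so its changes stay local.
remote_copy = pickle.loads(pickle.dumps(table))
remote_copy["hits"] += 1
```

After this runs, the threads have all updated the same table, while the change to remote_copy is invisible to it. Note that in standard CPython builds the threads above share data but do not execute bytecode in parallel because of the global interpreter lock, which is the restriction being described.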