[Python-Dev] Free threading
Fri, 17 Aug 2001 13:30:56 -0700
On Wed, Aug 15, 2001 at 06:59:47PM -0400, Dan Sugalski wrote:
> At 11:50 PM 8/13/2001 -0700, Paul Prescod wrote:
> >Tim Peters wrote:
> > >...
> > > IIRC, Greg's fabled free-threading version of Python took a speed hit of
> > > about a factor of 2 (for a program using only 1 thread, compared to that
> > > same program without the free-threading patches).
Yah. That's about right. The largest problem was dealing with dictionaries.
Since you never knew whether a specific dictionary was shared (between
threads) or not, you always had to lock the access. And since all namespaces
are dictionaries, nearly every name lookup paid that locking cost.
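The per-access cost can be sketched like this (a toy `LockedDict` for illustration only, not the actual free-threading patch, which did its locking in C inside the dict implementation):

```python
import threading

class LockedDict:
    """Toy illustration: every access to a possibly-shared dict pays
    for acquiring and releasing a lock, whether or not another thread
    is actually touching it at the time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def __getitem__(self, key):
        with self._lock:           # paid on every lookup
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:           # paid on every store
            self._data[key] = value

# Namespaces are dictionaries, so under free threading every name
# lookup and assignment goes through something like this.
ns = LockedDict()
ns["x"] = 42
print(ns["x"])  # 42
```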
> >The Perl guys considered this unacceptable and I can kind of see their
> >point. You have two processors but you get roughly the same performance
> >as one?
No. For definitional purposes, let's say that for normal Python on one
processor, you get 1 Python Speed Unit (PSU). With my free-threading
patches, that uniprocessor would get around 0.6 PSU.
Move to a multiprocessor with 2 CPUs, running a 2-thread program with no
synchronization (e.g. each thread is simply doing its own thing rather than
potentially interfering with the other). Regular Python would get about 0.95
PSU because the GIL imposes some overhead (and you can't ever get more than
1 because of the GIL). With the free-threading, you would get about 1.2 PSU.
On a three processor system, regular Python still gets 0.95 PSU. The
free-threading goes up to maybe 1.6 PSU.
We observed non-linear scaling with the processors under free threading. 2
processors was fine, but 3 or 4 didn't buy you much more than 2. The problem
was lock contention. With that many things going, the contention around
Python's internal structures simply killed further scaling performance.
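That flattening curve is what a simple contention model (Amdahl's law) would predict: the lock-protected fraction of the work runs serially and caps the total speedup. The 40% serial fraction below is an illustrative guess, not a measured figure from the patches:

```python
def speedup(n_procs, serial_fraction):
    """Amdahl's law: the serially-executed (lock-protected) fraction
    of the work limits speedup no matter how many processors you add."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# With (say) 40% of the work serialized behind internal locks,
# each extra processor buys less than the last:
for n in (1, 2, 3, 4):
    print(n, round(speedup(n, 0.4), 2))
# 1 1.0
# 2 1.43
# 3 1.67
# 4 1.82
```

Note how 3 and 4 processors gain little over 2, matching the observed scaling wall.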
> I racked up a whole list of "Things to Not Do With Threads" when hacking
> the original perl thread model. (The first of which is "wedge them into an
> interpreter that wasn't written with threads in mind..." :) Battle scars
> are viewable on request.
I hear ya. Same here. But Python has since become even worse re: free
threading capability (more globals to arbitrate access to).
Last time, I tried to optimize the memory associated with each list/dict.
That slowed some things down. Atomic incr/decr wasn't really available under
Linux (Win32 has InterlockedIncrement and friends), so the Linux incr/decr was
a bit slower than it should have been.
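Without a hardware atomic increment, each refcount change has to take a lock, which is roughly the fallback the Linux build was stuck with. A hedged sketch of that fallback in Python (the real code was C; this just shows the shape of the cost and why the lock keeps concurrent counts exact):

```python
import threading

class LockedRefcount:
    """Fallback when no atomic increment primitive is available:
    every incref/decref acquires a lock, which is what made the
    Linux path slower than Win32's InterlockedIncrement."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def incref(self):
        with self._lock:
            self.count += 1

    def decref(self):
        with self._lock:
            self.count -= 1
            return self.count == 0  # True when the object can be freed

# Hammer it from several threads; the lock keeps the count exact.
rc = LockedRefcount()
threads = [threading.Thread(target=lambda: [rc.incref() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(rc.count)  # 4000
```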
There are a number of things that I'd do differently the next time around.
Greg Stein, http://www.lyra.org/