[Python-Dev] Parrot -- should life imitate satire?

Tim Peters tim.one@home.com
Wed, 1 Aug 2001 01:23:59 -0400


[Paul Prescod]
> What is the downside of the global lock on the average single
> processor machine? I tend to think that the "default" threading model
> should allow simple and easy, everything-shared multi-threading on
> ordinary machines.  Having a multi-processor-friendly advanced mode is
> a great extension for the wizards.

[Dan Sugalski]
> If you hold the lock during an I/O operation, you'll lose time you
> could have otherwise used.  Getting and releasing a global lock
> frequently also costs performance you might otherwise have used in other
> places.  Mutex releases require memory coherency, which will force your
> CPU to flush any pending writes that might be hanging about, which will
> tend to drop its efficiency, especially on heavily out-of-order
> machines like the Alpha.
>
> Also, that is a zillion and a half mutex acquisitions and releases, most
> of which you probably have no need of.

Python doesn't actually suffer from either of these problems:  while there's
a pair of acquire/release-global-lock macros around potentially blocking I/O
calls in Python's runtime (ditto sleep(), etc), no mutex is actually
allocated before *somebody* calls PyEval_InitThreads:

void
PyEval_InitThreads(void)
{
	if (interpreter_lock)
		return;
	_PyThread_Started = 1;
	interpreter_lock = PyThread_allocate_lock();
	PyThread_acquire_lock(interpreter_lock, 1);
	main_thread = PyThread_get_thread_ident();
}

In most uses of Python, no thread other than the main thread ever gets
created, that routine never gets called, interpreter_lock remains NULL, and
all the global-lock acquire/release code reduces to a cheap test against
NULL.
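
A minimal sketch of that NULL-test pattern, for illustration only.  Every
demo_* name and both DEMO_* macros are hypothetical stand-ins, not Python's
actual definitions, and the "lock" is a plain struct with an operation
counter rather than a real mutex:

```c
#include <stddef.h>

/* Hypothetical stand-in for a platform lock; counts operations only. */
typedef struct { int held; } demo_lock;

static demo_lock *demo_interpreter_lock = NULL; /* NULL until threads start */
static int demo_lock_ops = 0;                   /* acquire + release count */

static void demo_acquire(demo_lock *l) { l->held = 1; demo_lock_ops++; }
static void demo_release(demo_lock *l) { l->held = 0; demo_lock_ops++; }

/* Bracket a potentially blocking call.  With no lock allocated, each
   macro costs one pointer comparison and nothing else. */
#define DEMO_BEGIN_BLOCKING() \
    do { if (demo_interpreter_lock) demo_release(demo_interpreter_lock); } while (0)
#define DEMO_END_BLOCKING() \
    do { if (demo_interpreter_lock) demo_acquire(demo_interpreter_lock); } while (0)

/* Allocate the lock on first use and take it, as the routine above does. */
static void demo_init_threads(void)
{
    static demo_lock the_lock;
    if (demo_interpreter_lock)
        return;
    demo_interpreter_lock = &the_lock;
    demo_acquire(demo_interpreter_lock);
}

/* Simulate one blocking call; returns the running lock-operation count. */
static int demo_blocking_call(void)
{
    DEMO_BEGIN_BLOCKING();
    /* a blocking read(), sleep(), etc. would go here */
    DEMO_END_BLOCKING();
    return demo_lock_ops;
}
```

Until demo_init_threads() runs, demo_blocking_call() performs no lock
operations at all; afterwards each call costs one release and one acquire.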

However, Python calls the platform's thread-safe libraries regardless, and
*that* can be a huge speed hit.  A minor example is that system malloc() is
more expensive in Microsoft's thread-safe version of libc.  A monster
example is speed of line-at-a-time input:  we only recently discovered that
Python's getc()-in-a-loop was killing us on many platforms because the
platform threadsafe library implementation locked and unlocked the stream
for each character.  Worming around that brought our input speed much closer
to Perl's (up to 50x faster on Tru64 Unix).  It's still slower on most
boxes, though, because we're still threadsafe.  Last I looked, Perl's
line-at-a-time input tricks mucked with stdio structs directly, without
benefit of exclusion, and so are not threadsafe.
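
For illustration, one common POSIX way to avoid per-character stream
locking while staying threadsafe is to lock the stream once per line and
use getc_unlocked() inside.  This is a sketch under the assumption that
flockfile(), funlockfile() and getc_unlocked() are available; it is not
necessarily the workaround Python uses, and demo_read_line is a
hypothetical name:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for flockfile() and getc_unlocked() */
#endif
#include <stdio.h>
#include <stddef.h>

/* Read up to n-1 characters of one line into buf, taking the stream lock
   once for the whole line instead of once per character.  Stops after a
   newline or at EOF.  Returns the number of characters stored, not
   counting the terminating NUL. */
static size_t demo_read_line(FILE *fp, char *buf, size_t n)
{
    size_t i = 0;
    int c;

    if (n == 0)
        return 0;
    flockfile(fp);                      /* one lock per line */
    while (i + 1 < n && (c = getc_unlocked(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);                    /* one unlock per line */
    buf[i] = '\0';
    return i;
}
```

Other threads that use the same stream through ordinary stdio calls still
block while the line is being read, so exclusion is preserved; only the
number of lock/unlock pairs changes.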

> ...
> (I do work on SMP machines as a rule, so I am a little biased against
> things that single-thread me when I don't need it--what's the point of
> 500% idle time?)

Greg Stein is the fellow to talk with about "free threading" of
Python.  He had that at least mostly working several years ago, but it was a
major project, that patch is way out of date now, and Python is much more
elegant now.  Oops!  I didn't mean "elegant", I meant "bigger" <wink>.