Fake threads (was [Python-Dev] ActiveState & fork & Perl)

Tim Peters tim_one@email.msn.com
Sun, 4 Jul 1999 04:45:58 -0400


[Guido and Tim, Guido and Tim]

Ouch!  This is getting contentious.  Let's unwind the "you said, I said, you
said" business a bit.

Among the three {coroutines, fake threads, continuations}, I expect any
could be serviceably simulated via either of the others.  There:  just saved
a full page of sentence diagramming <wink>.  All offer a strict superset of
generator semantics.

It follows that, *given* either coroutines or continuations, I indeed see no
semantic hole that would be plugged by fake threads.  But Python doesn't
have any of the three now, and there are two respects in which fake threads
may have an advantage over the other two:

1) Pedagogical, a friendlier sandbox for learning "real threads".

2) Python already has *a* notion of threads.  So fake threads could be seen
as building on that (variation of an existing concept, as opposed to
something unprecedented).

I'm the only one who seems to see merit in #2, so I won't mention it again:
fake threads may be an aid to education, but other than that they're useless
crap, and probably cause stains if not outright disk failure <wink>.


About newbies, I've seen far too many try to learn threads to entertain the
notion that they're easier than I think.  Threads != parallel programming,
though!  Approaches like Gelernter's Linda, or Klappholz's "refined
languages", *are* easy for newbies to pick up, because they provide clear
abstractions that prevent the worst parallelism bugs by offering primitives
that *can't* e.g. deadlock.  threading.py is a step in the right direction
(over the "thread" module) too.  And while I don't know what Alice presents
as a parallelism API, I'd bet 37 cents unseen that the Alice user doesn't
build "parallel activities" out of thread.start_new_thread and raw mutii
<wink>.


About the rest, I think you have a more expansive notion of I/O than I do,
although I can squint and see what you mean; e.g., I *could* view almost all
of what Dragon's products do as I/O, although it's a real stretch for the
thread that just polls the other threads making sure they're still alive
<wink -- part of our text-to-speech system is licensed, and we don't have
the source, and it dies in mysterious ways; so we run it in a thread and
restart it whenever it vanishes -- can't afford to have the whole app die!>.
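
A minimal sketch of that "run it in a thread and restart it whenever it
vanishes" arrangement, in present-day Python.  Nothing here comes from the
actual product: flaky_worker, watchdog, max_restarts and poll_interval are
hypothetical names, and the restart cap exists only so the example ends.

```python
import threading
import time


def flaky_worker(log):
    """Hypothetical stand-in for third-party code that dies unexpectedly."""
    log.append("started")  # returns at once, i.e. the thread "vanishes"


def watchdog(make_worker, max_restarts, poll_interval=0.01):
    """Poll a worker thread and start a fresh one whenever it has died.

    max_restarts is an assumption made for the sketch; a real monitor
    would presumably keep polling for the life of the application.
    """
    restarts = 0
    worker = make_worker()
    worker.start()
    while restarts < max_restarts:
        time.sleep(poll_interval)
        if not worker.is_alive():
            worker = make_worker()  # a Thread object cannot be restarted
            worker.start()
            restarts += 1
    worker.join()
    return restarts


log = []
restart_count = watchdog(
    lambda: threading.Thread(target=flaky_worker, args=(log,)),
    max_restarts=3,
)
```

The polling thread does no I/O of its own, which is why calling it an I/O
thread is a stretch.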


Back to quoting:

>> Throwing explicit threads at this is like writing a recursive
>> Fibonacci number generator in Scheme, but building the recursion
>> yourself by hand out of explicit continuations <wink>.

> Aren't you contradicting yourself?  You say that threads are
> ubiquitous and easy on Windows (and I agree), yet you claim that
> threads are overkill for doing two kinds of I/O or one kind of I/O and
> some computation in parallel...?

They're a general approach (like continuations) but, yes, given an asynch
I/O interface most times I'd much rather use the latter (like I'd rather use
recursion directly when it's available).  BTW, I didn't say threads were
"easy" under Windows:  cheap, reliable & ubiquitous, yes.  They're easier
than under many other OSes thanks to a rich collection of system-supplied
thread gimmicks that actually work, but no way are they "easy".  Like you
did wrt hiding "thread" under "threading", even under Windows real projects
have to create usable app-specific thread-based abstractions (cf. your
on-target remark about Netscape & thread bugs).

> I'm also thinking of Java threads.  Yes, the GC thread is one of those
> computational threads you are talking about, but I think the examples
> I've seen otherwise are mostly about having one GUI component (e.g. an
> applet) independent from other GUI components (e.g. the browser).  To
> me that's overlapping I/O, since I count GUI events as I/O.

Whereas I don't.  So let's just agree to argue about this one with
ever-increasing intensity <wink>.

> ...
> What makes them unreal except for the interpreter lock?  Python
> threads are always OS threads, and that makes them real enough for
> most purposes...

We should move this part to the Thread-SIG; Mark & Greg are doubtless
chomping at the bit to rehash the headaches the global lock causes under
Windows <wink>; I'm not so keen either to brush off the potential benefits
of multiprocessor parallelism, particularly not with the price of CPUs
falling into spare-change range.

> (I'm not sure if there are situations on uniprocessors where the
> interpreter lock screws things up that aren't the fault of the
> extension writer -- typically, problems arise when an extension does
> some blocking I/O but doesn't place Py_{BEGIN,END}_ALLOW_THREADS
> macros around the call.)

Hmm!  What kinds of problems happen then?  Just a lack of hoped-for overlap,
or actual deadlock (the I/O thread needing another thread to proceed for
itself to make progress)?  If the latter, the extension writer's view of
who's at fault may differ from ours <wink -- but I agree with you here>.

>> (e.g., my employer ships heavily threaded Windows apps of various
>> kinds, and overlapped I/O isn't a factor in any of them; it's mostly
>> a matter of algorithm factoring to keep the real-time incestuous
>> subsystems from growing impossibly complex, and in some of the very
>> expensive apps also a need to exploit multiple processors).

> Hm, you admit that they sometimes want to use multiple CPUs, which was
> explicitly excluded from our discussion (since fake threads don't help
> there),

I've been ranting about both fake threads and real threads, and don't recall
excluding anything; I do think I *should* have, though <smile>.

> and I bet that they are also watching some kind of I/O (e.g. whether the
> user says some more stuff).

Sure, and whether the phone rings, and whether text-to-speech is in
progress, and tracking the mouse position, and all sorts of other arguably
I/O-like stuff too.

Some of the subsystems are thread-unaware legacy or 3rd-party code, and need
to run in threads dedicated to them because they believe they own the entire
machine (working via callbacks).  The coupling is too tight to afford IPC
mechanisms, though (i.e., running these in a separate process is not an
option).

Mostly it's algorithm-factoring, though:  text-to-speech and speech-to-text
both require mondo complex processing, and the "I/O part" of each is a small
link at an end of a massive chain.

Example:  you say something, and you expect to see "the result" the instant
you stop speaking.  But recognizing 10 seconds of speech consumes, alas,
about 10 seconds of CPU time.  So we *have* to overlap the speech
collection with the signal processing, the acoustic feature extraction, the
acoustic scoring, the comparison with canned acoustics for many tens of
thousands of words, the language modeling ("sounded most like 'Guido', but
considering the context they probably said 'ghee dough'"), and so on.  You
simply can't write all that as a monolithic algorithm and have a hope of it
working; it's most naturally a pipeline, severely complicated in that what
pops out of the end of the first stage can have a profound effect on what
"should have come out" at the start of the last stage.

Anyway, thread-based pseudo-concurrency is a real help in structuring all
that.  It's *necessary* to overlap speech collection (input) with
computation and result-so-far display (output), but it doesn't stop there.
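
The pipeline structure can be sketched roughly as below, assuming one thread
per stage joined by bounded queues.  This is an illustration in present-day
Python, not the design of any real recognizer: the stage names, the DONE
sentinel and the upper-casing "signal processing" are all hypothetical
stand-ins, and the sketch omits the hard part mentioned above, where late
stages feed back into what earlier stages should have produced.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the input stream


def collect(chunks, out_q):
    """Stage 1: hand over input chunks as they 'arrive'."""
    for chunk in chunks:
        out_q.put(chunk)
    out_q.put(DONE)


def extract(in_q, out_q):
    """Stage 2: process each chunk while stage 1 keeps collecting."""
    while True:
        chunk = in_q.get()
        if chunk is DONE:
            out_q.put(DONE)
            return
        out_q.put(chunk.upper())  # stand-in for real feature extraction


def run_pipeline(chunks):
    raw_q = queue.Queue(maxsize=2)
    feat_q = queue.Queue(maxsize=2)
    stages = [
        threading.Thread(target=collect, args=(chunks, raw_q)),
        threading.Thread(target=extract, args=(raw_q, feat_q)),
    ]
    for stage in stages:
        stage.start()
    results = []
    while True:  # the calling thread plays the "display" stage
        item = feat_q.get()
        if item is DONE:
            break
        results.append(item)
    for stage in stages:
        stage.join()
    return results


recognized = run_pipeline(["ghee", "dough"])
```

Each stage stays a simple loop, and the queues carry the overlap between
collection, computation and display.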

> ...
> Agreed -- I don't understand where green comes from at all.  Does it
> predate Java?

Don't know, but I never heard of it before Java or outside of Solaris.

[about generators & dumb linguists]
> Strange.  Maybe dumb linguists are better at simply copying examples
> without thinking too much about them; personally I had a hard time
> understanding what Icon was doing when I read about it, probably
> because I tried to understand how it was done.  For threads, I have a
> simple mental model.  For coroutines, my head explodes each time.

Yes, I expect the trick for "dumb linguists" is that they don't try to
understand.  They just use it, and it works or it doesn't.  BTW, coroutines
are harder to understand because of (paradoxically!) the symmetry;
generators are slaves, so you don't have to bifurcate your brain to follow
what they're doing <wink>.

>>     print len(text), url
>>
>> may print the len(text) from one thread followed by the url
>> from another).

> Fine -- that's a great excuse to introduce locks in the next section.
> (Most threading tutorials I've seen start by showing flawed examples
> to create an appreciation for the need of locks.)

Even better, they start with an endless sequence of flawed examples that
makes the reader wonder if there's *any* way to get this stuff to work
<wink>.
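
For the record, a minimal sketch of the locked version such a "next section"
might arrive at, written in present-day Python.  The names print_lock,
report and lines are hypothetical, and the "page-N" strings merely stand in
for URLs.

```python
import threading

print_lock = threading.Lock()  # one lock shared by every reporting thread
lines = []                     # record of what was printed, for checking


def report(text, url):
    # While one thread holds the lock, no other thread using the same
    # lock can slip its own output between the length and the url.
    with print_lock:
        line = "%d %s" % (len(text), url)
        lines.append(line)
        print(line)


threads = [
    threading.Thread(target=report, args=("x" * n, "page-%d" % n))
    for n in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The order of the lines still depends on scheduling; the lock only guarantees
that each line stays in one piece.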

>>     for x in backwards(sequence):
>>         print x
>>
>>     def backwards(s):
>>         for i in xrange(len(s)-1, -1, -1):
>>             suspend s[i]

> But backwards() also returns, when it's done.  What happens with the
> return value?

I don't think a newbie would think to ask that:  it would "just work"
<wink>.  Seriously, in Icon people quickly pick up that generators have a
"natural lifetime", and when they return their life is over.  It hangs
together nicely enough that people don't have to think about it.

Anyway, "return" and "suspend" both return a value; the only difference is
that "return" kills the generator (it can't be resumed again after a
return).  The pseudo-Python above assumed that a generator signals the end
of its life by returning None.  Icon uses a different mechanism.
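
A sketch of the same example in present-day Python, which postdates this
message: the "yield" keyword plays the role of the hypothetical "suspend",
and the end of a generator's life is signalled by the StopIteration
exception rather than by returning None as the pseudo-Python assumed.

```python
def backwards(s):
    """Produce the items of s from last to first, one per resumption."""
    for i in range(len(s) - 1, -1, -1):
        yield s[i]  # hand back a value and stay resumable
    return          # ends the generator's life; it cannot be resumed


result = list(backwards("abc"))

gen = backwards([1, 2])
first = next(gen)
second = next(gen)
# Once the generator has returned, next() raises StopIteration, or hands
# back the supplied default instead.
exhausted = next(gen, "done")
still_exhausted = next(gen, "done")
```

The for loop in the quoted example consumes StopIteration silently, so the
"natural lifetime" needs no explicit handling by the caller.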

> ...
> Probably right, although I think that os.path.walk just has a bad API
> (since it gives you a whole directory at a time instead of giving you
> each file).

Well, in Ping's absence I've generally fielded the c.l.py questions about
tokenize.py too, and there's a pattern:  non-GUI people simply seem to find
callbacks confusing!  os.path.walk has some other UI glitches (like "arg" is
the 3rd argument to walk but the 1st arg to the callback, & people don't
know what its purpose is anyway), but I think the callback is the core of it
(& "arg" is an artifact of the callback interface).  I can't help but opine
that part of what people find so confusing about call/cc in Scheme is that
it calls a function taking a callback argument too.

Generators aren't strong enough to replace call/cc, but they're exactly
what's needed to make tokenize's interface match the obvious mental model
("the program is a stream of tokens, and I want to iterate over that"); cf.
Sam's comments too about layers of callbacks vs "normal control flow".
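
A short sketch of that "stream of tokens" model, using the generator
interface that today's tokenize module provides (generate_tokens; it did not
exist in this form when the message was written).  The source string and the
variable names are made up for the example.

```python
import io
import tokenize

source = "x = 1 + 2\n"

# generate_tokens() takes a readline callable and yields one token at a
# time, so the caller's loop is ordinary control flow, not a callback.
pieces = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.OP):
        pieces.append(tok.string)
```

No "arg" parameter is needed to smuggle state into a callback: the loop body
can use whatever local variables it likes.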

>> 3) Python's current threads are good for overlapping I/O.
>>    Sometimes.  And better addressed by Sam's non-threaded "select"
>>    approach when you're dead serious about overlapping lots of I/O.

> This is independent of Python, and is (I think) fairly common
> knowledge -- if you have 10 threads this works fine, but with 100s of
> them the threads themselves become expensive resources.

I think people with a Unix background understand that, but not sure about
Windows natives.  Windows threads really are cheap, which easily slides into
abuse; e.g., the recently-fixed electron-width hole in cleaning up thread
states required extreme rates of thread death to provoke, and has been
reported by multiple Windows users.  An SGI guy was kind enough to confirm
the test case died for him too, but did any non-Windows person ever report
this bug?

> But then you end up with contorted code which is why high-performance
> systems require experts to write them.

Which feeds back into Sam's agenda:  the "advanced" control-flow gimmicks
can be used by an expert to implement a high-performance system that doesn't
require expertise to use.  Fake threads would be good enough for that
purpose too (while real threads aren't), although he's got his heart set on
one of the others.

>> I don't know, Guido -- if all you wanted threads for was to speed up a
>> little I/O in as convoluted a way as possible, you may have been witness
>> to the invention of the wheel but missed that ox carts weren't the last
>> application <wink>.

> What were those applications of threads again you were talking about
> that could be serviced by fake threads that weren't coroutines/generators?

First, let me apologize for the rhetorical excess there -- it went too far.
Forgive me, or endure more of the same <wink>.

Second, the answer is (of course) "none", but that was a rant about real
threads, not fake ones.

so-close-you-can-barely-tell-'em-apart-ly y'rs  - tim