
[Guido]
I guess it's all in the perspective. 99.99% of all thread apps I've ever written use threads primarily to overlap I/O -- if there wasn't I/O to overlap I wouldn't use a thread. I think I share this perspective with most of the thread community (after all, threads originate in the OS world where they were invented as a replacement for I/O completion routines).
[Tim]
Different perspective indeed! Where I've been, you never used something as delicate as a thread to overlap I/O, you instead used the kernel-supported asynch Fortran I/O extensions <0.7 wink>.
Those days are long gone, and I've adjusted to that. Time for you to leave the past too <wink>: by sheer numbers, most of the "thread community" *today* is to be found typing at a Windows box, where cheap & reliable threads are a core part of the programming culture.
No quibble so far...
They have better ways to overlap I/O there too.
Really? What are they? The non-threaded primitives for overlapping I/O look like Medusa to me: very high performance, but a pain to use -- because of the event-driven programming model! (Or worse, callback functions.) But maybe there are programming techniques that I'm not even aware of? (Maybe I should define what I mean by overlapping I/O -- basically every situation where disk or network I/O or GUI event handling goes on in parallel with computation or with each other. For example, in my book copying a set of files while at the same time displaying some silly animation of sheets of paper flying through the air *and* watching a Cancel button counts as overlapping I/O, and if I had to code this it would probably be a lot simpler to do using threads.
Throwing explicit threads at this is like writing a recursive Fibonacci number generator in Scheme, but building the recursion yourself by hand out of explicit continuations <wink>.
Aren't you contradicting yourself? You say that threads are ubiquitous and easy on Windows (and I agree), yet you claim that threads are overkill for doing two kinds of I/O or one kind of I/O and some computation in parallel...? I'm also thinking of Java threads. Yes, the GC thread is one of those computational threads you are talking about, but I think the examples I've seen otherwise are mostly about having one GUI component (e.g. an applet) independent from other GUI components (e.g. the browser). To me that's overlapping I/O, since I count GUI events as I/O.
... As far as I can tell, all the examples you give are easily done using coroutines. Can we call whatever you're asking for coroutines instead of fake threads?
I have multiple agendas, of course. What I personally want for my own work is no more than Icon's generators, formally "semi coroutines", and easily implemented in the interpreter (although not the language) as it exists today.
Coroutines, fake threads and continuations are much stronger than generators, and I expect you can fake any of the first three given either of the others.
Coroutines, fake threads and continuations? Can you really fake continuations given generators?
Generators fall out of any of them too (*you* implemented generators once using Python threads, and I implemented general coroutines -- "fake threads" are good enough for either of those).
Hm. Maybe I'm missing something. Why didn't you simply say "you can fake each of the others given any of these"?
So, yes, for that agenda any means of suspending/resuming control flow can be made to work. I seized on fake threads because Python already has a notion of threads.
A second agenda is that Python could be a lovely language for *learning* thread programming; the threading module helps, but fake threads could likely help more by e.g. detecting deadlocks (and pointing them out) instead of leaving a thread newbie staring at a hung system without a clue.
Yes.
A third agenda is related to Mark & Greg's, making Python's threads "real threads" under Windows. The fake thread agenda doesn't tie into that, except to confuse things even more if you take either agenda seriously <0.5 frown>.
What makes them unreal except for the interpreter lock? Python threads are always OS threads, and that makes them real enough for most purposes... (I'm not sure if there are situations on uniprocessors where the interpreter lock screws things up that aren't the fault of the extension writer -- typically, problems arise when an extension does some blocking I/O but doesn't place Py_{BEGIN,END}_ALLOW_THREADS macros around the call.)
I think that when you mention threads, green or otherwise colored, most people who are at all familiar with the concept will assume they provide I/O overlapping, except perhaps when they grew up in the parallel machine world.
They didn't suggest I/O to me at all, but I grew up in the disqualified world <wink>; doubt they would to a Windows programmer either (e.g., my employer ships heavily threaded Windows apps of various kinds, and overlapped I/O isn't a factor in any of them; it's mostly a matter of algorithm factoring to keep the real-time incestuous subsystems from growing impossibly complex, and in some of the very expensive apps also a need to exploit multiple processors).
Hm, you admit that they sometimes want to use multiple CPUs, which was explcitly excluded from our discussion (since fake threads don't help there), and I bet that they are also watching some kind of I/O (e.g. whether the user says some more stuff).
BTW, I called them "fake" threads to get away from whatever historical baggage comes attached to "green".
Agreed -- I don't understand where green comes from at all. Does it predate Java?
Certainly all examples I give in my never-completed thread tutorial (still available at http://www.python.org/doc/essays/threads.html) use I/O as the primary motivator --
The preceding "99.99% of all thread apps I've ever written use threads primarily to overlap I/O" may explain this <wink>. BTW, there is only one example there, which rather dilutes the strength of the rhetorical "all" ...
OK, ok. I was planning on more along the same lines. I may have borrowed this idea from a Java book I read.
this kind of example appeals to simples souls (e.g. downloading more than one file in parallel, which they probably have already seen in action in their web browser), as opposed to generators or pipelines or coroutines (for which you need to have some programming theory background to appreciate the powerful abstraction possibillities they give).
I don't at all object to using I/O as a motivator, but the latter point is off base. There is *nothing* in Comp Sci harder to master than thread programming! It's the pinnacle of perplexity, the depth of despair, the king of confusion (stop before I exaggerate <wink>).
I dunno, but we're probably both pretty poor predictors for what beginning programmers find hard. Randy Pausch (of www.alice.org) visited us this week; he points out that we experienced programmers are very bad at gauging what newbies find hard, because we've been trained "too much". He makes the point very eloquently. He also points out that in Alice, users have no problem at all with parallel activities (e.g. the bunny's head rotating while it is also hopping around, etc.).
Generators in particular get re-invented often as a much simpler approach to suspending a subroutine's control flow; indeed, Icon's primary audience is still among the humanities, and even dumb linguists <wink> don't seem to have notable problems picking it up. Threads have all the complexities of the other guys, plus races, deadlocks, starvation, load imbalance, non-determinism and non-reproducibility.
Strange. Maybe dumb linguists are better at simply copying examples without thinking too much about them; personally I had a hard time understanding what Icon was doing when I read about it, probably because I tried to understand how it was done. For threads, I have a simple mental model. For coroutines, my head explodes each time.
Threads simply aren't simple-soul material, no matter how pedestrian a motivating *example* may be. I suspect that's why your tutorial remains unfinished: you had no trouble describing the problem to be solved, but got bogged down in mushrooming complications describing how to use threads to solve it.
No, I simply realized that I had to finish the threading module and release the thread-safe version of urllib.py before I could release the tutorial; and then I was distracted and never got back to it.
Even so, the simple example at the end is already flawed ("print" isn't atomic in Python, so the
print len(text), url
may print the len(text) from one thread followed by the url from another).
Fine -- that's a great excuse to introduce locks in the next section. (Most threading tutorials I've seen start by showing flawed examples to create an appreciation for the need of locks.)
It's not hard to find simple-soul examples for generators either (coroutines & continuations *are* hard to motivate!), especially since Python's for/__getitem__ protocol is already a weak form of generator, and xrange *is* a full-blown generator; e.g., a common question on c.l.py is how to iterate over a sequence backwards:
for x in backwards(sequence): print x
def backwards(s): for i in xrange(len(s)-1, -1, -1): suspend s[i]
But backwards() also returns, when it's done. What happens with the return value?
Nobody needs a comp sci background to understand what that *does*, or why it's handy. Try iterating over a tree structure instead & then the *power* becomes apparent; this isn't comp-sci-ish either, unless we adopt a "if they've heard of trees, they're impractical dreamers" stance <wink>. BTW, iterating over a tree is what os.path.walk does, and a frequent source of newbie confusion (they understand directory trees, they don't grasp the callback-based interface; generating (dirname, names) pairs instead would match their mental model at once). *This* is the stuff for simple souls!
Probably right, although I think that os.path.walk just has a bad API (since it gives you a whole directory at a time instead of giving you each file).
Another good use of threads (suggested by Sam) is for GUI programming. An old GUI system, News by David Rosenthal at Sun, used threads programmed in PostScript -- very elegant (and it failed for other reasons -- if only he had used Python instead :-).
On the other hand, having written lots of GUI code using Tkinter, the event-driven version doesn't feel so bad to me. Threads would be nice when doing things like rubberbanding, but I generally agree with Ousterhout's premise that event-based GUI programming is more reliable than thread-based. Every time your Netscape freezes you can bet there's a threading bug somewhere in the code.
I don't use Netscape, but I can assure you the same is true of Internet Explorer -- except there the threading bug is now somewhere in the OS <0.5 wink>.
Anyway,
1) There are lots of goods uses for threads, and especially in the Windows and (maybe) multiprocessor NumPy worlds. Those would really be happier with "free-threaded threads", though.
2) Apart from pedagogical purposes, there probably isn't a use for my "fake threads" that couldn't be done easier & better via a more direct (coroutine, continuation) approach; and if I had fake threads, the first thing I'd do for me is rewrite the generator and coroutine packages to use them. So, yes: you win <wink>.
3) Python's current threads are good for overlapping I/O. Sometimes. And better addressed by Sam's non-threaded "select" approach when you're dead serious about overlapping lots of I/O.
This is independent of Python, and is (I think) fairly common knowledge -- if you have 10 threads this works fine, but with 100s of them the threads themselves become expensive resources. But then you end up with contorted code which is why high-performance systems require experts to write them.
They're also beaten into service under Windows, but not without cries of anguish from Greg and Mark.
Not sure what you mean here.
I don't know, Guido -- if all you wanted threads for was to speed up a little I/O in as convoluted a way as possible, you may have been witness to the invention of the wheel but missed that ox carts weren't the last application <wink>.
What were those applications of threads again you were talking about that could be serviced by fake threads that weren't coroutines/generators? --Guido van Rossum (home page: http://www.python.org/~guido/)