threads or queue for this task

Tue Sep 17 11:18:42 EDT 2002

Alex Martelli wrote:

>Taking it for granted that one wouldn't pontificate without vast
>relevant experience,
>

A rather rash assumption for internet communications.  ;o)

But I appreciate your giving me benefit of the doubt.

>Perhaps the hint is in Mr Pettersen's mention of
>"processing too many 16-GB files" vs your insistence on "capacity
>to handle the expected traffic", "time-critical requests", "the
>work requests may be created on a different machine", etc.
>
>I.e., you're focusing (perhaps because of how life shaped your
>experience of threading) on a "server" process or machine that
>is just responding to traffic originating elsewhere, while I'm
>giving at least equal weight to a process that's "just" doing a
>lot of batch-like work, e.g. generated by splitting up the task
>of processing a "16-GB file" or the like, and using threading
>to ensure the best throughput (implies that substantial parts of
>processing each work request happen in C code that release the
>GIL, be it within the Python core or in extensions).  
>

It does sound like a big part of our disagreement is past experience 
with different types of applications. 

The bulk of my past experience almost always involved multiple processes 
and often multiple processors and usually highly complex, dedicated 
applications.  Generally, lost data was never an option, though, as I 
enumerated in my previous note, a few of those systems did include 
bounded queues.

More specifically to Python, IIRC, whenever I've used threads, the 
application always involved input via sockets, ultimately from client 
apps, over which the server app had no direct control.  In those cases, 
I've always used unbounded queues to connect threads and they worked 
just fine.

I never wrote a stand-alone Python app like you describe.  I agree that 
if you're writing a self-contained application then it makes sense to use 
bounded queues to help throttle the generator thread.
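
For the self-contained case, a minimal sketch of that throttling might 
look like the following.  It is an illustration only: the names 
generate, process, and run_batch are hypothetical, the sleep stands in 
for real work, and the module is spelled queue in current Python (it 
was Queue when this thread was written).

```python
import queue      # named "Queue" in the Python of 2002
import threading
import time

def generate(q, count):
    # The generator thread: put() blocks whenever the queue already
    # holds maxsize items, so it cannot run far ahead of the worker.
    for n in range(count):
        q.put(n)
    q.put(None)               # hypothetical end-of-work sentinel

def process(q, seen_sizes, done):
    while True:
        item = q.get()
        if item is None:
            break
        seen_sizes.append(q.qsize())   # record the backlog, for illustration
        time.sleep(0.001)              # stand-in for slow processing
        done.append(item)

def run_batch(count=50, maxsize=4):
    q = queue.Queue(maxsize)
    seen_sizes, done = [], []
    worker = threading.Thread(target=process, args=(q, seen_sizes, done))
    worker.start()
    generate(q, count)
    worker.join()
    return done, max(seen_sizes)
```

However fast the generator is, the backlog reported by run_batch stays 
at or below maxsize.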

>If this is the root of our disagreement, I suggest you reconsider
>whether "the general case" need ONLY include "networked"
>situations, which seem to be what you instinctively think of
>in this context, 
>

In a cursory review of the record, I don't see where I made that 
particular assertion.

>or "threading to slice up a big batch of work" too.  
>

Seems at every opportunity I agree that your scenarios are valid and 
never once said or implied they were not.  I think part of my problem 
here is I don't have the experience to pick words that exactly match 
your expectations.  

I think the number of times the following sentence has been referred to 
in our discussion may point to the crux of our problem.

>>>>>>Generally, Queues never get "full" and you generally don't care if they're empty.
>>>>>>            
>>>>>>

I intended this statement to be a gross simplification in the interest 
of directing Robin away from what I regarded to be unnecessary 
complexities.  

Recall that it was an answer to the specific question of  "I am put off 
by [...] empty() [...] how can you code if you don't know if the queue 
is empty or full?"  It was intended as a way for the beginner to begin 
to use queues and not worry about unnecessary details.  

Out of context the statement is a much easier nit to pick.  In context, 
it's as much a part of an elaboration on  "multithreading semantics" as 
anything else.  The last sentence is the most important part.

    Generally, Queues never get "full" and you generally don't care if
    they're empty.  The important thing about Queues is that the reader
    thread will automatically be suspended when reading from an empty
    queue and furthermore it will automatically wake up when the queue
    becomes non empty.  All you need from Queues is to create them and
    to get and put data.

I submit that the oversimplification applies, in a way, to your bounded 
queue scenario -- the programmer generally shouldn't "CARE" if the 
queues are empty or full -- the important thing is that they block and 
unblock when appropriate.
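
As a small sketch of that point (hypothetical names consumer and run, 
and a None sentinel that is my own assumption, not part of the Queue 
API), the same code works for bounded and unbounded queues without 
ever asking whether the queue is empty or full:

```python
import queue      # named "Queue" in the Python of 2002
import threading

def consumer(q, results):
    # get() suspends this thread while the queue is empty and wakes it
    # up again as soon as an item arrives.
    while True:
        item = q.get()
        if item is None:      # hypothetical end-of-work sentinel
            break
        results.append(item * 2)

def run(maxsize=0):
    # maxsize=0 means unbounded; with maxsize > 0, put() blocks while
    # the queue is full and resumes when the consumer makes room.
    q = queue.Queue(maxsize)
    results = []
    t = threading.Thread(target=consumer, args=(q, results))
    t.start()
    for n in range(5):
        q.put(n)              # no call to empty() or full() anywhere
    q.put(None)
    t.join()
    return results
```

Calling run() and run(maxsize=1) should produce the same results; only 
the amount of waiting differs.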

I hardly intended this statement to stand as an accurate 
characterization of all queuing systems, which appears to be a big part 
of your complaint.  

In any case, I hereby withdraw the entire statement altogether and 
invite you to propose instead whatever alternative summary you think 
best serves the OP and the Python community generally.  In this I 
invite you to have the last words on the subject.

>...but setting a Queue to bounded, and implicitly sleeping when
>trying to add one more item to a full Queue, is NOT significantly
>more complicated.  It's strongly parallel to sleeping just as
>implicitly when trying to peel an item from an empty Queue.  The
>symmetry, when applicable, may be considered more elegant than
>an *asymmetric* solution, after all.
>

Looking at queues in isolation, I agree an upper bound is not a deep 
concept.

Whether bounded or unbounded is more "elegant" is a value judgment where 
reasonable people may disagree.  E.g., your discourse about Rosetti's 
Ferrara seemed to imply that asymmetry was more elegant than symmetry.

In your scenario of a self-contained process, a bounded queue is a 
simpler solution.  

In my scenario, where incoming traffic may have to be discarded or dealt 
with in some out-of-band fashion, a bounded queue is more complicated 
than an unbounded one.
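
To illustrate the extra work, here is a hedged sketch of what a reader 
thread has to do once the queue is bounded and it cannot afford to 
block.  The function enqueue_request and the dropped list are 
hypothetical; a real server would need some application-specific way 
to refuse, retry, or log the request.

```python
import queue      # named "Queue" in the Python of 2002

def enqueue_request(q, request, dropped):
    # With an unbounded queue this whole function is just q.put(request).
    # With a bounded one, a thread that must not block needs a policy
    # for the "full" case as well.
    try:
        q.put_nowait(request)
        return True
    except queue.Full:
        dropped.append(request)   # hypothetical out-of-band handling
        return False
```

For example, with queue.Queue(2) the third and fourth requests in a 
burst would end up in dropped rather than in the queue.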

>But at least we can agree to
>disagree civilly, whence my apologies for appearing unwilling to
>do so in my earlier post.
>

This is progress, I suppose.

With my latest concessions and stipulations I hope we're no longer all 
that far apart.

>>[...] Extreme Programming [..]
>>    
>>
>But from my POV this generally goes for *threading in general*!  
>

True, and I believe I was the only person to mention previously that 
multithreading may be overkill.

However, the original request was specifically about threads, and without 
more detail, "don't use threads" seemed an inappropriate response.

>I always consider a first-cut solution WITHOUT multi-processing, which
>is a huge complication in itself, more often than not.  
>

While generally I would agree, I find that many TCP/IP applications are 
somewhat intrinsically concurrent.  Within that, threads are much 
simpler than, say, using select().  For starters, it's sometimes easier 
to have separate reader and writer threads than to rely on incoming and 
outgoing data being perfectly interleaved.

A common paradigm is a listener thread that accepts connections on a 
socket and launches a new reader thread to handle each connection.  Then 
maybe the reader threads queue some incoming requests to server threads 
so that the socket never fills or blocks.  In one application, I had two 
listeners listening on two separate sockets, each spawning per-client 
reader threads, each of which funneled some subset of requests to a 
common processing thread guarding a critical resource.  On the client 
side, sometimes it's easier to have a separate thread monitoring return 
traffic.

In another app, I have a listener thread which spawns multiple 
connection threads, one for each external client.  Connection threads 
process commands, some of which are queued to a "command processor" 
thread and others of which are queued to a serial port writer thread.  A 
serial port reader thread handles the back traffic, some of which gets 
forwarded via selected connection threads back to interested client 
processes.  Although theoretically possible, I shudder to think of how 
difficult that would be to implement using select() instead of threads.  
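
A stripped-down sketch of the listener / reader / processing-thread 
pattern follows.  It is not code from either application: the function 
names, the line-oriented protocol, and the dict standing in for the 
critical resource are all assumptions made for illustration, and it 
uses the current queue module name rather than Queue.

```python
import queue
import socket
import threading

def processor(requests):
    # The only thread that touches the shared resource (here, a dict).
    counts = {}
    while True:
        line, conn = requests.get()    # sleeps while there is no work
        counts[line] = counts.get(line, 0) + 1
        try:
            conn.sendall(("%s seen %d\n" % (line, counts[line])).encode())
        except OSError:
            pass                       # client already went away

def reader(conn, requests):
    # One of these per client: read lines and hand them to the processor.
    with conn, conn.makefile("r") as lines:
        for line in lines:
            requests.put((line.strip(), conn))

def listener(server, requests):
    # Accept connections forever, launching one reader thread per client.
    while True:
        conn, _addr = server.accept()
        threading.Thread(target=reader, args=(conn, requests),
                         daemon=True).start()

def start_server(host="127.0.0.1", port=0):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))          # port 0: let the OS pick one
    server.listen()
    requests = queue.Queue()           # unbounded, as described above
    threading.Thread(target=processor, args=(requests,),
                     daemon=True).start()
    threading.Thread(target=listener, args=(server, requests),
                     daemon=True).start()
    return server.getsockname()
```

A client can connect to the address returned by start_server(), send a 
line, and read one reply line back.  Each thread is a plain blocking 
loop, which is the simplicity being argued for here.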

In all of these cases, I have encountered no need for limiting the size 
of the queues used.  But a key ingredient is that there is no 'generator' 
thread to overflow the queues.  Then too, it has always been a given 
that overall service utilization was very low.

>I do disagree, and this may be a philosophical point in part.  
>

That would explain the amount of energy being expended by both parties.

The argument is so intense because the stakes are so low.  ;o)

>But the ability to make SOME Queue's bounded,
>and still handle them without fuss, IS important in a vast
>enough number of cases that ruling this possibility out is
>a serious didactic mistake, IMHO.
>

I stipulated this much from the beginning and never ruled it out.

>In particular, when preparing work-requests is part of the
>design's task (e.g. by analyzing pieces of a huge existing
>file), a bounded Queue can simply and without problems do
>some appropriate balancing within the system between the
>amount of resources devoted to preparing work requests,
>and the amount devoted to processing them.  It's as simple
>as that!
>

You've made this point more than once in this exchange.  I've never 
disputed it.

>Perhaps, but this is a straight quote from Albert Einstein, of
>course, therefore you should take the matter up with him...
>

The important thing is we both agree it should be as simple as possible.  

We simply don't agree what that means, exactly.

Regards

--jb

-- 
James J. Besemer		503-280-0838 voice
2727 NE Skidmore St.		503-280-0375 fax
Portland, Oregon 97211-6557	mailto:jb at cascade-sys.com
				http://cascade-sys.com