threads or queue for this task
James J. Besemer
jb at cascade-sys.com
Tue Sep 17 11:18:42 EDT 2002
Alex Martelli wrote:
>Taking it for granted that one wouldn't pontificate without vast
>relevant experience,
>
A rather rash assumption for internet communications. ;o)
But I appreciate your giving me benefit of the doubt.
>Perhaps the hint is in Mr Pettersen's mention of
>"processing too many 16-GB files" vs your insistence on "capacity
>to handle the expected traffic", "time-critical requests", "the
>work requests may be created on a different machine", etc.
>
>I.e., you're focusing (perhaps because of how life shaped your
>experience of threading) on a "server" process or machine that
>is just responding to traffic originating elsewhere, while I'm
>giving at least equal weight to a process that's "just" doing a
>lot of batch-like work, e.g. generated by splitting up the task
>of processing a "16-GB file" or the like, and using threading
>to ensure the best throughput (implies that substantial parts of
>processing each work request happen in C code that release the
>GIL, be it within the Python core or in extensions).
>
It does sound like a big part of our disagreement is past experience
with different types of applications.
The bulk of my past experience almost always involved multiple processes
and often multiple processors and usually highly complex, dedicated
applications. Generally, lost data was never an option, though as I
enumerated some of the systems in my previous note, a few did include
bounded queues.
More specifically to Python, IIRC, whenever I've used threads, the
application always involved input via sockets, ultimately from client
apps, over which the server app had no direct control. In those cases,
I've always used unbounded queues to connect threads and they worked
just fine.
I never wrote a stand-alone Python app like you describe. I agree that
if you're writing a self-contained application, it makes sense to use
bounded queues to help throttle the generator thread.
>If this is the root of our disagreement, I suggest you reconsider
>whether "the general case" need ONLY include "networked"
>situations, which seem to be what you instinctively think of
>in this context,
>
In a cursory review of the record, I don't see where I made that
particular assertion.
>or "threading to slice up a big batch of work" too.
>
Seems at every opportunity I agree that your scenarios are valid and
never once said or implied they were not. I think part of my problem
here is I don't have the experience to pick words that exactly match
your expectations.
I think the number of times the following sentence has been referred to
in our discussion may point to the crux of our problem.
>>>>>>Generally, Queues never get "full" and you generally don't care if they're empty.
I intended this statement to be a gross simplification in the interest
of directing Robin away from what I regarded to be unnecessary
complexities.
Recall that it was an answer to the specific question of "I am put off
by [...] empty() [...] how can you code if you don't know if the queue
is empty or full?" It was intended as a way for the beginner to begin
to use queues and not worry about unnecessary details.
Out of context the statement is a much easier nit to pick. In context,
it's as much a part of an elaboration on "multithreading semantics" as
anything else. The last sentence is the most important part.
Generally, Queues never get "full" and you generally don't care if
they're empty. The important thing about Queues is that the reader
thread will automatically be suspended when reading from an empty
queue and furthermore it will automatically wake up when the queue
becomes non-empty. All you need from Queues is to create them and
to get and put data.
I submit that the oversimplification applies, in a way, to your bounded
queue scenario -- the programmer generally shouldn't "CARE" if the
queues are empty or full -- the important thing is that they block and
unblock when appropriate.
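A minimal sketch of that automatic suspend-and-wake behavior (the sleep exists only to make the ordering obvious, and the module is spelled queue in modern Python):

```python
import threading
import time
import queue  # "Queue" in the Python of this thread

q = queue.Queue()  # unbounded, as in the advice above
seen = []

def reader():
    # get() suspends this thread while q is empty...
    seen.append(q.get())

t = threading.Thread(target=reader)
t.start()
time.sleep(0.1)   # reader is now blocked on the empty queue
q.put("hello")    # ...and this put() wakes it up
t.join()
```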
I hardly intended this statement to stand as an accurate
characterization of all queuing systems, which appears to be a big part
of your complaint.
In any case, I hereby withdraw the entire statement altogether and
invite you to propose instead whatever alternative summary you think
best serves the OP and the Python community generally. In this I
invite you to have the last words on the subject.
>...but setting a Queue to bounded, and implicitly sleeping when
>trying to add one more item to a full Queue, is NOT significantly
>more complicated. It's strongly parallel to sleeping just as
>implicitly when trying to peel an item from an empty Queue. The
>symmetry, when applicable, may be considered more elegant than
>an *asymmetric* solution, after all.
>
Looking at queues in isolation, I agree an upper bound is not a deep
concept.
Whether bounded or unbounded is more "elegant" is a value judgment where
reasonable people may disagree. E.g., your discourse about Rosetti's
Ferrara seemed to imply that asymmetry was more elegant than symmetry.
In your scenario of a self-contained process, a bounded queue is the
simpler solution.
In my scenario, where incoming traffic may have to be discarded or dealt
with in some out-of-band fashion, a bounded queue is more complicated
than an unbounded one.
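To make that concrete, here is a sketch of the discard policy (the maxsize, the messages, and the drop counter are all illustrative; a real server would do something smarter out of band than just counting):

```python
import queue

q = queue.Queue(maxsize=2)
dropped = 0

# The network-facing thread must never block, so use put_nowait()
# and handle overflow out of band -- here, by just counting drops.
for msg in ["a", "b", "c", "d"]:
    try:
        q.put_nowait(msg)
    except queue.Full:
        dropped += 1
```

The extra try/except and the drop-handling policy are precisely the complication a bounded queue adds in this scenario.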
>But at least we can agree to
>disagree civilly, whence my apologies for appearing unwilling to
>do so in my earlier post.
>
This is progress, I suppose.
With my latest concessions and stipulations I hope we're no longer all
that far apart.
>>[...] Extreme Programming [..]
>But from my POV this generally goes for *threading in general*!
>
True, and I believe I was the only person to mention previously that
multithreading may be overkill.
However, the original request was specifically about threads and without
more detail, "don't use threads" seemed an inappropriate response.
>I always consider a first-cut solution WITHOUT multi-processing, which
>is a huge complication in itself, more often than not.
>
While generally I would agree, I find that many TCP/IP applications are
somewhat intrinsically concurrent. Within that, threads are much
simpler than, say, using select(). For starters, it's sometimes easier
to have separate reader and writer threads than to rely on incoming and
outgoing data being perfectly interleaved. A common paradigm is a
listener thread that accepts connections on a socket and launches a new
reader thread to handle each connection. Then maybe the reader threads queue
some incoming requests to server threads so that the socket never fills
or blocks. In one application, I had two listeners listening on two
separate sockets, each spawning per-client reader threads, each of which
funneled some subset of requests to a common processing thread guarding
a critical resource. On the client side, sometimes it's easier having a
separate thread monitoring return traffic. In another app, I have a
listener thread which spawns multiple connection threads, one for each
external client. Connection threads process commands, some of which are
queued to a "command processor" thread and others of which are queued to a
serial port writer thread. A serial port reader thread handles the back
traffic, some of which gets forwarded via selected connection threads
back to interested client processes. Although theoretically possible, I
shudder to think of how difficult that would be to implement using
select() instead of threads.
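The listener/reader/worker shape described above can be sketched end to end. This is a runnable toy, not code from any of those applications: the names, the client count, and the upper-casing "processing" are all made up, and shutdown is handled with a simple sentinel:

```python
import socket
import threading
import queue

requests = queue.Queue()   # unbounded, as in the applications above
handled = []
reader_threads = []

def worker():
    # Single processing thread guarding the "critical resource"
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut down
            break
        handled.append(item.upper())  # stand-in for real processing

def reader(conn):
    # One reader per connection; hand work off so the socket never blocks
    with conn:
        line = conn.makefile().readline().strip()
        requests.put(line)

def listener(sock, nclients):
    # Accept connections and spawn a reader thread for each
    for _ in range(nclients):
        conn, _addr = sock.accept()
        t = threading.Thread(target=reader, args=(conn,))
        t.start()
        reader_threads.append(t)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))    # ephemeral port for the demo
srv.listen(5)
port = srv.getsockname()[1]

w = threading.Thread(target=worker)
lt = threading.Thread(target=listener, args=(srv, 2))
w.start(); lt.start()

# Two toy clients, standing in for external client processes
for msg in ("ping", "pong"):
    c = socket.create_connection(("127.0.0.1", port))
    c.sendall((msg + "\n").encode())
    c.close()

lt.join()
for t in reader_threads:
    t.join()
requests.put(None)     # stop the worker
w.join()
srv.close()
```

Even in this toy, notice how little of the code is about concurrency: the queue and the per-connection threads carry all of it.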
In all of these cases, I have encountered no need for limiting the size
of the queues used. But a key ingredient is that there is no 'generator'
thread to overflow the queues. Then too, it's also always been a given
that overall service utilization was very low.
>I do disagree, and this may be a philosophical point in part.
>
That would explain the amount of energy being expended by both parties.
The argument is so intense because the stakes are so low. ;o)
>But the ability to make SOME Queue's bounded,
>and still handle them without fuss, IS important in a vast
>enough number of cases that ruling this possibility out is
>a serious didactic mistake, IMHO.
>
I stipulated this much from the beginning and never ruled it out.
>In particular, when preparing work-requests is part of the
>design's task (e.g. by analyzing pieces of a huge existing
>file), a bounded Queue can simply and without problems do
>some appropriate balancing within the system between the
>amount of resources devoted to preparing work requests,
>and the amount devoted to processing them. It's as simple
>as that!
>
You've made this point more than once in this exchange. I've never
disputed it.
>Perhaps, but this is a straight quote from Albert Einstein, of
>course, therefore you should take the matter up with him...
>
The important thing is we both agree it should be as simple as possible.
We simply don't agree on what that means, exactly.
Regards
--jb
--
James J. Besemer 503-280-0838 voice
2727 NE Skidmore St. 503-280-0375 fax
Portland, Oregon 97211-6557 mailto:jb at cascade-sys.com
http://cascade-sys.com