threads or queue for this task

Tue Sep 17 08:27:46 EDT 2002

James J. Besemer wrote:

> I also sense that I've somehow done something to piss you off.  I'm
> sorry if I did but I can't think what it would have been.

I apologize in turn for giving that unintended impression.  My
reaction to your previous post can be well summarized with the words 
of Bjorn Pettersen's post (sent 3 minutes before mine, but I had
not seen it as I composed mine): "You must be kidding (or you can't 
have written any significant amount of multithreaded code)".

Taking it for granted that one wouldn't pontificate without vast
relevant experience, I further surmised the "You must be kidding"
part had to be correct -- e.g., that having posted an over-general
assertion (something that happens to all) you were now bloody-
mindedly defending the indefensible.  Now it appears that you're
substantially convinced of that assertion -- even though it appears
so patently false as to be indefensible to both me and Mr Pettersen --
and am left to wonder about how our experiences can be so vastly
different.  Perhaps the hint is in Mr Pettersen's mention of
"processing too many 16-GB files" vs your insistence on "capacity
to handle the expected traffic", "time-critical requests", "the
work requests may be created on a different machine", etc.

I.e., you're focusing (perhaps because of how life shaped your
experience of threading) on a "server" process or machine that
is just responding to traffic originating elsewhere, while I'm
giving at least equal weight to a process that's "just" doing a
lot of batch-like work, e.g. generated by splitting up the task
of processing a "16-GB file" or the like, and using threading
to ensure the best throughput (implies that substantial parts of
processing each work request happen in C code that release the
GIL, be it within the Python core or in extensions).  Thus, work
requests can typically be generated quite fast (just a matter
e.g. of sucking in some more of the "16-GB file" and finding
the right boundaries therein for each work-unit to request), it
IS crucial to consider the ratios of WU-preparation to
WU-processing times, the total number of requests (say 1G
requests if the average work-request deals with about 16
bytes of the 16-GB file...:-), and other such issues that you
dismiss from "the general case".

In such cases, the number of work requests is unbounded when
generating a work-request is a matter of "rolling dice" a
few times.  For example, generating a work request might mean
dealing and selecting a "suitable" bridge deal, and processing
the request might mean simulating double-dummy play over such
a deal and posting the result.  The number of possibilities
is NOT truly unlimited in such cases (there are, after all,
just 5.3E28 possible bridge deals:-) but it's generally simplest
to consider it unlimited, as it's large enough;-).

If this is the root of our disagreement, I suggest you reconsider
whether "the general case" need ONLY include "networked"
situations, which seem to be what you istinctively think of
in this context, or "threading to slice up a big batch of
work" too.  I think it definitely must cover both scenarios.
In this case, we DO want to include bounded queues as a part
of the "general case" -- and I think we should.  "Optional
embellisment": if e.g. the total number of requests won't
cause the machine to trash, bounding the queues doesn't matter.
But it's such a simple thing to do, that it can and should
be considered in most such cases.

>>>>>Generally, Queues never get "full" and you generally don't care if
        ...
> I took your point to be that a bounded queue also is sometimes useful,
> which I do not and did not dispute.

Right: therefore, your here-re-quoted assertion is false.  Queues
DO get full when you set them to be bounded, and this need not
cause substantial complication.

> It goes without saying that if the designer knows in advance that he
> doesn't have the throughput to handle incoming requests, then a

...if requests are "incoming" independently from what the
designer can do, which is not the case when the process is
dealing with a big batch of predefined work, e.g. processing
a 16-GB file...

> significantly more complicated solution is called for.  I alluded to

...but setting a Queue to bounded, and implicitly sleeping when
trying to add one more item to a full Queue, is NOT significantly
more complicated.  It's strongly parallel to sleeping just as
implicitly when trying to peel an item from an empty Queue.  The
symmetry, when applicable, may be considered more elegant than
an *asymmetric* solution, after all.

> So at a high level I am at a loss, as you seem to be picking a fight I
> did not intend to participate in.  Like I said, I thought we were
> largely in agreement, while expanding upon different scenarios.

Again I apologize for giving the impression of "picking a fight".
We can't be "largely in agreement" if you insist that your assertion:
        Generally, Queues never get "full"
was correct and helpful to the OP, while I keep believing it was
a dangerous over-simplification.  But at least we can agree to
disagree civilly, whence my apologies for appearing unwilling to
do so in my earlier post.

> In this, I am "coming from" the standpoint of Extreme Programming -- you
> may recall -- that dictates you start by writing the simplest solution
> you don't know won't work.  Performance optimization and other
> "embellishments" are always deferred unless/until you can't avoid it.

But from my POV this generally goes for *threading in general*!  I
always consider a first-cut solution WITHOUT multi-processing, which
is a huge complication in itself, more often than not.  Therefore
when I consider how to refactor an architecture for processing a
batch of work into a threaded one I'm *ALREADY* dealing with
performance optimization -- if my performance was good enough
without threading, I wouldn't be threading, I'd be using some
simpler approach (generally event-driven, if I must appear to
be doing several things at once, e.g with GUIs, networks, etc).

> That simplest, initial solution IMHO, is an unbounded queue.  I further
> characterize it as "elegant" as it's fully functional for many
> applications and it's as simple as it gets.    To me, real world
> complications, however realistic, tend to result in less "elegant"
> solutions but perhaps you disagree.

I do disagree, and this may be a philosophical point in part.  E.g.,
some Renaissance architects dreamed of "ideal cities" and went so
far as to design and even build some.  MY idea of the most elegant
city-plan produced during the Renaissance is Ferrara's, where the
architect (Rossetti) brilliantly *integrated* real-world complications
(there's a river here, a marsh there, a hill yonder, a major
thoroughfare from X to Y, ...) rather than brushing them aside or
trying to ignore them.  Indeed, Ferrara is the only "designed" (as
opposed to "organically grown") city I know where I wouldn't mind
living.  Of course, not everybody can have Rossetti's genius, but
to my mind THAT is the goal to aim for.  And in fact, in their
everyday professional practice, even Alberti and even more
strikingly Palladio *integrated* "real-world complications" to
produce their most elegant work, whatever they were saying in
their theoretical treatises (I'll never forgive them for placing
"venustas" BEFORE "utilitas", thus betraying Vitruvius' real
heritage and helping start the split between architecture and
engineering that broke the single concept of "techne" into
artificial contrasts between "art" and "technique"... but when
I consider what they actually BUILT rather than what they WROTE,
I see great engineers at work, not "would-be-pure artists"
frustrated in an abstract quest for elegance by the down-and-
dirty "real-world complications"...!).

But that's by the by.  In practice, the amount of "complication"
in setting a Python Queue instance to be bounded is trifling,
anyway.

> Yes.  If you make them too small they will overflow.

Nope!  Rather, the thread that is trying to put one more
item on an already-full Queue will implicitly and simply
sleep until a slot becomes free, exactly the same way as
one trying to pull one more ifem off an already-empty
Queue will implicitly and simply sleep until an item is
placed there.  No intrinsic complications here.

> Bounded queues are A solution to some problems but not The solution for
> all situations.

Of course not!  Never did I assert nor imply that you should
make ALL of your Queue instances bounded -- that would be
silly indeed.  But the ability to make SOME Queue's bounded,
and still handle them without fuss, IS important in a vast
enough number of cases that ruling this possibility out is
a serious didactic mistake, IMHO.

> And as I said, they push part of the problem to some other place in the
> design, which may or may not be a good thing.

In particular, when preparing work-requests is part of the
design's task (e.g. by analyzing pieces of a huge existing
file), a bounded Queue can simply and without problems do
some appropriate balancing within the system between the
amount of resources devoted to preparing work requests,
and the amount devoted to processing them.  It's as simple
as that!

>>bad way to teach: everything must be kept as simple as possible,
>>*but no simpler than that*.
> 
> Is it my turn to retort with a Monsieur de La Palice allusion?

Perhaps, but this is a straight quote from Albert Einstein, of
course, therefore you should take the matter up with him...

Alex