Multiprocessing.Queue - I want to end.
Dave Angel
davea at ieee.org
Sat May 2 16:46:16 EDT 2009
Hendrik van Rooyen wrote:
> : "Roel Schroeven" <rschroev_nospam_ml at fastmail.fm> wrote:
>
>> Hendrik van Rooyen schreef:
>>
>>> I have always wondered why people do the one queue many getters thing.
>>>
>> Because IMO it's the simplest and most elegant solution.
>>
>
> That is fair enough...
>
>
>>> Given that the stuff you pass is homogeneous, in that it will require a
>>> similar amount of effort to process, is there not a case to be made
>>> to have as many queues as consumers, and to round-robin the work?
>>>
>> Could work if the processing time for each work unit is exactly the same
>> (otherwise one or more consumers will be idle part of the time), but in
>> most cases that is not guaranteed. A simple example is fetching data
>> over the network: even if the data size is always the same, there will
>> be differences because of network load variations.
>>
>> If you use one queue, each consumer fetches a new work unit as soon as it
>> has consumed the previous one. All consumers will be working as long as
>> there is work to do, without having to write any code to do the load
>> balancing.
>>
>> With one queue for each consumer, you either have to assume that the
>> average processing time is the same (otherwise some consumers will be
>> idle at the end, while others are still busy processing work units), or
>> you need some clever code in the producer(s) or the driving code to
>> balance the loads. That's extra complexity for little or no benefit.
>>
>> I like the simplicity of having one queue: the producer(s) put work
>> units on the queue with no concern for which consumer will process them
>> or how many consumers there even are; likewise the consumer(s) don't know
>> and don't need to know where their work units come from. And the work
>> gets automatically distributed to whichever consumer has first finished
>> its previous work unit.
>>
>
> This is all true in the case of a job that starts, runs and finishes.
> I am not so sure it applies to something that has a long life.
>
>
>>> And if the stuff you pass around needs disparate effort to consume,
>>> it seems to me that you can more easily balance the load by having
>>> specialised consumers, instead of instances of one humungous "I can
>>> eat anything" consumer.
>>>
>> If there is a semantic difference, maybe yes; but I think it makes no
>> sense to differentiate purely on the expected execution times.
>>
>
> The idea is basically that you have the classifying code in one
> place only, instead of having it run in every instance of the consumer.
> Feels better to me, somehow.
> <snip>
If the classifying you're doing is just based on expected time to
consume the item, then I think your plan to use separate queues is
misguided.
If the consumers are interchangeable in their abilities, then feeding
them from a single queue is more efficient in average wait time,
worst-case wait time, and consumer utilization, in nearly all
non-pathological scenarios. Think of the line at the bank. There's a good
reason they now have a single line for multiple tellers. If you have
five tellers, and one of your transactions is really slow, the rest of
the line is only slowed down by 20%, rather than a few people being
slowed down by a substantial amount because they happen to be behind the
slowpoke. Thirty years ago, they had individual lines, and I tried in
vain to explain queuing theory to the bank manager.
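
To make that concrete, here's a minimal sketch of the single-queue
pattern with multiprocessing. The worker function, the sentinel value,
and the timings are all illustrative, not anything from the original
posts:

import multiprocessing
import time

SENTINEL = None   # illustrative end-of-work marker

def consumer(queue, name):
    # Each consumer pulls the next work unit as soon as it's free, so
    # a slow item only ties up the one consumer that happened to get it.
    while True:
        item = queue.get()
        if item is SENTINEL:
            break
        time.sleep(item)   # stand-in for real work of varying duration
        print("%s finished an item that took %.1fs" % (name, item))

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=consumer,
                                       args=(queue, "teller-%d" % i))
               for i in range(5)]
    for w in workers:
        w.start()
    # One slow transaction among many fast ones; the other four
    # tellers keep draining the line while one is busy.
    for duration in (0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 0.1, 0.1):
        queue.put(duration)
    for w in workers:
        queue.put(SENTINEL)   # one sentinel per consumer
    for w in workers:
        w.join()

Putting one sentinel on the queue per consumer is also a common way to
make such a setup end cleanly, which is what the subject of this thread
is really about.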
Having said that, notice that sometimes the consumers in a computer are
not independent. If you're running 20 threads with this model, on a
processor with only two cores, and if the tasks are CPU bound, you're
wasting lots of thread-management time without gaining anything.
Similarly, if there are other shared resources that all the threads trip
over, it may not pay to run as many of them in parallel. But for
homogeneous consumers, feed them from a single queue, and run benchmarks
to determine the optimum number of consumers to start.
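
As a rough illustration of that benchmarking step (cpu_count is a real
multiprocessing call; the work function and the candidate pool sizes
are just assumptions for the sketch):

import multiprocessing
import time

def cpu_bound(n):
    # Stand-in for a CPU-bound work unit.
    return sum(i * i for i in range(n))

def timed_run(num_consumers, jobs):
    # Run the same batch of jobs through a pool of the given size
    # and report the wall-clock time it took.
    pool = multiprocessing.Pool(num_consumers)
    start = time.time()
    pool.map(cpu_bound, jobs)
    pool.close()
    pool.join()
    return time.time() - start

if __name__ == '__main__':
    jobs = [200000] * 40
    cores = multiprocessing.cpu_count()
    # Past the number of cores, CPU-bound work usually stops speeding
    # up; the extra consumers just add management overhead.
    for n in (1, 2, cores, 2 * cores):
        print("%2d consumers: %.2f seconds" % (n, timed_run(n, jobs)))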