Pickle based workflow - looking for advice

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Apr 14 16:14:29 CEST 2015


On Tue, 14 Apr 2015 11:45 pm, Chris Angelico wrote:

> On Tue, Apr 14, 2015 at 11:08 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Tue, 14 Apr 2015 05:58 pm, Fabien wrote:
>>
>>> On 14.04.2015 06:05, Chris Angelico wrote:
>>>> Not sure what you mean, here. Any given file will be written by
>>>> exactly one process? No possible problem. Multiprocessing within one
>>>> application doesn't change that.
>>>
>>> yes that's what I meant. Thanks!
>>
>> It's not that simple though. If you require files to be written in
>> precisely a certain order, then parallel processing requires
>> synchronisation.
>>
>> Suppose you write A, then B, then C, then D, each in its own process (or
>> thread). So the B process has to wait for A to finish, the C process has
>> to wait for B to finish, and so on. Otherwise you could find yourself
>> with C reading the data from B before B is finished writing it.
> 
> Sure, which is a matter of writer/reader conflicts on a single file -
> nothing to do with "writing multiple files simultaneously" which was
> the question raised.

Fabien: "So I'm trying to crack open an old grenade I found, and I was
wondering if I need a ball-peen hammer or whether a regular hammer will be
okay."

You: "Oh, a regular hammer will be fine."

Me: "Just a minute. You're hitting a grenade with a hammer hard enough to
crack the case. That could be bad. It might explode."

You: "Sure, but the OP never asked about that. He just asked if the kind of
hammer makes a difference."

:-P


Seriously though, the OP did specify in his first post that there is at
least one dependency of the "B depends on A finishing first" kind. I
understood that A writes to a file, B reads that file and writes to a new
file, C reads that file and writes to yet another file, and so on. In which
case, *writing* the files is the least of his problems, it's the exploding
grenade, er, synchronisation problems that will get him.

:-)
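Here is a minimal sketch of that kind of synchronisation. It assumes a hypothetical four-stage chain in which stage A writes a file and each later stage reads the previous stage's file and writes a new one. It uses threads and `threading.Event` so the example stays self-contained; `multiprocessing.Event` offers the same `wait()`/`set()` interface for separate processes. The names `run_stage` and `run_pipeline` and the file contents are illustrative only.

```python
import os
import tempfile
import threading

# Hypothetical four-stage chain: A writes a file, then B, C and D each read
# the previous stage's file and write a new one.
STAGES = ["A", "B", "C", "D"]


def run_stage(index, workdir, done_events):
    """Run one stage, waiting until the previous stage has finished writing."""
    out_path = os.path.join(workdir, f"{STAGES[index]}.txt")
    if index == 0:
        data = "start"
    else:
        # Block until the previous stage signals that its file is complete.
        done_events[index - 1].wait()
        in_path = os.path.join(workdir, f"{STAGES[index - 1]}.txt")
        with open(in_path) as f:
            data = f.read()
    with open(out_path, "w") as f:
        f.write(data + "->" + STAGES[index])
    # The file is closed at this point; tell the next stage it may read it.
    done_events[index].set()


def run_pipeline(workdir):
    done_events = [threading.Event() for _ in STAGES]
    # Threads are started in reverse order on purpose: the events alone
    # enforce the A -> B -> C -> D ordering, not the start order.
    threads = [
        threading.Thread(target=run_stage, args=(i, workdir, done_events))
        for i in reversed(range(len(STAGES)))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(os.path.join(workdir, f"{STAGES[-1]}.txt")) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    print(run_pipeline(tmp))  # start->A->B->C->D
```

Even though every stage gets its own thread, the events force them to run strictly one after another, so this chain gains nothing from parallelism; it only adds coordination overhead compared with a plain loop.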


Apart from "embarrassingly parallel" problems, thread- and
multiprocessing-based workflows are often trickier than they may seem ahead
of time, and may even be slower than a purely sequential algorithm:

http://en.wikipedia.org/wiki/Parallel_slowdown
http://en.wikipedia.org/wiki/Embarrassingly_parallel
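To make the contrast concrete, here is a rough sketch of the two shapes of problem, using hypothetical helper names and toy file contents. Independent jobs that each write their own file can be handed to a pool in any order. A dependent chain, where each step reads the file the previous step wrote, is simplest and correct as a sequential loop. `ThreadPoolExecutor` is used only to keep the example self-contained; for CPU-bound work such as heavy pickling in CPython, `ProcessPoolExecutor` would usually be the better fit, subject to the slowdown caveats above.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_independent(args):
    """Hypothetical independent job: writes its own file, reads nothing else."""
    workdir, name = args
    path = os.path.join(workdir, f"{name}.txt")
    with open(path, "w") as f:
        f.write(f"result for {name}")
    return path


def independent_jobs(workdir, names):
    # Embarrassingly parallel: no job depends on another, so the pool may run
    # them in any order without extra synchronisation. map() still returns
    # results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(write_independent, [(workdir, n) for n in names]))


def dependent_chain(workdir, names):
    # Each step needs the previous step's output, so run them one at a time.
    data = "start"
    for name in names:
        path = os.path.join(workdir, f"{name}.txt")
        with open(path, "w") as f:
            f.write(data + "->" + name)
        # The next step reads the file this step has just finished writing.
        with open(path) as f:
            data = f.read()
    return data


with tempfile.TemporaryDirectory() as tmp:
    paths = independent_jobs(tmp, ["w", "x", "y", "z"])
    print([os.path.basename(p) for p in paths])  # ['w.txt', 'x.txt', 'y.txt', 'z.txt']
    print(dependent_chain(tmp, ["A", "B", "C", "D"]))  # start->A->B->C->D
```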



-- 
Steven



