Parallelising code

Mathieu Prevot mathieu.prevot at gmail.com
Mon Sep 15 21:58:45 CEST 2008


2008/9/15 psaffrey at googlemail.com <psaffrey at googlemail.com>:
> I have some file processing code that has to deal with quite a lot of
> data. I have a quad core machine, so I wondered whether I could take
> advantage of some parallelism.
>
> Essentially, I have a number of CSV files, let's say 100, each
> containing about 8000 data points. For each point, I need to look up
> some other data structures (generated in advance) and append the point
> to a relevant list. I wondered whether I could get each core to handle
> a few files each. I have a few questions:
>
> - Am I actually going to get any speed up from parallelism, or is it
> likely that most of my processing time is spent reading files? I guess
> I can profile for this?
>
> - Is list.append() thread safe? (not sure if this is the right term)
> what I mean is, can two separate processors file a point in the same
> list at the same time without anything horrible happening? Do I need
> to do anything special (mutex or whatever) to make this happen, or
> will it happen automatically?

You won't take advantage of your cores with a single pure-Python
script: in CPython the global interpreter lock lets only one thread
run Python bytecode at a time. Python threads are useful for UI and
file operations, for everything but concurrent processing. The simplest
way to do concurrent processing is to use Popen from subprocess, which
creates new processes.

Note that you can call Python scripts from another one, e.g. a manager
and as many workers as you want. IMO it's the simplest design and the
least work for making concurrent processes.
Ideally, make your workers not need to feed back variables, or
anything more complex than a return value. Also, make them not write to
the same file. They can read the same file without problems.
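
A minimal sketch of that manager/worker layout, assuming a recent
Python 3. The worker body, the file names and the run_workers helper
are hypothetical; the worker is inlined with "-c" only so the example
is self-contained, and would normally be a separate script:

```python
import subprocess
import sys

# Hypothetical worker, inlined for the sketch; in practice this would be
# its own script (e.g. worker.py) taking the CSV path as an argument.
WORKER_CODE = """
import sys
path = sys.argv[1]
# ... process the CSV file named by `path` here ...
sys.exit(0)  # the exit status is the only feedback to the manager
"""


def run_workers(paths):
    """Start one worker process per path and return their exit statuses."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_CODE, path])
        for path in paths
    ]
    # All workers are already running; wait() blocks until each one
    # finishes and returns its exit status.
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Placeholder file names; the inlined worker never opens them.
    print(run_workers(["a.csv", "b.csv", "c.csv", "d.csv"]))
```

With 100 files on four cores you would probably give each worker a
batch of files rather than start 100 processes at once.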

Note that you can manage locks etc. from the manager script.

I'm not sure Python semaphores allow interprocess communication like C
semaphores [1]; check this. A workaround is to send tuples to stderr.
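
The stderr workaround could look roughly like this. It is only a
sketch: the worker body, the (file name, point count) tuple and the
collect helper are made up for illustration, and it assumes the worker
writes nothing else to stderr:

```python
import ast
import subprocess
import sys

# Hypothetical worker: reports one tuple on stderr as its result.
WORKER_CODE = """
import sys
result = (sys.argv[1], 8000)  # made-up (file name, points processed)
print(repr(result), file=sys.stderr)
"""


def collect(paths):
    """Run one worker per path and parse the tuple each writes to stderr."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, path],
            stderr=subprocess.PIPE,
            text=True,  # decode stderr to str instead of bytes
        )
        for path in paths
    ]
    results = []
    for proc in procs:
        # communicate() reads stderr to the end and waits for exit
        _, err = proc.communicate()
        # literal_eval only accepts Python literals, so it is safer
        # than eval() for parsing the worker's output
        results.append(ast.literal_eval(err.strip()))
    return results


if __name__ == "__main__":
    print(collect(["a.csv", "b.csv"]))
```

Any warning or traceback on stderr would break the parsing, so a real
manager would need to handle that case.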

HTH,
Mathieu

[1] Programming with POSIX Threads, David R. Butenhof, http://tinyurl.com/6hpkol


