paul at boddie.org.uk
Tue Sep 16 11:16:12 CEST 2008
On 15 Sep, 18:46, "psaff... at googlemail.com" <psaff... at googlemail.com> wrote:
> I have some file processing code that has to deal with quite a lot of
> data. I have a quad core machine, so I wondered whether I could take
> advantage of some parallelism.
Take a look at this page for some solutions. In addition, Jython and
IronPython provide the ability to use threads for parallel execution,
since neither implementation has CPython's global interpreter lock.
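For CPython itself, one option is the multiprocessing module, new in
Python 2.6 (previously available as the third-party "processing"
package): you hand a pool of worker processes one file each. Here is a
minimal sketch, where process_file and the file names are made up for
illustration:

    import csv
    import multiprocessing

    def process_file(filename):
        # Hypothetical worker: parse one CSV file in a child process
        # and return its rows for merging in the parent.
        return filename, list(csv.reader(open(filename, "rb")))

    if __name__ == "__main__":
        filenames = ["data%d.csv" % i for i in range(100)]  # assumed names
        pool = multiprocessing.Pool()  # one worker per core by default
        for filename, rows in pool.map(process_file, filenames):
            # Append each point to the relevant list here, using the
            # lookup structures generated in advance.
            pass
        pool.close()
        pool.join()

Doing the appending in the parent process avoids having to share the
precomputed lookup structures between processes, at the cost of
pickling the parsed rows back across process boundaries.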
> Essentially, I have a number of CSV files, let's say 100, each
> containing about 8000 data points. For each point, I need to look up
> some other data structures (generated in advance) and append the point
> to a relevant list. I wondered whether I could get each core to handle
> a few files each. I have a few questions:
> - Am I actually going to get any speed up from parallelism, or is it
> likely that most of my processing time is spent reading files? I guess
> I can profile for this?
There are a few things to consider, and it is useful to see where most
of the time is being spent. One interesting exercise called "Wide
Finder 2", run by Tim Bray (see his blog for more details),
investigated the benefits of processing log files using many
concurrent processes, but it was often argued that the greatest
speed-up over a naive serial implementation came from optimising the
input and output and from choosing the right parsing strategy.
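To answer the profiling question concretely: the standard cProfile
module will tell you whether the time goes on reading, parsing or the
lookups. A minimal sketch, again with a stand-in process_file and
made-up file names:

    import cProfile
    import csv
    import pstats

    def process_file(filename):
        # Stand-in for the real per-file work: parse and discard.
        for row in csv.reader(open(filename, "rb")):
            pass

    def main():
        # Hypothetical serial run over all of the input files.
        for i in range(100):
            process_file("data%d.csv" % i)

    cProfile.run("main()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)

If most of the cumulative time shows up in the reading and parsing
calls rather than in your own lookups, extra processes may only end up
contending for the disk.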