larry.bates at vitalEsafe.com
Mon Sep 15 20:11:40 CEST 2008
psaffrey at googlemail.com wrote:
> I have some file processing code that has to deal with quite a lot of
> data. I have a quad core machine, so I wondered whether I could take
> advantage of some parallelism.
> Essentially, I have a number of CSV files, let's say 100, each
> containing about 8000 data points. For each point, I need to look up
> some other data structures (generated in advance) and append the point
> to a relevant list. I wondered whether I could get each core to handle
> a few files each. I have a few questions:
> - Am I actually going to get any speed up from parallelism, or is it
> likely that most of my processing time is spent reading files? I guess
> I can profile for this?
> - Is list.append() thread safe? (not sure if this is the right term)
> what I mean is, can two separate processors file a point in the same
> list at the same time without anything horrible happening? Do I need
> to do anything special (mutex or whatever) to make this happen, or
> will it happen automatically?
> Thanks in advance for any guidance,
Put the data into a database first to see if it is actually too slow.
If it is take a look at an in-memory database or perhaps something as simple as
memcached could help.
More information about the Python-list