parallel csv-file processing
Paul Boddie
paul at boddie.org.uk
Fri Nov 9 07:48:42 EST 2007
On 9 Nov, 12:02, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
> Why not pass the disk offsets to the job server (untested):
>
> n = 1000
> for i,_ in enumerate(reader):
>     if i % n == 0:
>         job_server.submit(calc_scores, reader.tell(), n)
>
> the remote process seeks to the appropriate place and processes n lines
> starting from there.
This is similar to a lot of the smarter solutions for Tim Bray's "Wide
Finder" - a problem apparently in the same domain. See here for more
details:
http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
Lots of discussion about more than just parallel processing/programming, too.
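
For what it's worth, the offset-passing idea quoted above might be sketched
roughly as follows. This is only an illustration under a few assumptions: the
names chunk_offsets and process_chunk are made up here (process_chunk stands
in for whatever calc_scores would do), the file is assumed to be UTF-8, and no
quoted CSV field is assumed to contain an embedded newline, since the chunks
are split on physical lines. As far as I know csv reader objects have no
tell() method, so the offsets are taken from the underlying file opened in
binary mode.

```python
import csv
import io


def chunk_offsets(path, n):
    """Return the byte offset at which every n-th physical line starts."""
    offsets = []
    with open(path, "rb") as f:
        i = 0
        while True:
            pos = f.tell()          # offset of the line about to be read
            line = f.readline()
            if not line:            # end of file
                break
            if i % n == 0:
                offsets.append(pos)
            i += 1
    return offsets


def process_chunk(path, offset, n):
    """Seek to offset, read up to n lines and return them as parsed CSV rows.

    Hypothetical stand-in for the real per-chunk work (e.g. calc_scores).
    """
    with open(path, "rb") as f:
        f.seek(offset)
        # readline() returns b"" at end of file, which joins harmlessly
        lines = [f.readline() for _ in range(n)]
    text = b"".join(lines).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))
```

Each (path, offset, n) triple is small and picklable, so it could be handed to
a job server or to something like multiprocessing.Pool instead of shipping the
rows themselves; each worker then opens the file and seeks on its own.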
Paul