parallel csv-file processing
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Fri Nov 9 07:10:30 EST 2007
On Fri, 09 Nov 2007 02:51:10 -0800, Michel Albert wrote:
> Obviously this won't work as you cannot access a slice of a csv-file.
> Would it be possible to subclass the csv.reader class in a way that
> you can somewhat efficiently access a slice?
An arbitrary slice? I guess not. The lines are not equally long, so all
records before the slice must be read first to find where it starts.
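To illustrate the point, here is a minimal sketch that counts how many lines the reader pulls from the underlying file when asked for records 2 and 3 only. The helper `read_slice` and the sample data are made up for this example; `nonlocal` requires Python 3.

```python
import csv
import io
import itertools

def read_slice(lines, start, stop):
    """Return records [start, stop) and the number of lines consumed.

    Hypothetical helper, for illustration only.
    """
    consumed = 0

    def counting(iterable):
        # Pass lines through unchanged while counting them.
        nonlocal consumed
        for line in iterable:
            consumed += 1
            yield line

    reader = csv.reader(counting(lines))
    records = list(itertools.islice(reader, start, stop))
    return records, consumed

# Rows of different lengths, so byte offsets cannot be computed up front.
data = io.StringIO("a,1\nbb,22\nccc,333\ndddd,4444\neeeee,55555\n")
records, consumed = read_slice(data, 2, 4)
print(records)
print(consumed)
```

Only two records are returned, but the two records before them are read and thrown away as well.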
> The obvious way is to do the following:
>
> buffer = []
> for line in reader:
>     buffer.append(line)
>     if len(buffer) == 1000:
>         f = job_server.submit(calc_scores, buffer)
>         buffer = []
With `itertools.islice()` this can be written as:
    while True:
        buffer = list(itertools.islice(reader, 1000))
        if not buffer:
            break
        f = job_server.submit(calc_scores, buffer)
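For a version that runs on its own, the same loop can be wrapped in a generator. This is only a sketch: `iter_chunks` is a name I made up, `calc_scores` here is a placeholder that just counts rows, and the in-memory sample stands in for the real file. The job server is left out, so each chunk is processed directly.

```python
import csv
import io
import itertools

def iter_chunks(reader, size):
    """Yield lists of up to `size` records until the reader is exhausted."""
    while True:
        chunk = list(itertools.islice(reader, size))
        if not chunk:
            break
        yield chunk

def calc_scores(rows):
    # Placeholder for the real scoring function: just count the rows.
    return len(rows)

# 2500 sample rows standing in for the real csv file.
sample = io.StringIO("".join(f"{i},{i * i}\n" for i in range(2500)))
reader = csv.reader(sample)

sizes = [calc_scores(chunk) for chunk in iter_chunks(reader, 1000)]
print(sizes)
```

Unlike the append-and-reset loop, the last chunk is handled even when it is shorter than 1000 records.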