CSV performance
Peter Otten
__peter__ at web.de
Mon Apr 27 10:59:20 EDT 2009
psaffrey at googlemail.com wrote:
> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.
>
> I have tried running it just on the csv read:
> $ ./largefilespeedtest.py
> working at file largefile.txt
> finished: 3.860000.2
>
>
> A tiny bit of background on the final application: this is biological
> data from an Affymetrix platform. Each line of the CSV files holds a
> chromosome name, a coordinate and a data point, like this:
>
> chr1 3754914 1.19828
> chr1 3754950 1.56557
> chr1 3754982 1.52371
>
> In the "simple data structures" code below, I do some jiggery-pokery
> with the chromosome names to save me storing the same string millions
> of times.
>
> $ ./affyspeedtest.py
> reading affy file largefile.txt
> finished: 15.540000.2
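A guess at the jiggery-pokery mentioned above -- a minimal sketch with
made-up names, not necessarily what the original script does:

chrommap = {}                        # chromosome name -> one-byte code
CODES = "abcdefghijklmnopqrstuvwxyz"

def chrom_code(name):
    # hand out the next unused code the first time a name is seen, so
    # each name string is stored once rather than millions of times
    if name not in chrommap:
        chrommap[name] = CODES[len(chrommap)]
    return chrommap[name]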
It looks like most of the time is not spent in the csv.reader().
Here's an alternative way to read your data:
import numpy

# fh, chrommap and chromio come from the rest of your script: the open
# data file, the name -> code dict, and the sink the codes go to
rows = fh.read().split()
coords = numpy.array(map(int, rows[1::3]), dtype=int)
points = numpy.array(map(float, rows[2::3]), dtype=float)
chromio.writelines(map(chrommap.__getitem__, rows[::3]))
Do things improve if you simplify your code like that?
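To time that in isolation, a self-contained harness might look like this
(file name and mapping are made up; adjust them to your setup):

import time
import numpy

chrommap = {"chr1": "a"}             # substitute your real name -> code map
fh = open("largefile.txt")
chromio = open("chromnames.tmp", "w")

start = time.time()
rows = fh.read().split()
coords = numpy.array(map(int, rows[1::3]), dtype=int)
points = numpy.array(map(float, rows[2::3]), dtype=float)
chromio.writelines(map(chrommap.__getitem__, rows[::3]))
print "finished: %.2f" % (time.time() - start)

Note the "%.2f" format: the "finished: 3.860000.2" output quoted above
looks like the result of a "%f.2" typo in the original timing code.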
Peter