__peter__ at web.de
Mon Apr 27 16:59:20 CEST 2009
psaffrey at googlemail.com wrote:
> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.
> I have tried running it just on the csv read:
> $ ./largefilespeedtest.py
> working at file largefile.txt
> finished: 3.860000.2
> A tiny bit of background on the final application: this is biological
> data from an affymetrix platform. The csv files are a chromosome name,
> a coordinate and a data point, like this:
> chr1 3754914 1.19828
> chr1 3754950 1.56557
> chr1 3754982 1.52371
> In the "simple data structures" code below, I do some jiggery pokery
> with the chromosome names to save me storing the same string millions
> of times.
> $ ./affyspeedtest.py
> reading affy file largefile.txt
> finished: 15.540000.2
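About the chromosome-name jiggery pokery: Python's built-in intern()
(sys.intern() in Python 3) hands back a single shared copy of each
distinct string, so a plain dict does the deduplication for you with no
hand-rolled bookkeeping. A sketch, assuming the whitespace-separated
three-column layout above (read_affy and the sample rows are made up
for illustration):

```python
import sys

def read_affy(lines):
    """Group (coordinate, value) pairs by chromosome name."""
    data = {}
    for line in lines:
        name, coord, value = line.split()
        # sys.intern() returns one shared string object per distinct
        # name, so millions of "chr1" keys cost one string, not millions.
        data.setdefault(sys.intern(name), []).append((int(coord), float(value)))
    return data

sample = ["chr1 3754914 1.19828", "chr1 3754950 1.56557"]
print(read_affy(sample))
```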
It looks like most of the time is not spent in the csv.reader().
Here's an alternative way to read your data:
import numpy

fh = open("largefile.txt")
rows = fh.read().split()  # one big read, then slice into columns
coords = numpy.array(rows[1::3], dtype=int)    # dtype converts the strings
points = numpy.array(rows[2::3], dtype=float)
Do things improve if you simplify your code like that?
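numpy can also do the parsing itself: numpy.loadtxt with usecols skips
the chromosome-name column entirely. A sketch (loadtxt may well be
slower than the slicing above, but it's shorter; the io.StringIO sample
just stands in for your file handle):

```python
import io
import numpy

sample = io.StringIO("chr1 3754914 1.19828\n"
                     "chr1 3754950 1.56557\n")

# usecols=(1, 2) ignores column 0 (the chromosome name);
# unpack=True returns one array per column.
coords, points = numpy.loadtxt(sample, usecols=(1, 2), unpack=True)
```

Note that coords comes back as float; cast with coords.astype(int) if
you need integer coordinates.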