python.list at tim.thechases.com
Mon Apr 27 14:47:05 CEST 2009
> I'm using the CSV library to process a large amount of data -
> 28 files, each of 130MB. Just reading in the data from one
> file and filing it into very simple data structures (numpy
> arrays and a cstringio) takes around 10 seconds. If I just
> slurp one file into a string, it only takes about a second, so
> I/O is not the bottleneck. Is it really taking 9 seconds just
> to split the lines and set the variables?
You've omitted one important test: spinning through the file
with csv parsing, but without filing the rows into your data
structures. Without that metric, there's no way to know
whether the csv module is at fault, or whether you're doing
something inefficient with the data structures themselves.
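A minimal sketch of that three-way comparison (file name, row
count, and column count here are made up for illustration):
raw slurp, csv parsing with the rows discarded, and csv
parsing with the rows appended to a plain list.

```python
import csv
import time

def make_sample(path="sample.csv", rows=100_000, cols=10):
    """Write a throwaway CSV file so the benchmark is self-contained."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        for i in range(rows):
            w.writerow([i * cols + j for j in range(cols)])
    return path

def time_it(label, fn):
    """Run fn once and print how long it took."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

def slurp(path):
    # Baseline: raw I/O only.
    with open(path) as f:
        return len(f.read())

def parse_only(path):
    # csv parsing, rows discarded -- isolates the parser's cost.
    n = 0
    with open(path, newline="") as f:
        for _row in csv.reader(f):
            n += 1
    return n

def parse_and_store(path):
    # csv parsing plus the simplest possible storage.
    data = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            data.append(row)
    return data

if __name__ == "__main__":
    path = make_sample()
    time_it("slurp", lambda: slurp(path))
    time_it("csv parse only", lambda: parse_only(path))
    time_it("csv parse + store", lambda: parse_and_store(path))
```

Comparing the second timing against the first shows what the
csv module itself costs; the gap between the third and the
second is the cost of your storage step.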