CSV performance
dean
deank at yahoo.com
Mon Apr 27 17:56:47 EDT 2009
On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaffrey at googlemail.com wrote:
> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cstringio) takes around 10 seconds. If I just slurp one file into a
> string, it only takes about a second, so I/O is not the bottleneck. Is
> it really taking 9 seconds just to split the lines and set the
> variables?
I assume you're reading a 130 MB text file in 1 second only after the OS
has already cached it, so you're not really measuring disk I/O at all.
Parsing a 130 MB text file will take considerable time no matter what.
Perhaps you should consider using a database instead of CSV.
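
One way to check where the time goes is to time the parse step on text that is already in memory, so neither the disk nor the OS cache is involved. The following is only a rough sketch: the helper names (time_parse, make_sample) and the synthetic data are made up for illustration and are not from the original post, and real timings will depend on the actual files and machine.

```python
import csv
import io
import time


def make_sample(n_rows, n_cols=5):
    """Build synthetic CSV text (a hypothetical stand-in for a real file)."""
    line = ",".join(str(i) for i in range(n_cols))
    return "\n".join(line for _ in range(n_rows)) + "\n"


def time_parse(text):
    """Parse CSV text that is already in memory; return (rows, seconds)."""
    start = time.perf_counter()
    rows = list(csv.reader(io.StringIO(text)))
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    sample = make_sample(200_000)  # assumed size, adjust to taste
    rows, seconds = time_parse(sample)
    print(f"parsed {len(rows)} rows in {seconds:.3f} s")
```

Comparing this figure with the time for a plain read of the same file, both on a first (cold) and a repeated (cached) run, should show how much of the 10 seconds is parsing rather than I/O.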