[Numpy-discussion] convert csv file into recarray without pre-specifying dtypes and variable names

Vincent Nijs v-nijs at kellogg.northwestern.edu
Sat Jul 7 18:18:05 EDT 2007


Thanks for the reference John! csv2rec is about 30% faster than my code on
the same data. 

If I read the code in csv2rec correctly, it converts the data as it is being
read using the csv module. My setup reads the whole dataset into an array of
strings and then converts the columns as appropriate.
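
To make the second approach concrete, here is a minimal sketch of "read
everything as strings, then convert column by column". It is not the attached
program and not the csv2rec code: the names convert_column and
csv_to_recarray are hypothetical, the first row is assumed to hold the
variable names, and only int, float, and string columns are handled (date
parsing is left out to keep it short).

```python
import csv
import io

import numpy as np


def convert_column(values):
    """Try int, then float; fall back to leaving the column as strings."""
    for caster in (int, float):
        try:
            return np.array([caster(v) for v in values])
        except ValueError:
            # At least one value did not parse; try the next, looser type.
            continue
    return np.array(values)


def csv_to_recarray(fileobj):
    """Read the whole file as strings, then convert each column in turn."""
    reader = csv.reader(fileobj)
    names = next(reader)       # assumption: header row holds the variable names
    rows = list(reader)        # entire dataset held in memory as strings
    columns = zip(*rows)       # transpose rows into columns
    arrays = [convert_column(list(col)) for col in columns]
    return np.rec.fromarrays(arrays, names=names)


# Small in-memory example standing in for a real file.
sample = io.StringIO("id,price,label\n1,2.5,a\n2,3.0,b\n")
rec = csv_to_recarray(sample)
```

Because every value is held as a Python string before conversion, peak memory
use in this sketch is well above the size of the final recarray, which matters
for files in the 500MB-1GB range.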

Best,

Vincent


On 7/6/07 8:53 PM, "John Hunter" <jdh2358 at gmail.com> wrote:

> On 7/6/07, Vincent Nijs <v-nijs at kellogg.northwestern.edu> wrote:
>> I wrote the attached (small) program to read in a text/csv file with
>> different data types and convert it into a recarray without having to
>> pre-specify the dtypes or variable names. I am just too lazy to type in
>> stuff like that :) The supported types are int, float, dates, and strings.
>> 
>> It works pretty well but it is not (yet) as fast as I would like, so I was
>> wondering if any of the numpy experts on this list might have some
>> suggestions on how to speed it up. I need to read 500MB-1GB files so speed
>> is important for me.
> 
> In matplotlib.mlab svn, there is a function csv2rec that does the
> same.  You may want to compare implementations in case we can
> fruitfully cross-pollinate them.  In the examples directory, there is an
> example script examples/loadrec.py