Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison

oyekomova oyekomova at hotmail.com
Wed Jan 10 20:48:06 CET 2007


Thanks for your help. I compared the following NumPy code with csvread
in Matlab on a very large csv file. Matlab read the file in 577 seconds.
The code below, on the other hand, was still running after more than 2
hours. Can this program be made more efficient? FYI, the csv file is a
simple 6-column file with a header row and more than a million
records.


import csv
from numpy import array
import time

t1 = time.clock()
file_to_read = file('somename.csv', 'r')
read_from = csv.reader(file_to_read)
read_from.next()  # skip the header row

# parse every remaining row into a list of floats
datalist = [map(float, row) for row in read_from]
file_to_read.close()

# now the real data: convert the list of rows to a 2-D float array
data = array(datalist, dtype=float)

elapsed = time.clock() - t1
print elapsed
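
One alternative worth timing against the loop above is to let NumPy parse the file itself. The following is only a sketch for Python 3 and a current NumPy, not something measured on the file described here: numpy.loadtxt with delimiter="," and skiprows=1 reads a purely numeric csv file and drops the header row. The helper name load_csv_floats and the three-column sample file are made up for illustration, and whether this is actually faster depends on the NumPy version and the data.

```python
import os
import tempfile
import time

import numpy as np


def load_csv_floats(path):
    """Read a comma-separated file of numbers, skipping one header row."""
    # skiprows=1 drops the header line; delimiter="," splits on commas.
    return np.loadtxt(path, delimiter=",", skiprows=1, dtype=float)


# Tiny stand-in file (hypothetical data) so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b,c\n1,2,3\n4,5,6\n")
    sample_path = tmp.name

t1 = time.perf_counter()
data = load_csv_floats(sample_path)
elapsed = time.perf_counter() - t1
os.remove(sample_path)

print(data.shape, elapsed)
```

To try it on the real file, call load_csv_floats('somename.csv') in place of the temporary sample.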

Robert Kern wrote:
> oyekomova wrote:
> > I would like to know how to convert a csv file with a header row into a
> > floating point array without the header row.
>
> Use the standard library module csv. Something like the following is a cheap and
> cheerful solution:
>
>
> import csv
> import numpy
>
> def float_array_from_csv(filename, skip_header=True):
>     f = open(filename)
>     try:
>         reader = csv.reader(f)
>         floats = []
>         if skip_header:
>             reader.next()
>         for row in reader:
>             floats.append(map(float, row))
>     finally:
>         f.close()
>
>     return numpy.array(floats)
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>  that is made terrible by our own mad attempt to interpret it as though it had
>  an underlying truth."
>   -- Umberto Eco
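
For readers on Python 3, the quoted function needs two small changes: reader.next() becomes next(reader), and map(float, row) has to be turned into a real list because map is lazy there. A possible port is sketched below. It keeps the name and arguments of the quoted function, while the with block, the explicit dtype, and the small sample file are additions for illustration only.

```python
import csv
import os
import tempfile

import numpy as np


def float_array_from_csv(filename, skip_header=True):
    """Python 3 port of the quoted function; same name and arguments."""
    # newline="" is what the csv module documentation recommends for
    # files handed to csv.reader.
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader)  # Python 3 spelling of reader.next()
        # Build each row as a list of floats (map() alone would be lazy).
        floats = [[float(x) for x in row] for row in reader]
    return np.array(floats, dtype=float)


# Hypothetical two-row sample so the sketch can be run as-is.
with tempfile.TemporaryDirectory() as tmpdir:
    sample = os.path.join(tmpdir, "sample.csv")
    with open(sample, "w", newline="") as out:
        out.write("x,y\n1.5,2.5\n3.0,4.0\n")
    result = float_array_from_csv(sample)

print(result)
```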




More information about the Python-list mailing list