Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison

Istvan Albert istvan.albert at gmail.com
Thu Jan 11 15:54:12 CET 2007


oyekomova wrote:

> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI

There must be something wrong with your setup/program. I work with
large csv files as well and I never have performance problems of that
magnitude. Make sure you are not doing something else while parsing
your data.

Parsing 1 million lines with six columns with the program below takes
87 seconds on my laptop. Even your original version, with the extra
slices and all, would still only take about 50% more time.

import time, csv, random
from numpy import array

def make_data(rows=1E6, cols=6):
    # Write `rows` lines of `cols` comma-separated random floats.
    fp = open('data.txt', 'wt')
    counter = range(cols)
    for row in xrange( int(rows) ):
        vals = map(str, [ random.random() for x in counter ] )
        fp.write( '%s\n' % ','.join( vals ) )
    fp.close()

def read_test():
    # Parse every row to floats with csv.reader, then build the array
    # in one step (no per-row slicing or appending into the array).
    start  = time.clock()
    reader = csv.reader( file('data.txt') )
    data   = [ map(float, row) for row in reader ]
    data   = array(data, dtype = float)
    print 'Data size', len(data)
    print 'Elapsed', time.clock() - start

#make_data()
read_test()
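For comparison, here is a minimal sketch of the same round trip using
NumPy's own text I/O instead of the csv module. This assumes a recent
NumPy (numpy.savetxt/numpy.loadtxt and the default_rng generator); the
file name and row count are just placeholders, scaled down for a quick
run:

```python
import time
import numpy as np

def make_data(path='data.txt', rows=1000, cols=6):
    # Write a comma-separated file of random floats in one call.
    rng = np.random.default_rng(0)
    np.savetxt(path, rng.random((rows, cols)), delimiter=',')

def read_with_loadtxt(path='data.txt'):
    # loadtxt parses the whole file straight into a float array,
    # replacing the csv.reader + map(float, ...) + array(...) steps.
    start = time.perf_counter()
    data = np.loadtxt(path, delimiter=',')
    print('Data size', len(data))
    print('Elapsed', time.perf_counter() - start)
    return data

make_data()
data = read_with_loadtxt()
```

loadtxt is convenient but not necessarily faster than a hand-rolled
csv loop; for large files it is worth timing both on your own data.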


More information about the Python-list mailing list