[SciPy-User] Alternatives to genfromtxt and loadtxt?

Yury V. Zaytsev yury at shurup.com
Sat May 14 06:53:57 EDT 2011


On Fri, 2011-05-13 at 22:40 +0000, Giorgos Tzampanakis wrote: 
> I have numeric data in ascii files, each file about 800 MB. Loading such a
> file to Octave takes about 30 seconds. On numpy it is so slow that I've
> never had the patience to see it through to the end.

If the layout is more or less simple, you may have better luck reading
the files with Python's built-in csv module and only then converting
the resulting lists to NumPy arrays.

I know it is definitely not the best solution out there, but it takes
zero effort and I have been able to load 500 MB files in a matter of
tens of seconds without any problems:

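        import csv

        import numpy as np

        # "data.txt" is a placeholder; substitute the path to your file
        with open("data.txt", newline="") as fp:

            # Auto-detect the CSV dialect that is being used
            dialect = csv.Sniffer().sniff(fp.read(1024))

            # Rewind so that the reader starts from the first line
            fp.seek(0)
            reader = csv.reader(fp, dialect)

            data = []

            for row in reader:

                # Filter out empty fields
                row = [x for x in row if x != ""]

                ...

                data.append(row)

        ...

        # Convert the lists of strings to a float array
        matrix = np.asarray(data, dtype=float)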
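
For convenience, the same approach can be wrapped into a function. This
is only a minimal sketch, assuming a purely numeric file with the same
number of fields on every line; the name load_numeric_text and its
parameters are made up for illustration and are not part of NumPy:

```python
import csv

import numpy as np


def load_numeric_text(path, sample_size=1024, delimiters=",;\t "):
    """Read a delimited numeric text file into a 2-D float array.

    Hypothetical helper: the name and the parameters are illustrative.
    """
    with open(path, newline="") as fp:
        # Guess the dialect from the first characters of the file,
        # considering only the candidate delimiters given above
        dialect = csv.Sniffer().sniff(fp.read(sample_size), delimiters)

        # Rewind so that the reader starts from the first line
        fp.seek(0)

        rows = []
        for row in csv.reader(fp, dialect):
            # Filter out empty fields (e.g. from repeated separators)
            fields = [x for x in row if x != ""]

            # Skip lines that contain no data at all
            if fields:
                rows.append(fields)

    # Convert the lists of strings to a float array
    return np.asarray(rows, dtype=float)
```

It would be called as matrix = load_numeric_text("data.txt"), where
"data.txt" is again a placeholder. Restricting the candidate delimiters
is meant to keep the sniffer from picking a character that merely
happens to occur regularly in numeric data; if your files use another
separator, adjust that argument accordingly.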

-- 
Sincerely yours,
Yury V. Zaytsev




