[SciPy-User] Alternatives to genfromtxt and loadtxt?
Yury V. Zaytsev
yury at shurup.com
Sat May 14 06:53:57 EDT 2011
On Fri, 2011-05-13 at 22:40 +0000, Giorgos Tzampanakis wrote:
> I have numeric data in ascii files, each file about 800 MB. Loading such a
> file to Octave takes about 30 seconds. On numpy it is so slow that I've
> never had the patience to see it through to the end.
If the layout is more or less simple, you may have better luck with
reading files with Python's built-in CSV reader and only then converting
the lists to NumPy arrays.
I know this is definitely not the best solution out there, but it
takes zero effort, and I have been able to load 500 MB files in a matter
of tens of seconds without any problems:
import csv
import numpy as np

# fp is assumed to be an already-open text file object

# Auto-detect the CSV dialect that is being used
#
dialect = csv.Sniffer().sniff(fp.read(1024))
fp.seek(0)

reader = csv.reader(fp, dialect)
data = []

for row in reader:
    # Filter out empty fields
    #
    row = [x for x in row if x != ""]
    ...
    data.append(row)

...

# The np.float alias has been removed from recent NumPy releases;
# the built-in float is equivalent here
matrix = np.asarray(data, dtype=float)
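
For completeness, here is a minimal self-contained sketch of the same idea wrapped in a function. The name load_numeric_text, the path argument and the sample_size parameter are made up for illustration, and skipping blank lines is an extra assumption that the snippet above does not make. It also assumes the sniffer can guess the delimiter from the first sample_size characters; csv.Sniffer raises csv.Error when it cannot.

```python
import csv

import numpy as np


def load_numeric_text(path, sample_size=1024):
    """Read a delimited numeric text file into a 2-D float array.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # newline="" is what the csv module documentation recommends
    # for files that are passed to csv.reader
    with open(path, newline="") as fp:
        # Guess the dialect (delimiter etc.) from the start of the file
        dialect = csv.Sniffer().sniff(fp.read(sample_size))
        fp.seek(0)

        data = []
        for row in csv.reader(fp, dialect):
            # Filter out empty fields, e.g. from repeated delimiters
            row = [x for x in row if x != ""]
            if row:  # assumption: blank lines carry no data
                data.append(row)

    # The string fields are converted to floats in a single step;
    # all rows must have the same number of fields for this to work
    return np.asarray(data, dtype=float)
```

Calling load_numeric_text("data.txt") on a file of equally long numeric rows should give a two-dimensional float64 array with one row per line. Note that the whole file is still held as Python lists of strings before the conversion, so peak memory use is several times the file size.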
--
Sincerely yours,
Yury V. Zaytsev