[Numpy-discussion] loading data
Francesc Alted
faltet at pytables.org
Fri Jun 26 07:31:40 EDT 2009
On Friday 26 June 2009 13:09:13, Mag Gam wrote:
> I really like the slice by slice idea!
Hmm, after looking at the np.loadtxt() docstring it seems that it loads
the complete file at once, so you shouldn't use it directly (unless you
split your big file beforehand, but that would take time too). So, I'd say that
your best bet is to use Python's `csv.reader()` iterator to iterate over
the lines in your file and set up a buffer (a NumPy array/recarray would be
fine), so that whenever the buffer is full it is written to the HDF5 file. That
should be pretty optimal.
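
Just as a minimal sketch of what I mean, assuming PyTables on the HDF5 side
and a hypothetical CSV with three float columns (the filenames, dtype and
chunk size below are only placeholders, not anything from your setup):

import csv
import numpy as np
import tables  # PyTables

CHUNK = 100000  # rows per buffer flush; tune to your memory
dtype = np.dtype([('a', 'f8'), ('b', 'f8'), ('c', 'f8')])  # hypothetical columns

with tables.open_file('data.h5', mode='w') as h5, open('big.csv', newline='') as f:
    table = h5.create_table('/', 'data', description=dtype)
    buf = np.empty(CHUNK, dtype=dtype)
    n = 0
    for row in csv.reader(f):
        buf[n] = tuple(float(x) for x in row)  # fill the in-memory buffer
        n += 1
        if n == CHUNK:
            table.append(buf)       # flush a full buffer to the HDF5 table
            n = 0
    if n:
        table.append(buf[:n])       # flush whatever is left at the end

The buffer size is a trade-off: the larger it is, the fewer (and larger)
appends you do, as long as it still fits comfortably in memory.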
With this you will not try to load the entire file into memory, which is
probably what is killing the performance in your case (unless your machine
has much more than 50 GB of memory, that is).
--
Francesc Alted