[Numpy-discussion] loading data

Fri Jun 26 07:31:40 EDT 2009

On Friday 26 June 2009 13:09:13 Mag Gam wrote:
> I really like the slice by slice idea!

Hmm, after looking at the np.loadtxt() docstring, it seems that it loads the 
complete file at once, so you shouldn't use it directly (unless you split 
your big file first, but that will take time too).  So, I'd say that your 
best bet is to use Python's `csv.reader()` to iterate over the lines in your 
file and set up a buffer (a NumPy array/recarray would be fine), so that 
whenever the buffer is full it is written to the HDF5 file.  That should be 
close to optimal.

This way you never load the entire file into memory, which is what I think 
is killing the performance in your case (unless your machine has much more 
than 50 GB of memory, that is).
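
A rough sketch of the buffered approach described above, under some 
assumptions: the file is purely numeric with a fixed number of columns, and 
h5py is used for the HDF5 side (PyTables would work just as well).  The 
names `iter_chunks`, `append_chunks_to_hdf5`, `ncols` and `chunk_rows` are 
made up for illustration.

```python
import csv
import io

import numpy as np


def iter_chunks(lines, ncols, chunk_rows=100_000, dtype=np.float64):
    """Yield arrays of at most `chunk_rows` rows parsed from CSV lines.

    `lines` is any iterable of text lines (e.g. an open file), so only
    one buffer's worth of data is held in memory at a time.
    """
    buf = np.empty((chunk_rows, ncols), dtype=dtype)  # reused buffer
    filled = 0
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        buf[filled] = [float(x) for x in row]
        filled += 1
        if filled == chunk_rows:
            yield buf.copy()  # copy, since the buffer is reused
            filled = 0
    if filled:
        yield buf[:filled].copy()  # final, partially filled buffer


def append_chunks_to_hdf5(chunks, path, name, ncols):
    """Append each chunk to a resizable dataset (assumes h5py is installed)."""
    import h5py  # imported here so the parsing part runs without h5py

    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            name, shape=(0, ncols), maxshape=(None, ncols),
            dtype="f8", chunks=True,
        )
        for chunk in chunks:
            start = dset.shape[0]
            dset.resize(start + len(chunk), axis=0)
            dset[start:] = chunk


# Small in-memory demonstration of the chunking part only.
sample = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")
for chunk in iter_chunks(sample, ncols=3, chunk_rows=2):
    print(chunk.shape)
```

For a real file you would pass an open file object instead of the 
`io.StringIO` sample (the csv module recommends opening it with 
`newline=""`) and hand the generator to `append_chunks_to_hdf5()`.  The 
buffer size is a trade-off between memory use and the number of writes, so 
it is worth experimenting with.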

-- 
Francesc Alted