[Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

Eelco Hoogendoorn hoogendoorn.eelco at gmail.com
Sun Oct 26 09:21:03 EDT 2014


I'm not sure why the memory doubling is necessary. Isn't it possible to
preallocate the arrays and write to them? I suppose this might be
inefficient, though, if you end up reading only a small subset of rows
out of a mostly corrupt file. But that seems to be a rather uncommon corner
case.

Either way, I'd say a doubling of memory use is fair game for NumPy.
Generality is more important than absolute performance. The most important
thing is that temporary Python data structures are avoided. That shouldn't
be too hard to accomplish, and would realize most of the performance and
memory gains, I imagine.
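
To make the preallocation idea concrete, here is a rough sketch, not a
proposal for the actual implementation. The function name load_floats and
its parameters (ncols, delimiter, initial_rows) are made up for
illustration, and it assumes a purely numeric file with a known, fixed
number of columns. Rows are written straight into a preallocated array
that is doubled when it fills up, so the only Python-level temporary is
the token list for the current line. Note that peak memory still exceeds
the final array size while the buffer is being grown or trimmed.

```python
import io

import numpy as np


def load_floats(lines, ncols, delimiter=",", initial_rows=1024):
    """Parse delimited numeric text into a 2-D float64 array.

    Hypothetical helper: rows go directly into a preallocated buffer
    instead of being accumulated in a list of lists.
    """
    buf = np.empty((initial_rows, ncols), dtype=np.float64)
    n = 0  # number of rows filled so far
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if n == buf.shape[0]:
            # Buffer is full: allocate one twice as large and copy the
            # filled rows over (amortized O(1) work per row).
            grown = np.empty((2 * buf.shape[0], ncols), dtype=np.float64)
            grown[:n] = buf
            buf = grown
        # Only this one row exists as a temporary Python list.
        buf[n] = [float(tok) for tok in line.split(delimiter)]
        n += 1
    # Copy the filled part so the oversized buffer can be freed.
    return buf[:n].copy()


# Usage with an in-memory file; a real file object works the same way.
text = "1,2,3\n4,5,6\n\n7,8,9\n"
arr = load_floats(io.StringIO(text), ncols=3, initial_rows=2)
```

If the number of rows is known up front (for example from a cheap first
pass that only counts lines), the growth step never triggers and the
final copy could be dropped as well.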

On Sun, Oct 26, 2014 at 12:54 PM, Jeff Reback <jeffreback at gmail.com> wrote:

> You should have a read here:
> http://wesmckinney.com/blog/?p=543
>
> Going below the 2x memory usage on read-in is nontrivial and costly in
> terms of performance.
>
> On Oct 26, 2014, at 4:46 AM, Saullo Castro <saullogiovani at gmail.com>
> wrote:
>
> I would like to start working on a memory efficient alternative for
> np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the
> data while the file iterator is exhausted.
>
> The motivation came from this SO question:
>
> http://stackoverflow.com/q/26569852/832621
>
> where for huge arrays the current NumPy ASCII readers are really slow and
> require ~6 times more memory. I tested this case with pandas' read_csv()
> and it required 2 times more memory.
>
> I would be glad if you could share your experience on this matter.
>
> Greetings,
> Saullo
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>