[Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

Saullo Castro saullogiovani at gmail.com
Sun Oct 26 14:27:54 EDT 2014


I agree with @Daniele's point that storing huge arrays in text files might
indicate a flawed workflow... but if these functions can be improved, why
not do it? Unless it turns out to be a burden to maintain.

Regarding the estimation of the array size, I don't see a big performance
loss in exhausting the file iterator once more in order to count the
number of rows and pre-allocate properly sized arrays, avoiding the
intermediate list of lists. The hardest part seems to be dealing with
arrays of strings (perhaps easily solved with dtype=object) and with
structured arrays.
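
Just to make the idea concrete, here is a minimal two-pass sketch (a
hypothetical helper, not the proposed implementation; it assumes a plain
numeric file with a fixed number of columns, no comments and no missing
values):

    import numpy as np

    def loadtxt_preallocated(fname, dtype=float, delimiter=None):
        # First pass: count the data rows and record the number of
        # columns, without storing any of the values.
        nrows = 0
        ncols = 0
        with open(fname) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                ncols = len(line.split(delimiter))
                nrows += 1

        # Allocate the result once; no intermediate list of lists is built.
        out = np.empty((nrows, ncols), dtype=dtype)

        # Second pass: parse each row directly into the pre-allocated array.
        with open(fname) as f:
            row = 0
            for line in f:
                line = line.strip()
                if not line:
                    continue
                out[row] = np.asarray(line.split(delimiter), dtype=dtype)
                row += 1
        return out

The file is read twice, but the peak memory usage stays close to the size
of the final array instead of the much larger list-of-lists representation.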

Cheers,
Saullo


2014-10-26 18:00 GMT+01:00 <numpy-discussion-request at scipy.org>:

> Message: 1
> Date: Sun, 26 Oct 2014 17:42:32 +0100
> From: Daniele Nicolodi <daniele at grinta.net>
> Subject: Re: [Numpy-discussion] Memory efficient alternative for
>         np.loadtxt and np.genfromtxt
> To: numpy-discussion at scipy.org
>
> On 26/10/14 09:46, Saullo Castro wrote:
> > I would like to start working on a memory efficient alternative for
> > np.loadtxt and np.genfromtxt that uses arrays instead of lists to store
> > the data while the file iterator is exhausted.
>
> ...
>
> > I would be glad if you could share your experience on this matter.
>
> I'm of the opinion that if your workflow requires you to regularly load
> large arrays from text files, something else needs to be fixed rather
> than the numpy speed and memory usage in reading data from text files.
>
> There are a number of interoperable data formats that allow storing data
> much more efficiently. HDF5 is one natural choice, perhaps with the Blosc
> compressor.
>
> Cheers,
> Daniele
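
For what it's worth, a minimal sketch of the HDF5 route Daniele mentions,
using PyTables with the Blosc compressor (the file name and array shape
are just placeholders; h5py works similarly with its built-in compressors):

    import numpy as np
    import tables

    data = np.random.rand(100000, 10)

    # Write the array as a compressed, chunked HDF5 dataset.
    with tables.open_file('data.h5', mode='w') as h5:
        filters = tables.Filters(complib='blosc', complevel=5)
        h5.create_carray('/', 'data', obj=data, filters=filters)

    # Reading it back goes straight into a NumPy array, no text parsing.
    with tables.open_file('data.h5', mode='r') as h5:
        data_back = h5.root.data[:]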