Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

I agree with @Daniele's point that storing huge arrays in text files might indicate a flawed process, but if these functions can be improved, why not? Unless it turns out to be a burden to maintain. Regarding the estimation of the array size: I don't see a big performance loss when the file iterator is exhausted once more in order to count the number of rows and pre-allocate the proper arrays, avoiding the list of lists. The hardest part seems to be dealing with arrays of strings (perhaps easily solved with dtype=object) and with structured arrays.

Cheers,
Saullo

2014-10-26 18:00 GMT+01:00 <numpy-discussion-request@scipy.org>:
Today's Topics:
1. Re: Memory efficient alternative for np.loadtxt and np.genfromtxt (Daniele Nicolodi)
----------------------------------------------------------------------
Message: 1
Date: Sun, 26 Oct 2014 17:42:32 +0100
From: Daniele Nicolodi <daniele@grinta.net>
Subject: Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt
To: numpy-discussion@scipy.org
Message-ID: <544D2478.8020504@grinta.net>
On 26/10/14 09:46, Saullo Castro wrote:
I would like to start working on a memory efficient alternative for np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the data while the file iterator is exhausted.
...
I would be glad if you could share your experience on this matter.
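[Editorial note: the two-pass idea proposed above — exhaust the iterator once to count rows, then fill a pre-allocated array instead of building a list of lists — might be sketched roughly like this. `loadtxt_prealloc` is a hypothetical helper, not part of NumPy's API, and handles only the simple all-numeric case.]

```python
import io
import numpy as np

def loadtxt_prealloc(src, dtype=float, delimiter=None):
    """Hypothetical two-pass loader (not part of NumPy): read a
    seekable file-like object twice, first to count rows and columns,
    then to fill a pre-allocated array in place."""
    def rows():
        src.seek(0)
        for line in src:
            fields = line.split(delimiter)
            if fields:  # skip blank lines
                yield fields

    # Pass 1: determine the output shape.
    nrows = ncols = 0
    for fields in rows():
        nrows += 1
        ncols = max(ncols, len(fields))

    # Pass 2: fill the pre-allocated array, no intermediate list of lists.
    out = np.empty((nrows, ncols), dtype=dtype)
    for i, fields in enumerate(rows()):
        out[i] = [dtype(x) for x in fields]
    return out

buf = io.StringIO("1 2 3\n4 5 6\n")
arr = loadtxt_prealloc(buf)
```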
I'm of the opinion that if your workflow requires you to regularly load large arrays from text files, the thing to fix is the workflow, rather than NumPy's speed and memory usage when reading text data.
There are a number of interoperable data formats that allow you to store data much more efficiently. HDF5 is one natural choice, perhaps with the Blosc compressor.
Cheers, Daniele
------------------------------
End of NumPy-Discussion Digest, Vol 97, Issue 57 ************************************************