<div dir="ltr"><div><div>I agree with @Daniele's point: storing huge arrays in text files might indicate a bad process. But since these functions can be improved, why not? Unless the change turns out to be a burden.</div><div><br></div><div>Regarding the estimation of the array size, I don't see a big performance loss if the file iterator is exhausted one extra time in order to count the rows and pre-allocate arrays of the proper size, which avoids using a list of lists. The hardest part seems to be dealing with arrays of strings (perhaps easily solved with dtype=object) and structured arrays.</div></div><div><br></div><div>Cheers,</div><div>Saullo</div><div><br></div><div><br></div><div class="gmail_extra"><div class="gmail_quote">2014-10-26 18:00 GMT+01:00  <span dir="ltr"><<a href="mailto:numpy-discussion-request@scipy.org" target="_blank">numpy-discussion-request@scipy.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Send NumPy-Discussion mailing list submissions to<br>

        <a href="mailto:numpy-discussion@scipy.org">numpy-discussion@scipy.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>

<br>

<br>

Today's Topics:<br>

<br>

   1. Re: Memory efficient alternative for np.loadtxt and<br>

      np.genfromtxt (Daniele Nicolodi)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Sun, 26 Oct 2014 17:42:32 +0100<br>

From: Daniele Nicolodi <<a href="mailto:daniele@grinta.net">daniele@grinta.net</a>><br>

Subject: Re: [Numpy-discussion] Memory efficient alternative for<br>

        np.loadtxt and np.genfromtxt<br>

To: <a href="mailto:numpy-discussion@scipy.org">numpy-discussion@scipy.org</a><br>

Message-ID: <<a href="mailto:544D2478.8020504@grinta.net">544D2478.8020504@grinta.net</a>><br>

Content-Type: text/plain; charset=windows-1252<br>

<br>

On 26/10/14 09:46, Saullo Castro wrote:<br>

> I would like to start working on a memory efficient alternative for<br>

> np.loadtxt and np.genfromtxt that uses arrays instead of lists to store<br>

> the data while the file iterator is exhausted.<br>

<br>

...<br>

<br>

> I would be glad if you could share your experience on this matter.<br>

<br>

I'm of the opinion that if your workflow requires you to regularly load<br>

large arrays from text files, something else needs to be fixed rather<br>

than the numpy speed and memory usage in reading data from text files.<br>

<br>

There are a number of data formats that are interoperable and allow data<br>

to be stored much more efficiently. HDF5 is one natural choice, maybe with<br>

the blosc compressor.<br>

<br>

Cheers,<br>

Daniele<br>

<br>

<br>

<br>

------------------------------<br>

<br>


End of NumPy-Discussion Digest, Vol 97, Issue 57<br>

************************************************<br>

</blockquote></div><br></div></div>
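<div>The two-pass idea from the reply at the top (exhaust the file iterator once to count rows, then pre-allocate and fill) could be sketched roughly as below. This is only an illustration under stated assumptions, not how np.loadtxt or np.genfromtxt work internally: the helper name load_two_pass is hypothetical, the input is assumed to be a seekable, rectangular table of plain floats, and strings and structured arrays (the hard cases mentioned above) are not handled.</div>

```python
import io

import numpy as np


def load_two_pass(fileobj, delimiter=None, comment="#"):
    """Hypothetical helper: read a rectangular float table in two passes.

    Assumes `fileobj` is a seekable text file object whose data rows all
    have the same number of columns.
    """

    def data_lines():
        # Yield stripped data lines, skipping blanks and comment lines.
        for line in fileobj:
            line = line.strip()
            if line and not line.startswith(comment):
                yield line

    # Pass 1: count the data rows and read the column count off the first row.
    nrows = 0
    ncols = 0
    for line in data_lines():
        if nrows == 0:
            ncols = len(line.split(delimiter))
        nrows += 1

    # Pre-allocate the final array instead of growing a list of lists.
    out = np.empty((nrows, ncols), dtype=np.float64)

    # Pass 2: rewind and fill the array one row at a time.
    fileobj.seek(0)
    for i, line in enumerate(data_lines()):
        # Only one short temporary list per row is alive at any time.
        out[i] = [float(field) for field in line.split(delimiter)]
    return out


# Usage with an in-memory file; a real file opened with open() works the same.
sample = io.StringIO("1 2 3\n4 5 6\n\n# a comment\n7 8 9\n")
table = load_two_pass(sample)
```

<div>The price is reading the file twice; in exchange, peak memory stays close to the size of the final array, because only one row's worth of Python objects exists at a time.</div>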