[Numpy-discussion] Possible roadmap addendum: building better text file readers

Fri Mar 2 01:58:26 EST 2012

In an effort to build a consensus on what numpy's New and Improved text
file readers should look like, I've put together a short list of the main
points discussed in this thread so far:
1. Loading text files with loadtxt/genfromtxt needs a significant
performance boost (I think at least an order-of-magnitude improvement is
very doable, based on what I've seen with Erin's recfile code).
2. Improved memory usage. Memory used to read a text file shouldn't exceed
the size of the file itself, and should be less when reading only a subset
of the file.
3. Keep existing interfaces for reading text files (loadtxt, genfromtxt,
etc.). No new ones.
4. Underlying code should keep IO iteration and transformation of data
separate (awaiting more thoughts from Travis on this).
5. Be able to plug in different transformations of data at low level (also
awaiting more thoughts from Travis).
6. Memory mapping of text files?
7. Eventually, reduce memory usage even further by using the same object for
duplicate values in an array (depends on implementing an enum dtype?)
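Points 2, 4 and 5 can be illustrated together. What follows is only a
minimal sketch under my own assumptions, not the design being discussed:
the IO stage just splits lines into raw string fields, and a separate
transformation stage applies one pluggable converter per selected column.
The names iter_fields, transform and read_table are hypothetical; only
np.fromiter is a real numpy API. Because both stages are generators feeding
np.fromiter, no intermediate list of rows is built, and unselected columns
are never converted.

```python
import io

import numpy as np


def iter_fields(stream, delimiter=","):
    """IO stage (hypothetical): yield each non-blank line as a list of raw string fields."""
    for line in stream:
        line = line.strip()
        if line:
            yield line.split(delimiter)


def transform(rows, converters, usecols):
    """Transformation stage (hypothetical): apply one converter per selected column."""
    for fields in rows:
        yield tuple(conv(fields[i]) for i, conv in zip(usecols, converters))


def read_table(stream, dtype, converters, usecols, delimiter=","):
    """Hypothetical reader wiring the two stages together into a structured array."""
    rows = transform(iter_fields(stream, delimiter), converters, usecols)
    # np.fromiter consumes the generator directly, so no list of rows is materialized.
    return np.fromiter(rows, dtype=np.dtype(dtype))


# Usage: keep only the first two of three columns.
text = io.StringIO("1,2.5,x\n2,3.5,y\n")
arr = read_table(text, dtype=[("id", "i8"), ("val", "f8")],
                 converters=[int, float], usecols=[0, 1])
print(arr["id"], arr["val"])
```

Swapping in a different converter (say, a faster C-level float parser)
only touches the transformation stage; the IO stage stays unchanged.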
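On point 6, one possible approach is to map the file read-only and let the
OS page it in on demand, so that only the current line is copied into
Python objects. This sketch uses Python's standard mmap module rather than
anything numpy provides; the helper mmap_lines and the sample data are
illustrative assumptions.

```python
import mmap
import os
import tempfile

import numpy as np


def mmap_lines(path):
    """Yield the lines of a (non-empty) file as bytes, read through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.readline() returns b"" at end of file, which stops iter().
            for line in iter(mm.readline, b""):
                yield line


# Write a small sample file (illustrative data only).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"1,2.5\n2,3.5\n")
    path = tmp.name

try:
    # Convert only the second column; each line is decoded and parsed independently.
    values = np.fromiter(
        (float(line.split(b",")[1].decode()) for line in mmap_lines(path)),
        dtype=np.float64,
    )
finally:
    os.remove(path)
```

Note that mmap cannot map an empty file, so a real reader would need to
special-case that.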
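For point 7, until there is an enum dtype, one way to approximate the idea
(assumed here purely for illustration) is to store each distinct value once
and keep a small integer code per element, using
np.unique(..., return_inverse=True). The function name encode_enum is
hypothetical.

```python
import numpy as np


def encode_enum(values):
    """Hypothetical helper: return (categories, codes) so each distinct value is stored once."""
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    # Small integer codes replace the repeated strings; categories are sorted.
    return categories, codes.astype(np.int32).ravel()


cats, codes = encode_enum(["red", "blue", "red", "red", "blue"])
decoded = cats[codes]  # indexing the categories recovers the original values
```

With many repeated strings, the per-element cost drops to four bytes plus
one stored copy of each distinct value.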

Anything else?

-Jay Bourque
continuum.io