[Numpy-discussion] np.loadtxt : yet a new implementation...

Wed Dec 3 14:08:04 EST 2008

Alan G Isaac wrote:
> If I know my data is already clean
> and is handled nicely by the
> old loadtxt, will I be able to turn
> off the special handling in
> order to retain the old load speed?
> 
> Alan Isaac
> 

Hi all,
  that's going in the same direction I was thinking about.
When I thought about an improved version of loadtxt, I wished it was
fault-tolerant without losing too much performance.
  So my solution was much simpler than the very nice genloadtxt function
-- and it works for me.

My approach is to leave the existing loadtxt function unchanged. I only
replaced the default converter calls by a fault-tolerant converter
class. I attached a patch against io.py in numpy 1.2.1.

The nice thing is that it handles not only missing values but also, for
example, columns/fields with non-numeric characters. It just returns
nan in these cases. This is of practical importance for many data files
of astronomical catalogues, for example the Hipparcos catalogue data.
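
As a minimal sketch of the idea (this is not the attached patch itself; the
class name TolerantFloat and the sample rows are made up for illustration),
such a converter could look like this:

```python
import math


class TolerantFloat:
    """Hypothetical fault-tolerant converter: float(field), or nan on failure."""

    def __init__(self, fallback=math.nan):
        self.fallback = fallback

    def __call__(self, field):
        try:
            return float(field)
        except (TypeError, ValueError):
            # Empty fields and non-numeric text end up here.
            return self.fallback


to_float = TolerantFloat()

# Three made-up rows: clean, missing value, non-numeric entry.
rows = ["1.0,2.0,3.0", "4.0,,6.0", "7.0,n/a,9.0"]
parsed = [[to_float(field) for field in row.split(",")] for row in rows]
print(parsed)
```

With the unmodified loadtxt, an instance like to_float could presumably be
supplied per column through the existing converters argument, e.g.
converters={0: to_float, 1: to_float, 2: to_float}; the patch instead makes
this kind of behaviour the default.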

Regarding the performance, it is a little bit slower than the original
loadtxt, but not much: on my machine, reading a clean test file with
3 columns and 20000 rows 10 times gives the following results:

original loadtxt:  ~1.3s
modified loadtxt:  ~1.7s
new genloadtxt  :  ~2.7s

So you see, there is some loss of performance, but not as much as with
the new genloadtxt function.
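
For anyone who wants to gauge the converter overhead alone, here is a rough
sketch using timeit. It is a pure-Python stand-in, not the benchmark used for
the numbers above: the data are synthetic, the helper names (tolerant, parse)
are made up, and loadtxt itself is not involved, so absolute timings will
differ.

```python
import math
import timeit

# Synthetic stand-in for a clean test file: 20000 rows, 3 columns.
lines = ["%d.5 %d.25 %d.125" % (i, i, i) for i in range(20000)]


def tolerant(field):
    """float(field), or nan if the field cannot be parsed."""
    try:
        return float(field)
    except ValueError:
        return math.nan


def parse(convert):
    """Split every line on whitespace and convert each field."""
    return [[convert(field) for field in line.split()] for line in lines]


# 10 repetitions each, mirroring the 10 reads described above.
t_plain = timeit.timeit(lambda: parse(float), number=10)
t_tolerant = timeit.timeit(lambda: parse(tolerant), number=10)
print("plain float:    %.2fs" % t_plain)
print("tolerant float: %.2fs" % t_tolerant)
```

On clean data both variants return the same values; the difference in time is
the cost of the extra function call and try/except wrapper.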

I hope this solution is of interest ...

Manuel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: io.diff
Type: text/x-patch
Size: 678 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20081203/3b331964/attachment.bin>