Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

Dec. 3, 2008


      Manuel Metz wrote:
...
Alan G Isaac wrote:
...
If I know my data is already clean
and is handled nicely by the
old loadtxt, will I be able to turn
off the special handling in
order to retain the old load speed?
Alan Isaac
Hi all,
  that's going in the same direction I was thinking about.
When I thought about an improved version of loadtxt, I wished it were
fault tolerant without losing too much performance.
  So my solution was much simpler than the very nice genloadtxt function
-- and it works for me.
My ansatz is to leave the existing loadtxt function unchanged. I only
replaced the default converter calls by a fault tolerant converter
class. I attached a patch against io.py in numpy 1.2.1.
The nice thing is that it not only handles missing values, but for
example also columns/fields with non-number characters. It just returns
nan in these cases. This is of practical importance for many datafiles
of astronomical catalogues, for example the Hipparcos catalogue data.
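
To illustrate the idea, here is a minimal sketch of what such a fault tolerant converter could look like. It is not the class from the attached patch: the name FaultTolerantConverter and its parameters are made up for this example, and it only assumes that a converter is a callable receiving one field as a string (or as bytes in older numpy versions).

```python
import math


class FaultTolerantConverter:
    """Hypothetical wrapper: convert a field, return a fallback on failure."""

    def __init__(self, convert=float, fallback=math.nan):
        self.convert = convert      # the ordinary conversion, e.g. float
        self.fallback = fallback    # value returned for bad or missing fields

    def __call__(self, field):
        # Assumption: the field arrives as str or as bytes, depending on
        # the numpy version, so both are accepted here.
        if isinstance(field, bytes):
            field = field.decode("latin1")
        try:
            return self.convert(field.strip())
        except (ValueError, TypeError):
            # Missing value or non-number characters: do not raise.
            return self.fallback


conv = FaultTolerantConverter()
# One clean field, one empty field, one non-numeric field, one clean field.
row = [conv(field) for field in "1.5, , n/a, 42".split(",")]
print(row)
```

With numpy, an object like this could be passed per column through the converters argument of loadtxt, e.g. converters={0: conv, 1: conv}.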
Regarding the performance, it is a little bit slower than the original
loadtxt, but not much: on my machine, reading a clean test file with
3 columns and 20000 rows 10 times gives the following results:
original loadtxt:  ~1.3s
modified loadtxt:  ~1.7s
new genloadtxt  :  ~2.7s
So you see, there is some loss of performance, but not as much as with
the new genloadtxt function.
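
A comparison like the one above can be set up roughly as follows. This is only a sketch: time_reads and plain_loader are hypothetical names, and plain_loader is a pure-Python stand-in for the loader under test, so the numbers it gives say nothing about the timings quoted above.

```python
import os
import tempfile
import timeit


def time_reads(loader, path, repeats=10):
    """Return the total seconds taken by `repeats` calls of loader(path)."""
    return timeit.timeit(lambda: loader(path), number=repeats)


def plain_loader(path):
    # Stand-in loader (assumption): replace with np.loadtxt or the
    # function to be measured.
    with open(path) as fh:
        return [[float(x) for x in line.split()] for line in fh]


# Write a clean test file with 3 columns and 20000 rows.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    for i in range(20000):
        fh.write(f"{i} {i * 0.5} {i * 2.0}\n")
    path = fh.name

elapsed = time_reads(plain_loader, path)
os.remove(path)
print(f"10 reads: {elapsed:.2f}s")
```

Passing each loader in turn to time_reads with the same file keeps the comparison fair, since every candidate reads identical data the same number of times.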
I hope this solution is of interest ...
Manuel
Oops, wrong version of the diff file. Wanted to name the class
"_faulttolerantconv" ...