
Manuel Metz wrote:
Alan G Isaac wrote:
If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off the special handling in order to retain the old load speed?
Alan Isaac
Hi all, that's going in the same direction I was thinking about. When I thought about an improved version of loadtxt, I wished it was fault tolerant without losing too much performance. So my solution was much simpler than the very nice genloadtxt function -- and it works for me.
My ansatz is to leave the existing loadtxt function unchanged. I only replaced the default converter calls with a fault-tolerant converter class. I attached a patch against io.py in numpy 1.2.1.
The nice thing is that it not only handles missing values, but for example also columns/fields with non-number characters. It just returns nan in these cases. This is of practical importance for many datafiles of astronomical catalogues, for example the Hipparcos catalogue data.
Regarding the performance, it is a little bit slower than the original loadtxt, but not much: on my machine, reading a clean test file with 3 columns and 20000 rows 10 times, I get the following results:
original loadtxt: ~1.3s
modified loadtxt: ~1.7s
new genloadtxt  : ~2.7s
So you see, there is some loss of performance with the fault-tolerant converter class, but not as much as with the new genloadtxt.
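For anyone who wants to repeat this kind of measurement, here is a rough sketch of a timing harness. It is not the script used for the numbers above: the helper names and the pure-Python stand-in reader are assumptions, and the file contents are made up.

```python
import os
import tempfile
import timeit


def write_testfile(path, rows=20000, cols=3):
    """Write a clean whitespace-separated file of floats."""
    with open(path, "w") as fh:
        for i in range(rows):
            fh.write(" ".join(str(float(i + j)) for j in range(cols)) + "\n")


def plain_reader(path):
    """Stand-in reader: parse every field with float()."""
    with open(path) as fh:
        return [[float(f) for f in line.split()] for line in fh]


def time_reader(reader, path, repeats=10):
    """Total wall-clock seconds for `repeats` calls of reader(path)."""
    return timeit.timeit(lambda: reader(path), number=repeats)


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_testfile(path)
    data = plain_reader(path)
    elapsed = time_reader(plain_reader, path)
    print("plain reader: ~%.2fs for 10 reads" % elapsed)
finally:
    os.remove(path)
```

Passing numpy.loadtxt (or a patched variant) as the reader argument in place of plain_reader should give figures comparable in kind to the ones quoted above; the absolute values will of course depend on the machine.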
I hope this solution is of interest ...
Manuel
Oops, wrong version of the diff file. Wanted to name the class "_faulttolerantconv" ...