[Numpy-discussion] genloadtxt : last call
Ryan May
rmay31 at gmail.com
Tue Dec 16 18:07:14 EST 2008
Pierre GM wrote:
> All,
> Here's the latest version of genloadtxt, with some recent corrections.
> With just a couple of tweaks, we end up with some decent speed: it's
> still slower than np.loadtxt, but only 15% so according to the test at
> the end of the package.
I have one more usage issue that you may or may not want to fix. Missing values
are specified by their string representation and matched as strings, so a field
that has the same numeric value as the missing marker, but is written
differently, will not compare equal to it. For instance, if you specify that
-999.0 represents a missing value, but the value written to the file is -999.00,
you won't end up masking the -999.00 data point. I'm sure a test case will help
here:
    def test_withmissing_float(self):
        data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
        test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
                        names=True)
        control = ma.array([(0, 1.5), (2, -1.)],
                           mask=[(False, False), (False, True)],
                           dtype=[('A', np.int), ('B', np.float)])
        print control
        print test
        assert_equal(test, control)
        assert_equal(test.mask, control.mask)
Right now this fails with the latest version of genloadtxt. I've worked around
this by specifying a whole bunch of string representations of the values, but I
wasn't sure if you knew of a better way that this could be handled within
genloadtxt. I can only think of two ways, though I'm not thrilled with either:
1) Call the converter on the string form of the missing value and compare against
the converted value from the file to determine if missing. (Probably very slow)
2) Add a list of objects (ints, floats, etc.) to compare against after conversion
to determine if they're missing. This might needlessly complicate the function,
which I know you've already taken pains to optimize.
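To make the two options concrete, here is a rough sketch in plain Python. The
helper names is_missing_by_conversion and is_missing_by_object are made up for
illustration and are not part of genloadtxt; the sketch only shows the
comparison logic, not how it would be wired into the loader:

```python
# Hypothetical helpers for illustration only; not part of genloadtxt.

def is_missing_by_conversion(field, missing, converter=float):
    """Option 1: convert both strings and compare the converted values."""
    try:
        return converter(field) == converter(missing)
    except ValueError:
        # Non-numeric text: fall back to plain string comparison.
        return field.strip() == missing.strip()


def is_missing_by_object(value, missing_objects):
    """Option 2: compare an already-converted value against sentinel objects."""
    return any(value == m for m in missing_objects)


fields = ["1.5", "-999.00", "-999.0", "N/A"]

# Plain string matching only catches the exact spelling of the marker.
by_string = [f == "-999.0" for f in fields]

# Option 1: the marker is re-converted on every call here; a real
# implementation would presumably convert it once per column.
by_conversion = [is_missing_by_conversion(f, "-999.0") for f in fields]

# Option 2: convert the numeric fields first, then test against sentinels.
converted = [float(f) for f in fields[:3]]
by_object = [is_missing_by_object(v, [-999.0]) for v in converted]
```

With these inputs, plain string matching flags only the exact spelling
'-999.0', while both options also flag '-999.00'.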
If there's no good way to do it, I'm content to live with a workaround.
Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma