[Numpy-discussion] proposed change to recarray access
Allan Haldane
allanhaldane at gmail.com
Wed Jan 14 18:08:20 EST 2015
Hello all,
I've submitted a pull request on github which changes how string values
in recarrays are returned, which may break old code.
https://github.com/numpy/numpy/pull/5454
See also: https://github.com/numpy/numpy/issues/3993
Previously, recarray fields of type 'S' or 'U' (i.e., strings) would be
returned as chararrays when accessed by attribute, but ndarrays when
accessed by indexing:
>>> import numpy as np
>>> arr = np.array([('abc ', 1), ('abc', 2)],
...                dtype=[('str', 'S4'), ('id', int)])
>>> arr = arr.view(np.recarray)
>>> type(arr.str)
numpy.core.defchararray.chararray
>>> type(arr['str'])
numpy.core.records.recarray
Chararray is deprecated, and furthermore this led to bugs in my code,
since chararrays trim trailing whitespace but ndarrays do not (and I
was not aware of the conversion to chararray). For example:
>>> arr.str[0] == arr.str[1]
True
>>> arr['str'][0] == arr['str'][1]
False
In the pull request I have changed recarray attribute access so ndarrays
are always returned. I think this is a sensible thing to do but it may
break code which depends on chararray features (including the trimmed
whitespace).
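For code that relies on the trimmed whitespace, here is a minimal sketch of one possible workaround. It assumes that stripping explicitly with np.char.rstrip is an acceptable substitute for the chararray behaviour. The variable names (raw, trimmed) are illustrative only and are not part of the pull request:

```python
import numpy as np

# Same data as in the example above, written with bytes literals so the
# snippet also runs on Python 3.
arr = np.array([(b'abc ', 1), (b'abc', 2)],
               dtype=[('str', 'S4'), ('id', int)])

# Indexing by field name keeps trailing whitespace.
raw = arr['str']
print(raw[0] == raw[1])

# Strip trailing whitespace explicitly instead of relying on chararray.
trimmed = np.char.rstrip(raw)
print(trimmed[0] == trimmed[1])
```

The first comparison sees b'abc ' and b'abc' as different, while the second compares the stripped values and so matches what the chararray comparison used to report.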
Does this sound reasonable?
Best,
Allan