[Numpy-discussion] proposed change to recarray access

Allan Haldane allanhaldane at gmail.com
Wed Jan 14 18:08:20 EST 2015


Hello all,

I've submitted a pull request on GitHub that changes how string values
in recarrays are returned. The change may break old code.

https://github.com/numpy/numpy/pull/5454
See also: https://github.com/numpy/numpy/issues/3993

Previously, recarray fields of type 'S' or 'U' (i.e., strings) were
returned as chararrays when accessed by attribute, but as ndarrays when
accessed by indexing:

    >>> import numpy as np
    >>> arr = np.array([('abc ', 1), ('abc', 2)],
    ...                dtype=[('str', 'S4'), ('id', int)])
    >>> arr = arr.view(np.recarray)
    >>> type(arr.str)
    numpy.core.defchararray.chararray
    >>> type(arr['str'])
    numpy.core.records.recarray

Chararray is deprecated, and this inconsistency also led to bugs in my
code: chararrays trim trailing whitespace but ndarrays do not, and I was
not aware of the conversion to chararray. For example:

    >>> arr.str[0] == arr.str[1]
    True
    >>> arr['str'][0] == arr['str'][1]
    False
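The trimming difference can also be seen without recarrays at all. A
minimal sketch, assuming only standard NumPy calls (`ndarray.view` and
`np.char.chararray`); the names `data` and `chars` are illustrative:

```python
import numpy as np

# The same two strings as in the example above; b'abc ' ends in a space.
data = np.array([b'abc ', b'abc'], dtype='S4')

# A plain ndarray of dtype 'S4' keeps the trailing space on element access,
# so the two elements compare unequal.
print(data[0], data[1], data[0] == data[1])

# Viewing the same buffer as a chararray does not modify the stored bytes,
# but chararray strips trailing whitespace whenever an element is read ...
chars = data.view(np.char.chararray)
print(chars[0], chars[1], chars[0] == chars[1])

# ... and its comparison operators ignore trailing whitespace elementwise.
print(chars == b'abc')
```

The stored data is identical in both cases; only the array type decides
whether the trailing space is visible.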

In the pull request I have changed recarray attribute access so that
ndarrays are always returned. I think this is sensible, but it may break
code that depends on chararray features (including the trimmed
whitespace).
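Code that relied on the trimmed whitespace could make the stripping
explicit instead of depending on the returned array type. A minimal
sketch, assuming `np.char.rstrip` is an acceptable replacement for the
implicit chararray trimming; the comments on the first two `print` calls
assume a NumPy that includes the proposed change:

```python
import numpy as np

# The record array from the example above; b'abc ' has a trailing space.
arr = np.array([(b'abc ', 1), (b'abc', 2)],
               dtype=[('str', 'S4'), ('id', int)]).view(np.recarray)

# With the proposed change this is a plain ndarray, not a chararray
# (older NumPy versions print 'chararray' here).
print(type(arr.str).__name__)

# With the proposed change, the trailing space is significant here.
print(arr.str[0] == arr.str[1])

# Stripping explicitly gives the trimmed comparison whether the field
# comes back as a chararray or as a plain ndarray.
trimmed = np.char.rstrip(arr.str)
print(trimmed[0] == trimmed[1])
```

Stripping at the call site keeps the intent visible, rather than relying
on a side effect of which array type attribute access happens to return.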

Does this sound reasonable?

Best,
Allan


