NumPy should not silently promote numbers to strings
This is one of my oldest NumPy pain-points:
np.array([1, 2, 'three'])
array(['1', '2', 'three'], dtype='
This is almost never what I want. In many cases, I simply write dtype=object, but for others (e.g., numpy.where), it's a minor annoyance to explicitly cast inputs to the right type.
Autoconverting numbers into strings occasionally introduces real bugs (e.g., when using `np.nan` as a sentinel value for NA when working with strings, as in https://github.com/pydata/xarray/pull/1847), but mostly just hides bugs until later. It's certainly very un-Pythonic.
The sane promotion rule would be `np.promote_types(str, float) -> object`, not a size 32 string.
Is it way too late to fix this for NumPy, or is this something we could change in a major release? It would certainly need at least a deprecation cycle. This is easy enough to introduce accidentally that there are undoubtedly many users whose code would break if we changed this.
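For readers who want to see the behaviour described above, here is a minimal sketch. It only uses `np.array`; the variable names are illustrative, and the exact string width NumPy picks can vary by platform and version, so the sketch checks only the dtype kind.

```python
import numpy as np

# Mixing integers and a string: NumPy silently picks a unicode string
# dtype, so the integers come back as strings.
mixed = np.array([1, 2, 'three'])
print(mixed.dtype.kind)    # 'U' (a unicode string dtype)
print(repr(mixed[0]))      # the integer 1 is now the string '1'

# The NaN-sentinel trap: the float NaN becomes the literal text 'nan',
# so it can no longer be detected as a missing value.
with_nan = np.array(['a', np.nan])
print(repr(with_nan[1]))

# The explicit workaround mentioned in the message: dtype=object keeps
# every element as its original Python object.
as_objects = np.array([1, 2, 'three'], dtype=object)
print([type(v).__name__ for v in as_objects])   # ['int', 'int', 'str']
```

The object array is slower for numeric work, which is presumably part of why always writing dtype=object is an annoyance rather than a fix.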
Presumably you would extend that to all (str, np.number), or even (str,
np.generic)?
I suppose there’s the argument that with Python-3-only support around the
corner, even (str, bytes) should go to object.
Right now, promote_types gives examples in the docs of int/string
conversions, so changing it might be tricky.
On the other hand, the docs also falsely claim that the conversion is
associative (https://github.com/numpy/numpy/pull/10554/files), which your
proposed change would fix.
On Thu, 8 Feb 2018 at 22:12 Stephan Hoyer
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Feb 8, 2018 at 11:00 PM Eric Wieser wrote:
Presumably you would extend that to all (str, np.number), or even (str, np.generic)?
Yes, I'm currently doing (np.character, np.number) and (np.character, np.bool_). But only in direct consultation with the diagram of NumPy's type hierarchy :).
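A rough sketch of what that rule could look like, for illustration only. `promote_types_strict` is a hypothetical wrapper, not a NumPy API and not the actual patch; it assumes the rule is exactly "(np.character, np.number) and (np.character, np.bool_) go to object, everything else defers to `np.promote_types`".

```python
import numpy as np


def promote_types_strict(type1, type2):
    """Hypothetical variant of np.promote_types (not part of NumPy).

    Pairs of (string-like, numeric-or-bool) promote to object instead of
    to a string dtype; all other pairs use np.promote_types unchanged.
    """
    dt1, dt2 = np.dtype(type1), np.dtype(type2)

    def is_text(dt):
        # np.character is the abstract parent of np.str_ and np.bytes_.
        return np.issubdtype(dt, np.character)

    def is_numeric(dt):
        # np.bool_ is not a subclass of np.number, so check it separately.
        return np.issubdtype(dt, np.number) or np.issubdtype(dt, np.bool_)

    if (is_text(dt1) and is_numeric(dt2)) or (is_numeric(dt1) and is_text(dt2)):
        return np.dtype(object)
    return np.promote_types(dt1, dt2)


print(promote_types_strict(str, float))            # object
print(promote_types_strict(np.int32, np.float64))  # float64
```

The separate np.bool_ check is why the type-hierarchy diagram matters here: booleans sit outside the np.number branch.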
I suppose there’s the argument that with python-3-only support around the corner, even (str, bytes) should go to object.
Yes, that's also pretty bad. The current behavior (str, bytes) -> str relies on bytes being valid ASCII:
np.array([b'\xFF', u'cd'])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
It exactly matches Python 2's str/unicode behavior, but doesn't make sense at all in a Python 3 world.
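The ASCII dependence can be shown without NumPy at all. The sketch below is plain Python and only reproduces the decode step named in the error message above; that this is the step NumPy performs internally is an assumption drawn from that message.

```python
# Bytes outside the ASCII range cannot be decoded with the 'ascii' codec.
raw = b'\xff'
try:
    raw.decode('ascii')
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print(exc)

# Pure-ASCII bytes decode fine, which is why the problem stays hidden
# until non-ASCII data shows up.
ascii_only = b'cd'.decode('ascii')
print(ascii_only)   # cd
```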
This has recently been a major pain point for Matplotlib for the
implementation of string-categoricals as well.
Having numpy go to object or fail on `np.asarray([1, 2, 'foo'])` would make
things much easier for us.
Tom
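As an illustration of the "fail" option, here is a minimal sketch of a guard that downstream code could apply today. `asarray_no_str_promotion` is a hypothetical helper, not part of NumPy or Matplotlib, and it only inspects flat (one-dimensional) sequences of Python scalars.

```python
import numbers

import numpy as np


def asarray_no_str_promotion(values):
    """Hypothetical helper: refuse to mix text and numbers in one array.

    Only handles flat sequences; nested inputs are not inspected.
    """
    items = list(values)
    has_text = any(isinstance(v, (str, bytes)) for v in items)
    has_number = any(isinstance(v, numbers.Number) for v in items)
    if has_text and has_number:
        raise TypeError("refusing to promote numbers to strings")
    return np.asarray(items)


try:
    asarray_no_str_promotion([1, 2, 'foo'])
except TypeError as exc:
    print(exc)   # refusing to promote numbers to strings
```

Returning `np.asarray(items, dtype=object)` instead of raising would give the "go to object" variant.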
On Fri, Feb 9, 2018 at 2:22 AM Stephan Hoyer
participants (3)
- Eric Wieser
- Stephan Hoyer
- Thomas Caswell