[Numpy-discussion] Deprecate Promotion of numbers to strings?
sebastian at sipsolutions.net
Thu Apr 30 13:31:45 EDT 2020
In https://github.com/numpy/numpy/pull/15925 I propose to deprecate
promotion of strings and numbers. I have to double check whether this
has a large effect on pandas, but it currently seems to me that it will
This means that `np.promote_types("S", "int8")`, etc. will lead to an
error instead of returning `"S4"`. For the user, I believe the two
main visible changes are as follows. First, array creation from mixed
strings and numbers, e.g. `np.asarray(["string", 0])`,
will stop creating a string array and return either an `object` array
or give an error (object array would be the default currently).
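To make the `np.promote_types` change concrete, here is a minimal sketch that reports either the promoted dtype or the error. The helper name `describe_promotion` is made up for this illustration and is not part of NumPy; what it prints for the string case depends on the NumPy version and on whether the proposal is adopted.

```python
import numpy as np

def describe_promotion(a, b):
    """Return the promoted dtype as text, or the error message if promotion fails.

    Hypothetical helper for illustration only.
    """
    try:
        return str(np.promote_types(a, b))
    except TypeError as exc:
        # Under the proposal, string/number promotion would end up here.
        return f"TypeError: {exc}"

# String with integer: the case this proposal would turn into an error.
print(describe_promotion("S", "int8"))

# Purely numeric promotion is unaffected by the proposal.
print(describe_promotion("int8", "float32"))
```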
Another larger visible change is that code which concatenates a string
array with a numeric array using `np.concatenate`
will result in an error instead of returning a string array. (Users
will have to cast manually here.)
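A sketch of what "casting manually" could look like for the concatenate case. The input arrays are made up for illustration; the point is only that the user picks the target dtype explicitly, so the code does not rely on string/number promotion at all.

```python
import numpy as np

# Illustrative inputs (assumed, not taken from the PR).
strings = np.array(["string"])
numbers = np.array([2])

# Option 1: cast the numbers to strings before concatenating.
as_text = np.concatenate([strings, numbers.astype(str)])

# Option 2: cast both inputs to object to keep the original Python values.
as_objects = np.concatenate([strings.astype(object), numbers.astype(object)])

print(as_text.dtype.kind, as_text.tolist())
print(as_objects.dtype, as_objects.tolist())
```

Both variants should behave the same before and after the proposed change, since neither asks NumPy to promote a string with a number.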
The alternative is to return an object array also for the concatenate
example. I somewhat dislike that because `object` is not homogeneously
typed and we thus lose type information. This also affects functions
that wish to cast inputs to a common type (ufuncs also do this
A further example of this and discussion is at the end of the mail.
So the first question is whether we can form an agreement that an error
is the better choice for `concatenate` and `np.promote_types()`.
I.e. there is no one dtype that can faithfully represent both strings
and integers. (This is currently the case e.g. for datetime64 and
The second question is what to do for `np.asarray(["string", 0])`,
which currently always returns strings. Arguably, it must also either
return an `object` array, or raise an error (requiring the user to pick
string or object using `dtype=object`).
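For reference, a small sketch of the explicit spelling a user would write under either outcome. Passing `dtype=` avoids relying on the default for mixed input, so it should be unaffected by which option is chosen.

```python
import numpy as np

mixed = ["string", 0]

# Explicitly request an object array: the elements stay as Python objects.
as_object = np.asarray(mixed, dtype=object)

# Explicitly request a string array: the integer is converted to "0".
as_string = np.asarray(mixed, dtype=str)

print(as_object.dtype, as_object.tolist())
print(as_string.dtype.kind, as_string.tolist())
```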
The default would be to create a FutureWarning that an `object` array
will be returned for `np.asarray(["string", 0])` in the future.
But if we know already that we prefer an error, it would be better to
give a DeprecationWarning right away. (It just does not seem nice to
change the same thing twice even if the workaround is identical.)
A second more in-depth point is that code such as:
    common_dtype = np.result_type(arr1, arr2)  # or promote_types
    arr1 = arr1.astype(common_dtype, copy=False)
    arr2 = arr2.astype(common_dtype, copy=False)
will currently use `string` in this case while it would error in the
future. This already fails with other type combinations such as
`datetime64` and `float64` at the moment.
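A self-contained sketch of this pattern, wrapped in a function so the failure mode is visible. The name `to_common_dtype` and the sample arrays are assumptions for illustration. Per the statement above, the `datetime64`/`float64` pair is expected to raise already; under the proposal, string/number pairs would take the same path.

```python
import numpy as np

def to_common_dtype(arr1, arr2):
    """Cast both arrays to their common dtype (hypothetical helper)."""
    common_dtype = np.result_type(arr1, arr2)  # raises if no common dtype exists
    return (arr1.astype(common_dtype, copy=False),
            arr2.astype(common_dtype, copy=False))

# Numeric inputs have a common dtype, so this works.
ints, floats = to_common_dtype(np.array([1, 2]), np.array([0.5]))
print(ints.dtype, floats.dtype)

# datetime64 and float64 have no common dtype.
dates = np.array(["2020-04-30"], dtype="datetime64[D]")
try:
    to_common_dtype(dates, np.array([1.5]))
    failed = False
except TypeError as exc:
    failed = True
    print("no common dtype:", exc)
```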
The main alternative to this proposal is to return `object` for the
common dtype: since an object array is not homogeneously typed, it
arguably can represent both inputs. I do not quite like this choice
personally because in the above example, it may be that the next line
is something like:
    return arr1 * arr2
in which case, the preferred return may be `str` and not `object`.
We currently never promote to `object` unless one of the arrays is
already an `object` array, and that seems like the right choice to me.
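As a quick check of that last statement, a minimal sketch with illustrative inputs: promotion yields `object` when one input is already `object`, while purely numeric inputs stay numeric.

```python
import numpy as np

# One side is already object, so the result is object.
with_object = np.promote_types(object, "int64")

# The same holds when going through arrays and result_type.
obj_arr = np.array(["a"], dtype=object)
int_arr = np.array([1])
via_arrays = np.result_type(obj_arr, int_arr)

# Neither side is object, so the result is an ordinary numeric dtype.
numeric = np.promote_types("int64", "float64")

print(with_object, via_arrays, numeric)
```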