[Numpy-discussion] Deprecate Promotion of numbers to strings?

Sebastian Berg
Thu Apr 30 13:31:45 EDT 2020

Hi all,

in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
promotion of strings and numbers. I have to double check whether this
has a large effect on pandas, but it currently seems to me that it will
be reasonable.

This means that `np.promote_types("S", "int8")`, etc. will lead to an
error instead of returning `"S4"`.  For the user, I believe the two
main visible changes are that:

    np.array(["string", 0])

will stop creating a string array and return either an `object` array
or give an error (object array would be the default currently).

Another, larger visible change is that code such as:

    np.concatenate([np.array(["string"]), np.array([2])])

will result in an error instead of returning a string array. (Users
will have to cast manually here.)
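
As a rough sketch of what the manual cast could look like (this is only
an illustration, not something taken from the PR, and the variable names
are made up for the example), both explicit choices can be spelled out
so that they do not rely on implicit promotion at all:

```python
import numpy as np

strings = np.array(["string"])
numbers = np.array([2])

# Option 1: explicitly convert the numbers to strings before concatenating.
as_text = np.concatenate([strings, numbers.astype(str)])

# Option 2: explicitly opt in to an object array, which keeps the original
# Python values (a str and an int) side by side.
as_object = np.concatenate([strings.astype(object), numbers.astype(object)])

print(as_text.dtype.kind, as_text.tolist())
print(as_object.dtype, as_object.tolist())
```

Either way the user states which result they want, instead of NumPy
guessing.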

The alternative is to return an object array also for the concatenate
example.  I somewhat dislike that because `object` is not homogeneously
typed and we thus lose type information.  This also affects functions
that wish to cast inputs to a common type (ufuncs also do this).
A further example of this and discussion is at the end of the mail [1].

So the first question is whether we can form an agreement that an error
is the better choice for `concatenate` and `np.promote_types()`.
I.e. there is no one dtype that can faithfully represent both strings
and integers. (This is currently the case e.g. for datetime64 and
float64.)
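
To make the comparison concrete, here is a small sketch for checking what
`np.promote_types` does on a given installation. The helper name
`describe_promotion` is hypothetical, and the string/number result is
deliberately not hard-coded because it depends on the NumPy version in
use:

```python
import warnings

import numpy as np


def describe_promotion(a, b):
    """Return the promoted dtype as a string, or the error type name."""
    with warnings.catch_warnings():
        # Silence any deprecation-style warnings so only the outcome is shown.
        warnings.simplefilter("ignore")
        try:
            return str(np.promote_types(a, b))
        except TypeError as exc:
            return "error: " + type(exc).__name__


# Two integer dtypes have an obvious common dtype.
print(describe_promotion("int8", "int16"))

# datetime64 and float64 have no common dtype, so this reports an error.
print(describe_promotion("datetime64[s]", "float64"))

# Strings and numbers: the outcome depends on the NumPy version.
print(describe_promotion("S", "int8"))
```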

The second question is what to do for:

    np.array(["string", 0])

which currently always returns strings.  Arguably, it must also either
return an `object` array, or raise an error (requiring the user to pick
string or object explicitly, e.g. using `dtype=object`).

The default would be to create a FutureWarning that an `object` array
will be returned for `np.asarray(["string", 0])` in the future.
But if we know already that we prefer an error, it would be better to
give a DeprecationWarning right away. (It just does not seem nice to
change the same thing twice even if the workaround is identical.)
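
For reference, the workaround is the same in both cases: pass the dtype
explicitly. A minimal sketch (the variable names are only for
illustration):

```python
import numpy as np

# Explicitly ask for an object array: the elements stay a str and an int.
as_object = np.array(["string", 0], dtype=object)

# Explicitly ask for a string array: the integer is converted to "0".
as_text = np.array(["string", 0], dtype=str)

print(as_object.dtype, as_object.tolist())
print(as_text.dtype.kind, as_text.tolist())
```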




[1] A second more in-depth point is that code such as:

    common_dtype = np.result_type(arr1, arr2)  # or promote_types
    arr1 = arr1.astype(common_dtype, copy=False)
    arr2 = arr2.astype(common_dtype, copy=False)

will currently use `string` in this case while it would error in the
future. This already fails with other type combinations such as
`datetime64` and `float64` at the moment.
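
A self-contained version of that pattern might look as follows. This is
only a sketch: `cast_to_common` is a hypothetical helper, and the inputs
are chosen so that one pair has a common dtype and the other (datetime64
and a float dtype) does not:

```python
import numpy as np


def cast_to_common(arr1, arr2):
    """Cast both arrays to their common dtype; raises TypeError if none exists."""
    common_dtype = np.result_type(arr1, arr2)
    return (arr1.astype(common_dtype, copy=False),
            arr2.astype(common_dtype, copy=False))


ints = np.array([1, 2], dtype=np.int8)
floats = np.array([0.5, 1.5], dtype=np.float32)
dates = np.array(["2020-04-30"], dtype="datetime64[D]")

# int8 and float32 promote to float32, so both arrays come back as float32.
a, b = cast_to_common(ints, floats)
print(a.dtype, b.dtype)

# datetime64 and float32 have no common dtype, so the helper raises.
try:
    cast_to_common(dates, floats)
except TypeError as exc:
    failure = type(exc).__name__
else:
    failure = None
print(failure)
```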

The main alternative to this proposal is to return `object` for the
common dtype: since an object array is not homogeneously typed, it
arguably can represent both inputs.  I do not quite like this choice
personally because in the above example, it may be that the next line
is something like:

    return arr1 * arr2

in which case, the preferred return may be `str` and not `object`.
We currently never promote to `object` unless one of the arrays is
already an `object` array, and that seems like the right choice to me.
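
The existing rule can be checked with a short sketch (the array names are
made up for the example): as soon as one input is already an `object`
array, the common dtype is `object`.

```python
import numpy as np

obj_arr = np.array(["string"], dtype=object)
int_arr = np.array([2])

# One input is already an object array, so the common dtype is object.
common_from_arrays = np.result_type(obj_arr, int_arr)

# The same holds at the dtype level.
common_from_dtypes = np.promote_types(object, np.int8)

print(common_from_arrays, common_from_dtypes)
```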

