[Numpy-discussion] Deprecate Promotion of numbers to strings?

Eric Wieser wieser.eric+numpy at gmail.com
Thu Apr 30 13:47:43 EDT 2020


> Another larger visible change will be that code such as:
>
>     np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)

I wonder if we can lessen the blow by allowing
`np.concatenate([np.array(["string"]), np.array([2])], casting='unsafe',
dtype=str)` or similar in its place.
It seems a little unfortunate that with this change, we lose the ability to
concatenate numbers to strings without making intermediate copies.
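
For reference, a minimal sketch of the manual cast that users would need
instead. The variable names are only illustrative, and `astype(str)` is
just one way to do the cast; the point is that it allocates a temporary
string array before `concatenate` copies the data a second time:

```python
import numpy as np

strings = np.array(["string"])
numbers = np.array([2])

# Explicit cast: astype() allocates a temporary string array ...
numbers_as_str = numbers.astype(str)

# ... and concatenate() then copies both inputs into the result,
# so the numeric data is copied twice in total.
result = np.concatenate([strings, numbers_as_str])
print(result.tolist())
```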

Eric



On Thu, 30 Apr 2020 at 18:32, Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
> promotion of strings and numbers. I have to double check whether this
> has a large effect on pandas, but it currently seems to me that it will
> be reasonable.
>
> This means that `np.promote_types("S", "int8")`, etc. will lead to an
> error instead of returning `"S4"`.  For the user, I believe the two
> main visible changes are that:
>
>     np.array(["string", 0])
>
> will stop creating a string array and return either an `object` array
> or give an error (object array would be the default currently).
>
> Another larger visible change will be that code such as:
>
>     np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)
>
> The alternative is to return an object array also for the concatenate
> example.  I somewhat dislike that because `object` is not homogeneously
> typed and we thus lose type information.  This also affects functions
> that wish to cast inputs to a common type (ufuncs also do this
> sometimes).
> A further example of this and discussion is at the end of the mail [1].
>
>
> So the first question is whether we can form an agreement that an error
> is the better choice for `concatenate` and `np.promote_types()`.
> I.e. we would take the position that there is no one dtype that can
> faithfully represent both strings and integers. (Promotion already
> raises an error today for other pairs, e.g. datetime64 and float64.)
>
>
> The second question is what to do for:
>
>     np.array(["string", 0])
>
> which currently always returns strings.  Arguably, it must also either
> return an `object` array, or raise an error (requiring the user to
> choose explicitly by passing `dtype=str` or `dtype=object`).
>
> The default would be to emit a FutureWarning that an `object` array
> will be returned for `np.asarray(["string", 0])` in the future.
> But if we know already that we prefer an error, it would be better to
> give a DeprecationWarning right away. (It just does not seem nice to
> change the same thing twice even if the workaround is identical.)
>
> Cheers,
>
> Sebastian
>
>
> [1]
>
> A second more in-depth point is that code such as:
>
>     common_dtype = np.result_type(arr1, arr2)  # or promote_types
>     arr1 = arr1.astype(common_dtype, copy=False)
>     arr2 = arr2.astype(common_dtype, copy=False)
>
> will currently pick a string dtype as the common dtype when one input
> is a string array and the other is numeric, while it would raise an
> error in the future. This already fails with other type combinations
> such as `datetime64` and `float64` at the moment.
>
> The main alternative to this proposal is to return `object` for the
> common dtype: since an object array is not homogeneously typed, it
> arguably can represent both inputs.  I do not quite like this choice
> personally because in the above example, it may be that the next line
> is something like:
>
>     return arr1 * arr2
>
> in which case, the preferred return may be `str` and not `object`.
> We currently never promote to `object` unless one of the arrays is
> already an `object` array, and that seems like the right choice to me.
>
>
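
As a rough illustration of the common-dtype pattern in [1], the sketch
below shows how such code can already hit a promotion failure today for
`datetime64` and `float64`. The helper name `common_dtype_or_none` is
hypothetical and not part of NumPy; it only wraps `np.result_type` and
catches the `TypeError` raised when no common dtype exists:

```python
import numpy as np

def common_dtype_or_none(arr1, arr2):
    """Hypothetical helper: promoted dtype of two arrays, or None."""
    try:
        return np.result_type(arr1, arr2)
    except TypeError:
        # NumPy raises TypeError (or a subclass of it) when the two
        # dtypes have no common dtype.
        return None

dates = np.array(["2020-04-30"], dtype="datetime64[D]")
floats = np.array([1.5])
small_ints = np.array([1], dtype=np.int8)

# datetime64 and float64 have no common dtype, so promotion fails.
no_common = common_dtype_or_none(dates, floats)

# int8 and float64 promote to float64, so both arrays can be cast to it.
common = common_dtype_or_none(small_ints, floats)
small_ints_cast = small_ints.astype(common, copy=False)
floats_cast = floats.astype(common, copy=False)
```

Under the proposal, a string array paired with a numeric array would take
the first path here rather than the second.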