
Hi all,

in https://github.com/numpy/numpy/pull/15925 I propose to deprecate promotion of strings and numbers. I still have to double check whether this has a large effect on pandas, but it currently seems to me that it will be reasonable.

This means that `np.promote_types("S", "int8")`, etc. will lead to an error instead of returning `"S4"`. For the user, I believe there are two main visible changes. The first is that:

    np.array(["string", 0])

will stop creating a string array and will either return an `object` array or give an error (an object array would be the default currently).

The second, larger visible change is that code such as:

    np.concatenate([np.array(["string"]), np.array([2])])

will result in an error instead of returning a string array. (Users will have to cast manually here.)

The alternative is to return an object array for the concatenate example as well. I somewhat dislike that because `object` is not homogeneously typed and we thus lose type information. This also affects functions that wish to cast inputs to a common type (ufuncs also do this sometimes). A further example of this and discussion is at the end of the mail [1].

So the first question is whether we can form an agreement that an error is the better choice for `concatenate` and `np.promote_types()`, i.e. that there is no one dtype that can faithfully represent both strings and integers. (Promotion already fails in this way for e.g. datetime64 and float64.)

The second question is what to do for:

    np.array(["string", 0])

which currently always returns strings. Arguably, it must also either return an `object` array, or raise an error (requiring the user to pick string or object by passing `dtype=` explicitly, e.g. `dtype=object`).

The default would be to give a FutureWarning that an `object` array will be returned for `np.asarray(["string", 0])` in the future. But if we already know that we prefer an error, it would be better to give a DeprecationWarning right away. (It just does not seem nice to change the same thing twice even if the workaround is identical.)
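To make the manual casts mentioned above concrete, here is a minimal sketch. It only uses explicit casts, so it should not depend on how the promotion question is settled. The variable names are purely illustrative and not part of the proposal.

```python
import numpy as np

strings = np.array(["string"])
numbers = np.array([2])

# Option 1: cast the numbers to strings by hand, so the result stays a
# homogeneously typed string array.
as_strings = np.concatenate([strings, numbers.astype(str)])

# Option 2: opt in to `object` explicitly, which keeps the original values
# but gives up the homogeneous type.
as_objects = np.concatenate([strings.astype(object), numbers.astype(object)])

# For array creation, passing `dtype` states the intent up front.
mixed_str = np.array(["string", 0], dtype=str)
mixed_obj = np.array(["string", 0], dtype=object)

print(as_strings.dtype.kind, as_objects.dtype, mixed_str.dtype.kind, mixed_obj.dtype)
```

Either way the user states which representation they want, instead of relying on implicit promotion.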
Cheers,

Sebastian


[1] A second more in-depth point is that code such as:

    common_dtype = np.result_type(arr1, arr2)  # or promote_types
    arr1 = arr1.astype(common_dtype, copy=False)
    arr2 = arr2.astype(common_dtype, copy=False)

will currently use `string` in this case, while it would error in the future. This already fails with other type combinations such as `datetime64` and `float64` at the moment.

The main alternative to this proposal is to return `object` for the common dtype: since an object array is not homogeneously typed, it arguably can represent both inputs. I do not quite like this choice personally because in the above example, it may be that the next line is something like:

    return arr1 * arr2

in which case, the preferred return may be `str` and not `object`. We currently never promote to `object` unless one of the arrays is already an `object` array, and that seems like the right choice to me.
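As a rough sketch of how calling code could handle a failing promotion in the pattern from [1]: `cast_to_common` and its `fallback` parameter are hypothetical names for illustration, not an existing or proposed NumPy API. The sketch assumes that a failed promotion raises `TypeError` (or a subclass), as it does today for `datetime64` and `float64`.

```python
import numpy as np

def cast_to_common(arr1, arr2, fallback=None):
    """Cast both arrays to a common dtype (hypothetical helper).

    If NumPy cannot promote the two dtypes and `fallback` is given,
    that dtype is used instead; otherwise the error is passed on.
    """
    try:
        common_dtype = np.result_type(arr1, arr2)
    except TypeError:
        if fallback is None:
            raise
        common_dtype = np.dtype(fallback)
    return (arr1.astype(common_dtype, copy=False),
            arr2.astype(common_dtype, copy=False))

# Numeric inputs promote as usual.
ints, floats = cast_to_common(np.array([1], dtype="int8"),
                              np.array([1.5], dtype="float32"))

# datetime64 and float64 have no common dtype, so the caller has to
# opt in to `object` explicitly.
dates = np.array(["2020-01-01"], dtype="datetime64[D]")
values = np.array([1.0])
dates_obj, values_obj = cast_to_common(dates, values, fallback=object)
```

The point of the explicit `fallback` is that `object` is chosen by the caller, not by promotion, which matches the preference stated above for never promoting to `object` implicitly.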