
Hello,

I'm not sure if it's expected behaviour or a bug, so I decided to write here. First, an example:

    In [4]: array([2**63])
    Out[4]: array([9223372036854775808], dtype=uint64)

    In [5]: array([2**63-1, 2**63])
    Out[5]: array([9.22337204e+18, 9.22337204e+18])

The docs for `numpy.array` mention that:

    dtype : data-type, optional
        The desired data-type for the array. If not given, then the type
        will be determined as the minimum type required to hold the
        objects in the sequence.

I understand the type promotions here, but I believe that the documentation is wrong in this case. Indeed, the minimum type in the latter case would be 'uint64'.

Is it a bug worth submitting/fixing?

--
With kind regards
Michał Radwański

On Mon, 2021-03-01 at 01:30 +0100, Michal Radwanski wrote:
Thanks, this is a known issue, e.g.:

https://github.com/numpy/numpy/issues/14883
and
https://github.com/numpy/numpy/issues/16287

Currently, my view is that trying to "fix" it so that the result is truly minimal is probably doomed to introduce unnecessary complexity and/or will just make the oddities slightly harder to find.

Instead, my stance is that we should refuse to guess anything besides the "default integer" when users pass in integers. That would probably mean you get an error that `2**63` cannot be represented by `int64`, forcing you to be explicit about the dtype you expect. (In the long run, it might also return an `object` array. [1])

With regards to the documentation... `np.array` promotes inputs as they come in (depth first currently), i.e. in a "left-to-right" fashion. That basically means that you are right and "minimal" will not always be true, due to our promotion rules.

But the bigger confusion is that Python integers are mapped to NumPy dtypes by finding the first one in the following list which can represent the value:

* C long: int64 on 64bit linux/mac, otherwise (all windows!) int32
* C long long: int64 on all relevant platforms AFAIK
* C unsigned long long: uint64 on all relevant platforms AFAIK
* object

Which is an attempt at "minimal" of course. If we have an idea how to capture especially this integer behaviour in the docs, that may be a good idea. (The way the promotion is done also breaks the "minimal" claim, but that is much more subtle.)

Cheers,

Sebastian

[1] However, before that happens, we may also consider an API where you have to explicitly allow the `np.array` call to fall back to `object` in cases where promotion fails, including this case. I.e. with something like:

    np.array(..., dtype="allow-object-fallback")  # of course shorter

(I can't find the issue about it right now, there is at least one where this was discussed.)
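As a rough illustration of the lookup order and the left-to-right promotion described in this reply, here is a pure-Python model. It is only a sketch and not NumPy's actual implementation: the helper names (`candidates`, `discover`, `promote`, `result_type`) and the `c_long_bits` parameter are made up for this example, and the promotion table covers only the handful of types mentioned in this thread.

```python
from functools import reduce

# Pure-Python model of the behaviour discussed in this thread.
# NOT NumPy code: every name below is hypothetical.


def candidates(c_long_bits=64):
    """Integer types in the order listed above, with their value ranges."""
    half = 2 ** (c_long_bits - 1)
    return [
        (f"int{c_long_bits}", -half, half - 1),  # C long
        ("int64", -2**63, 2**63 - 1),            # C long long
        ("uint64", 0, 2**64 - 1),                # C unsigned long long
    ]


def discover(value, c_long_bits=64):
    """Return the first listed type that can represent a Python int."""
    for name, lo, hi in candidates(c_long_bits):
        if lo <= value <= hi:
            return name
    return "object"


def promote(a, b):
    """Simplified pairwise promotion for the types used in this model."""
    if a == b:
        return a
    if "object" in (a, b):
        return "object"
    if {a, b} == {"int32", "int64"}:
        return "int64"
    # A signed type mixed with uint64 (or anything already float64):
    # no listed integer type covers both ranges, so the result is float64.
    return "float64"


def result_type(values, c_long_bits=64):
    """Discover each element's type, then promote left to right."""
    return reduce(promote, (discover(v, c_long_bits) for v in values))


print(result_type([2**63]))              # uint64
print(result_type([2**63 - 1, 2**63]))   # float64
print(discover(2**31, c_long_bits=32))   # int64 (C long is too small)
print(float(2**63 - 1) == float(2**63))  # True: float64 cannot tell them apart
```

In this model, `2**63 - 1` is discovered as `int64` and `2**63` as `uint64`, and promoting the two yields `float64`, which matches the output in the original report even though `uint64` alone could hold both values. The last line shows why that result is lossy: both integers round to the same float64 value.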

Participants (2): Michal Radwanski, Sebastian Berg