Hi all,
At the prodding [1] of Sebastian, I’m starting a discussion on the decision to deprecate np.{bool,float,int}. This deprecation broke our prerelease testing in scikit-image (which, hooray for rcs!), and resulted in a large amount of code churn to fix [2].
To be honest, I do think *some* sort of deprecation is needed, because for the longest time I thought that np.float was what np.float_ actually is. I think it would be worthwhile to move to *that*, though it’s an even more invasive deprecation than the currently proposed one. Writing `x = np.zeros(5, dtype=int)` is somewhat magical, because someone with a strict typing mindset (there’s an increasing number!) might expect that this is an array of pointers to Python ints. This is why I’ve always preferred to write `dtype=np.int`, resulting in the current code churn.
I don’t know what the best answer is, just sparking the discussion Sebastian wants to see. ;) For skimage we’ve already merged a fix (even if it is one of dubious quality, as Stéfan points out [3] ;), so I don’t have too much stake in the outcome.
Juan.
[1]: https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-73933446...
[2]: https://github.com/scikit-image/scikit-image/pull/5103
[3]: https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-73936876...
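To make the "somewhat magical" point concrete, here is a minimal sketch of what `dtype=int` produces compared with an array that really does hold references to Python objects. It assumes only that NumPy is installed; the variable names are illustrative.

```python
import numpy as np

# dtype=int is translated to NumPy's default integer dtype,
# a fixed-width machine integer, not a pointer to a Python int.
x = np.zeros(5, dtype=int)
print(x.dtype)                        # width is platform dependent
print(x.dtype == np.dtype(np.int_))   # same dtype as the NumPy default integer
print(type(x[0]) is int)              # elements come back as NumPy scalars

# An array of references to Python objects has to be requested explicitly.
y = np.zeros(5, dtype=object)
print(y.dtype)
print(type(y[0]) is int)              # here the elements are plain Python ints
```

So the builtin `int` acts only as a shorthand for the default NumPy integer when it appears in `dtype=`.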
On Sun, Dec 6, 2020 at 12:31 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Hi Juan, Let me start with a disclaimer that I'm an end user, and as such it's very easy for me to be bold when it comes to deprecations :) But I experienced the same thing that you describe in https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-73942937... :
[I]t was very surprising to me when I found out that np.float is float. For the longest time I thought that np.float was equivalent to "whatever the default float value is on my platform", and considered it best practice to use that instead of plain float. 😅 I think that is a common misconception.
And I'm pretty sure the vast majority of end users face this. The proper np.float32 and other types are intuitive enough that people don't go out of their way to read the documentation in detail, and it's highly unexpected that some `np.*` types are mere aliases.
Now, this should probably not be a problem as long as people only stick these aliases into `dtype` keyword arguments, because that works as expected (based on the wrong premise). But once you extrapolate from the `dtype=np.int` behaviour to "`np.int` must be my native numpy int type" you can easily get subtle bugs. For instance, you might expect `isinstance(this_type, np.int)` to give you True if `this_type` is the type of an item of an array with `dtype=np.int`. To be fair I'm not sure that I've ever been bitten by this personally... but once you're aware of the pitfall it seems really ominous.
I guess one helpful question is this: among all the code churn needed to fix the breakage, did you find any bugs that were revealed by the deprecation? If that's the case (in scikit-image or any other large downstream library) then that would be a good argument for going forward with the deprecation.
Cheers,
András
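The pitfall can be sketched without touching the deprecated aliases at all, since `np.int` was simply the builtin `int`. The snippet below is an illustration on Python 3, not code from any of the projects mentioned; it checks array items against the builtin types directly.

```python
import numpy as np

a = np.zeros(3, dtype=int)
item = a[0]

# The item is a NumPy scalar, so a check against the builtin int fails.
print(isinstance(item, int))          # False on Python 3
print(isinstance(item, np.integer))   # True: the abstract NumPy integer type

# Floats behave differently, which adds to the confusion:
# np.float64 is a subclass of the builtin float.
b = np.zeros(3, dtype=float)
print(isinstance(b[0], float))        # True
```

A check written against `np.int` would therefore have silently returned False for integer array items while appearing to work for floats.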
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
I checked pandas and astropy, and both have several uses of the deprecated types, but they should be easy to fix. I suppose the question is if we want to make them fix things *right now* :)
Chuck
I guess if the answer is to stop people from doing `from numpy import *`, there is a good fix for that which doesn’t involve deprecating dtype=np.int.
If the answer is to deprecate np.int(1) == int(1), then one can add a warning to the __init__ of the np.int class, but continue to subclass the python int class.
It just doesn’t seem worthwhile to stop people from using dtype=np.int, which seems to read: “I want this to be a numpy integer, not necessarily a python integer”.
On Sat, Dec 5, 2020 at 9:24 PM Mark Harfouche <mark.harfouche@gmail.com> wrote:
If the answer is to deprecate
np.int(1) == int(1)
then one can add a warning to the __init__ of the np.int class, but continue to subclass the python int class.
It just doesn’t seem worthwhile to stop people from using dtype=np.int, which seems to read:
“I want this to be a numpy integer, not necessarily a python integer”.
The problem is that there is assuredly code that inadvertently relies upon this (mis)feature. If we change the behavior of np.int() to create np.int64() objects instead of int() objects, it is likely to result in breaking some user code. Even with a prior warning, this breakage may be surprising and very hard to track down. In contrast, it's much safer to simply remove np.int entirely, because if users ignore the deprecation they end up with an error. This is a general feature for deprecations: it's much safer to remove functionality than it is to change behavior. So on the whole, I think this is the right call.
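The difference between "warn, then remove" and "warn, then change behavior" can be sketched with a toy namespace. Everything here (`FakeNamespace`, its `removed` flag, the alias table) is hypothetical and only illustrates the argument; it is not NumPy's implementation.

```python
import warnings


class FakeNamespace:
    """Hypothetical stand-in for a library namespace; not NumPy's actual code."""

    # Deprecated aliases and the objects they used to point to.
    _deprecated_aliases = {"int": int, "float": float, "bool": bool}

    def __init__(self, removed=False):
        self._removed = removed

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        aliases = type(self)._deprecated_aliases
        if name in aliases:
            if self._removed:
                # After removal: unmigrated code fails loudly at the access site.
                raise AttributeError(f"namespace has no attribute {name!r}")
            # During the deprecation period: old behaviour, plus a warning.
            warnings.warn(
                f"alias {name!r} is deprecated; use the builtin instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return aliases[name]
        raise AttributeError(name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    during = FakeNamespace(removed=False)
    alias = during.int            # still works, but a warning is recorded

after = FakeNamespace(removed=True)
try:
    after.int
except AttributeError as exc:
    error_message = str(exc)      # code that ignored the warning stops here
```

With removal, the worst case for a user who ignored the warning is an immediate `AttributeError`. Had the alias instead been repointed at a different type, the same code would keep running with different results.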
On Sun, Dec 6, 2020 at 12:52 AM Stephan Hoyer <shoyer@gmail.com> wrote:
The problem is that there is assuredly code that inadvertently relies upon this (mis)feature.
If we change the behavior of np.int() to create np.int64() objects instead of int() objects, it is likely to result in breaking some user code. Even with a prior warning, this breakage may be surprising and very hard to track down. In contrast, it's much safer to simply remove np.int entirely, because if users ignore the deprecation they end up with an error.
FWIW (and IIRC), *this* was the original misfeature. `np.int`, `np.bool`, and `np.float` were aliases for their corresponding default scalar types in the first numpy releases. However, too many people were doing `from numpy import *` and covering up the builtins. We renamed these aliases with trailing underscores to avoid that problem, but too many people (even in those early days) still had uses of `dtype=np.int`. Making `np.int is int` was the backwards-compatibility hack.
--
Robert Kern
On Sat, 2020-12-05 at 20:12 -0700, Charles R Harris wrote:
I checked pandas and astropy and both have several uses of the deprecated types but should be easy to fix. I suppose the question is if we want to make them fix things *right now* :)
The reason why I thought it might be good to bring this up again is that I am not clear on how painful the deprecation is, which should be weighed against the benefit. And the benefit here is only moderate.
Thus, with the things now in and a few more people exposed to it, if anyone thinks it's a bad idea or that we should delay, I am all ears.
Cheers,
Sebastian
On Sun, Dec 6, 2020 at 4:23 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
The reason why I thought it might be good to bring this up again is that I am not clear on how painful the deprecation is, which should be weighed against the benefit. And the benefit here is only moderate.
It will be painful as in "lots of churn", but the fixes are straightforward. And it's clear many knowledgeable users didn't know they were aliases, so there is something to gain here.
Whether or not we revert the deprecation, I'd be in favor of improving the docs to answer the most common questions and pitfalls, like:
- What happens when I use Python builtin types with the dtype keyword?
- How do I check if something is an integer array? Or a NumPy or Python integer?
- What are default integer, float and complex precisions on all platforms?
- How do I iterate over all floating point dtypes when writing tests?
- Which of the many equivalent dtypes should I prefer? --> use float64, not float_ or double
- warning: float128 and float96 do not exist on all platforms
- https://github.com/scikit-learn/scikit-learn/wiki/C-integer-types%3A-the-mis...
Related: it's still easy to have things leak into the namespace unintentionally - `np.sys` and `np.os` exist too. I think we can probably clean those up without a deprecation, but we should write some more public API tests that prevent this kind of thing.
Cheers,
Ralf
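As a rough illustration of what answers to a few of those questions could look like, here is a short sketch. It is one possible set of idioms, not official documentation, and the `float_dtypes` list is an assumption about which dtypes a test suite cares about.

```python
import numbers

import numpy as np

a = np.arange(3)

# Is this an integer array? Compare the dtype against the abstract type.
print(np.issubdtype(a.dtype, np.integer))

# Is this scalar an integer, whether it came from NumPy or from Python?
print(isinstance(a[0], numbers.Integral))
print(isinstance(3, numbers.Integral))

# Builtin types passed as dtype map to NumPy's defaults.
print(np.dtype(float) == np.float64)
print(np.dtype(complex) == np.complex128)
print(np.dtype(int))  # the width of the default integer is platform dependent

# Floating point dtypes for parametrized tests; float128 is not
# available on every platform, so it is added only when present.
float_dtypes = [np.float16, np.float32, np.float64]
if hasattr(np, "float128"):
    float_dtypes.append(np.float128)
print([np.dtype(t).name for t in float_dtypes])
```

Checking against `np.integer` or `numbers.Integral` avoids depending on which concrete integer width a platform uses by default.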
If the CI noise in downstream libraries is particularly painful, we could switch to `PendingDeprecationWarning` instead of `DeprecationWarning` to make it easier to add the warnings to an ignore list. I think this might make the warning less visible to end users though, who are the users that this deprecation was really aimed at.
Eric
On Mon, 7 Dec 2020 at 11:39, Ralf Gommers <ralf.gommers@gmail.com> wrote:
It will be painful as in "lots of churn", but the fixes are straightforward. And it's clear many knowledgeable users didn't know they were aliases, so there is something to gain here.
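The ignore-list idea behind `PendingDeprecationWarning` can be sketched with the standard `warnings` module. The function `use_old_alias` is a hypothetical stand-in for library code that emits the softer category; it is not a NumPy function.

```python
import warnings


def use_old_alias():
    """Hypothetical library call that emits the softer warning category."""
    warnings.warn("this alias is deprecated", PendingDeprecationWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # surface every warning...
    # ...except the one category a downstream CI chooses to silence.
    warnings.simplefilter("ignore", category=PendingDeprecationWarning)
    use_old_alias()
    warnings.warn("an unrelated deprecation", DeprecationWarning)

print([w.category.__name__ for w in caught])  # ['DeprecationWarning']
```

Because `PendingDeprecationWarning` is a separate class from `DeprecationWarning`, a project can silence one without hiding the other. The cost is the one noted in the message: this category is ignored by Python's default filters, so end users are unlikely to see it.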
Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.
Aaron Meurer
On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
OTOH, if someone wants to entertain switching... It could be interesting to see how (unfixed) downstream projects react to it.
One approach would be:
* Go ahead for now (deprecate)
* Add a FutureWarning at some point that we _will_ start to export `np.bool` again (but `from numpy import *` is a problem?)
* Aim to make `np.bool is np.bool_` at some point in the (far) future.
It is multi-step (and I recall opinions that multi-step is bad). Although, I think the main argument against it was to not force users to modify code more than once. And I do not think that happens here.
Of course we could use the `FutureWarning` right away, but I don't mind taking it slow.
Cheers,
Sebastian
[1] I admit, probably almost nobody would notice. And usually using a Python `bool` is better...
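A few lines show in what sense `np.bool_` and `bool` are "different beasts" even though they are interchangeable as a `dtype=` argument. This is only an illustrative sketch; `flag` is an arbitrary name.

```python
import numpy as np

# As a dtype argument, the two spellings are equivalent.
print(np.dtype(bool) == np.dtype(np.bool_))

flag = np.bool_(True)

# The scalar types are unrelated: the builtin bool cannot be subclassed.
print(isinstance(flag, bool))

# Arithmetic differs: Python bools are ints, NumPy bools stay boolean.
print(True + True)    # 2
print(flag + flag)    # addition of NumPy booleans acts as logical or
```

Code that relies on `bool` being an `int` subclass would therefore behave differently if `np.bool` started to mean `np.bool_`.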
On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
NumPy already shadows a lot of builtins, in many cases in ways that are incompatible with the existing ones. It's not something I would have done personally, but it's been this way for a long time.
Aaron Meurer
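One example of such an incompatibility, sketched with `sum`: the builtin and the NumPy function share a name but interpret their second positional argument differently. The input values are arbitrary.

```python
import numpy as np

data = [[1, 2], [3, 4]]

# Builtin sum: the second positional argument is a start value.
print(sum([1, 2, 3], 10))   # 16

# np.sum: the second positional argument is an axis.
print(np.sum(data, 1))      # [3 7]
```

Code written for one of the two can therefore change meaning when a star-import swaps in the other.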
On Wed, Dec 9, 2020 at 4:08 PM Aaron Meurer <asmeurer@gmail.com> wrote:
NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
Sometimes, we had the function first before Python added them to the builtins (e.g. sum(), any(), all(), IIRC). I think max() and min() are the main ones that we added after Python did, and we explicitly exclude them from __all__ to avoid clobbering the builtins.
Shadowing the types (bool, int, float) historically tended to be more problematic than those functions. The first releases of numpy _did_ have those as the scalar types. That empirically turned out to cause more problems for people than sum() or any(), so we renamed the scalar types to have the trailing underscore. We only left the shadowed names as aliases for the builtins because enough people still had `dtype=np.float` in their code that we didn't want to break.
All that said, "from numpy import *" is less common these days. We have been pretty successful at getting people on board with the np campaign.
--
Robert Kern
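To make the incompatibility concrete, here is a small sketch (not from the original message; the variable names are invented for illustration). The builtin `sum` treats its second positional argument as a start value, while `np.sum` treats it as an axis, so code written for one can misbehave if the other name is in scope.

```python
import builtins

import numpy as np

values = [1, 2, 3]
grid = [[1, 2], [3, 4]]

# Builtin sum: the second positional argument is the starting value.
builtin_total = builtins.sum(values, 10)

# NumPy sum: the second positional argument is the axis to reduce over.
row_totals = np.sum(grid, 1)

print(builtin_total)  # 16
print(row_totals)     # [3 7]
```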
On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeurer@gmail.com> wrote:
NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
np.int_ and np.float_ have fixed precision, which makes them somewhat different from the builtin types. NumPy has a whole bunch of different precisions for integer and floats, so this distinction matters.
In contrast, there is only one boolean dtype in NumPy, which matches Python's bool. So we wouldn't have to worry, for example, about whether a user has requested a specific precision explicitly. This comes up in issues like type-promotion where libraries like JAX and PyTorch have special case logic for most Python types vs NumPy dtypes (but booleans are the same for both): https://jax.readthedocs.io/en/latest/type_promotion.html
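A minimal sketch of the fixed-precision point, assuming a 64-bit integer array (the names `big`, `py_result` and `np_result` are illustrative only): the Python int grows as needed, the NumPy array wraps around, and the boolean case has no such split.

```python
import numpy as np

big = 2**62

# Python int has arbitrary precision, so this is exactly 2**64.
py_result = big * 4

# A 64-bit NumPy array wraps around silently on overflow.
arr = np.array([big], dtype=np.int64)
np_result = arr * 4

print(py_result == 2**64)  # True
print(np_result[0])        # 0

# By contrast, the builtin bool maps onto NumPy's single boolean dtype.
print(np.dtype(bool) == np.dtype(np.bool_))  # True
```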
Aaron Meurer
OTOH, if someone wants to entertain switching... It could be interesting to see how (unfixed) downstream projects react to it.
One approach would be:
* Go ahead for now (deprecate)
* Add a FutureWarning at some point that we _will_ start to export `np.bool` again (but `from numpy import *` is a problem?)
* Aim to make `np.bool is np.bool_` at some point in the (far) future.
It is multi-step (and I recall opinions that multi-step is bad). Although, I think the main argument against it was to not force users to modify code more than once. And I do not think that happens here.
Of course we could use the `FutureWarning` right away, but I don't mind taking it slow.
Cheers,
Sebastian
[1] I admit, probably almost nobody would notice. And usually using a Python `bool` is better...
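One way the FutureWarning step could look, sketched with a module-level `__getattr__` (PEP 562) on a stand-in module. Everything here is hypothetical: `fakepkg`, the warning text, and the helper name are invented for illustration and are not NumPy's actual code.

```python
import types
import warnings

# Hypothetical stand-in for a package namespace (not NumPy itself).
fakepkg = types.ModuleType("fakepkg")


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "bool":
        warnings.warn(
            "`fakepkg.bool` is the builtin `bool` for now; "
            "in the future it will be the package's own boolean type.",
            FutureWarning,
            stacklevel=2,
        )
        return bool  # keep the old behaviour during the transition
    raise AttributeError(f"module 'fakepkg' has no attribute {name!r}")


fakepkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias = fakepkg.bool

print(alias is bool)                # True
print(caught[0].category.__name__)  # FutureWarning
```

Users who already spell the dtype as the builtin `bool` never touch the attribute, so they would not have to change their code a second time.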
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
That is true, `int` is probably the most confusing, since it is not at all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently).
So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.
Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`. For integers, I wonder if we should also suggest `np.int64`, even – or because – if the default integer on many systems is currently `np.int_`?
Cheers,
Sebastian
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
I'd agree with that.
That is true, `int` is probably the most confusing, since it is not at all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently).
So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
Not sure what you mean with focus, focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.
Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.
Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`. For integers, I wonder if we should also suggest `np.int64`, even – or because – if the default integer on many systems is currently `np.int_`?
I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as
    dtype=bool  # or np.bool
    dtype=np.float64
    dtype=np.int64
    dtype=np.complex128
The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc) made sense 15 years ago, but with the user base of today - the majority of whom will not know C fluently or at all - also don't make too much sense.
The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so shouldn't be recommended and probably deserves a warning in the docs.
Cheers,
Ralf
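A short sketch of what the explicit spellings buy you (illustrative names only): the sized dtypes have the same width everywhere, whereas the width behind `dtype=int` has to be looked up on the running platform.

```python
import numpy as np

# Explicit, descriptive spellings: the width is part of the name.
explicit = {
    "bool": np.dtype(bool),
    "float64": np.dtype(np.float64),
    "int64": np.dtype(np.int64),
    "complex128": np.dtype(np.complex128),
}
for name, dt in explicit.items():
    print(f"{name:>10}: {dt.itemsize} byte(s) per element")

# The default integer is platform-dependent, so inspect it rather than assume.
default_int = np.zeros(1, dtype=int).dtype
print("dtype=int gives", default_int, "on this platform")
```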
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
I'd agree with that.
That is true, `int` is probably the most confusing, since it is not at all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently).
So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
Not sure what you mean with focus, focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.
I meant limiting the current deprecation to `np.int`, maybe `np.long`, and a "carefully chosen" set. To be honest, I don't mind either way, so any stronger opinion will tip the scale for me personally (my default currently is to update the release notes to recommend the more descriptive names). There are probably more doc updates that would be nice, I will suggest updating a separate issue for that.
Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.
Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`. For integers, I wonder if we should also suggest `np.int64`, even – or because – if the default integer on many systems is currently `np.int_`?
I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as dtype=bool # or np.bool dtype=np.float64 dtype=np.int64 dtype=np.complex128 The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc) made sense 15 years ago, but with the user base of today - the majority of whom will not know C fluently or at all - also don't make too much sense.
The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so shouldn't be recommended and probably deserves a warning in the docs.
Right, there is one slight trickery because `np.intp` is often a great integer dtype to use, because it is the integer that NumPy uses for all things related to indexing and array sizes. (I would be happy to dig out my PR making `np.intp` the default NumPy integer.)
Cheers,
Sebastian
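For reference, a quick sketch of where `np.intp` shows up in practice (the array contents are arbitrary): index-producing functions return it regardless of the dtype of the data they were given.

```python
import numpy as np

values = np.array([30, 10, 20], dtype=np.int16)

# Functions that return indices use np.intp, whatever the input dtype is.
order = np.argsort(values)
hits = np.nonzero(values > 15)[0]

print(order, order.dtype == np.intp)  # [1 2 0] True
print(hits, hits.dtype == np.intp)    # [0 2] True
```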
you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
I don't fully understand this argument - `np.bool` is already not the boolean dtype. Either:
* The spec is suggesting that `pkg.bool` be some arbitrary object that can be passed into a dtype argument and will produce a boolean array. If this is the case, the spec could also just require that `dtype=builtins.bool` have this behavior.
* The spec is suggesting that `pkg.bool` is some rich dtype object. Ignoring the question of whether this should be `np.bool_` or `np.dtype(np.bool_)`, it's currently neither, and changing it will break users relying on `np.bool(True) is True`. That's not to say this isn't a sensible thing for the specification to have, it's just something that numpy can't conform to without breaking code.
While it would be great if `np.bool_` could be spelt `np.bool`, I really don't think we can make that change without a long deprecation first (if at all).
Eric
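The breakage Eric describes comes from the builtin `bool` and the NumPy scalar type `np.bool_` behaving differently as objects, even though both select the same dtype. A minimal sketch that avoids the `np.bool` name itself, since what that name refers to is exactly what is under discussion:

```python
import numpy as np

py_true = bool(1)
np_true = np.bool_(1)

# Identity and isinstance checks only hold for the builtin.
print(py_true is True)            # True
print(np_true is True)            # False
print(isinstance(np_true, bool))  # False

# Arithmetic differs too: the builtin adds as an int, NumPy does a logical or.
print(py_true + py_true)  # 2
print(np_true + np_true)  # True

# As dtype specifiers, however, the two are interchangeable.
print(np.dtype(bool) == np.dtype(np.bool_))  # True
```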
On Fri, Dec 11, 2020 at 9:47 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
I don't fully understand this argument - `np.bool` is already not the boolean dtype. Either:
* The spec is suggesting that `pkg.bool` be some arbitrary object that can be passed into a dtype argument and will produce a boolean array. If this is the case, the spec could also just require that `dtype=builtins.bool` have this behavior.
Yes, this.
* The spec is suggesting that `pkg.bool` is some rich dtype object. Ignoring the question of whether this should be `np.bool_` or `np.dtype(np.bool_)`, it's currently neither, and changing it will break users relying on `np.bool(True) is True`. That's not to say this isn't a sensible thing for the specification to have, it's just something that numpy can't conform to without breaking code.
It can have richer behaviour, there's no constraints there - but it's not necessary.
While it would be great if `np.bool_` could be spelt `np.bool`, I really don't think we can make that change without a long deprecation first (if at all).
Given that that standard API would be in a new namespace (given backwards compat we can't possibly introduce it in the main namespace), there `bool` can be the numpy boolean dtype (if desired). The key point is that `bool_` is a terrible name, and keeping `np.bool` that you can use as a dtype specifier is desirable.
Cheers,
Ralf
Eric
On Thu, 10 Dec 2020 at 20:00, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeurer@gmail.com> wrote:
On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <sebastian@sipsolutions.net> wrote: > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote: > > Regarding np.bool specifically, if you want to deprecate > > this, > > you > > might want to discuss this with us at the array API > > standard > > https://github.com/data-apis/array-api (which is currently > > in > > RFC > > stage). The spec uses bool as the name for the boolean > > dtype. > > > > Would it make sense for NumPy to change np.bool to just be > > the > > boolean > > dtype object? Unlike int and float, there is no ambiguity > > with > > bool, > > and NumPy clearly doesn't have any issues with shadowing > > builtin > > names > > in its namespace. > > We could keep the Python alias around (which for `dtype=` is > the > same > as `np.bool_`). > > I am not sure I like the idea of immediately shadowing the > builtin. > That is a switch we can avoid flipping (without warning); > `np.bool_` > and `bool` are fairly different beasts? [1]
NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
I'd agree with that.
That is true, `int` is probably the most confusing, since it is not at all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently).
So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
Not sure what you mean with focus, focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.
I meant limiting the current deprecation to `np.int`, maybe `np.long`, and a "carefully chosen" set. To be honest, I don't mind either way, so any stronger opinion will tip the scale for me personally (my default currently is to update the release notes to recommend the more descriptive names).
There are probably more doc updates that would be nice, I will suggest updating a separate issue for that.
Right now, my main take-away from the discussion is that it would be
good to clarify the release notes a bit more.
Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`. For integers, I wonder if we should also suggest `np.int64`, even – or because – if the default integer on many systems is currently `np.int_`?
I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as dtype=bool # or np.bool dtype=np.float64 dtype=np.int64 dtype=np.complex128 The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc) made sense 15 years ago, but with the user base of today - the majority of whom will not know C fluently or at all - also don't make too much sense.
The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so shouldn't be recommended and probably deserves a warning in the docs.
Right, there is one slight trickery because `np.intp` is often a great integer dtype to use, because it is the integer that NumPy uses for all things related to indexing and array sizes. (I would be happy to dig out my PR making `np.intp` the default NumPy integer.)
Cheers,
Sebastian
Cheers, Ralf
np.int_ and np.float_ have fixed precision, which makes them somewhat different from the builtin types. NumPy has a whole bunch of different precisions for integer and floats, so this distinction matters.
In contrast, there is only one boolean dtype in NumPy, which matches Python's bool. So we wouldn't have to worry, for example, about whether a user has requested a specific precision explicitly. This comes up in issues like type-promotion where libraries like JAX and PyTorch have special case logic for most Python types vs NumPy dtypes (but booleans are the same for both): https://jax.readthedocs.io/en/latest/type_promotion.html
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Fri, Dec 11, 2020 at 1:47 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
I don't fully understand this argument - `np.bool` is already not the boolean dtype. Either:
The spec does deviate from what NumPy currently does in some places. If we wanted to just copy NumPy exactly, there wouldn't be a need for a specification.
* The spec is suggesting that `pkg.bool` be some arbitrary object that can be passed into a dtype argument and will produce a boolean array. If this is the case, the spec could also just require that `dtype=builtins.bool` have this behavior.
* The spec is suggesting that `pkg.bool` is some rich dtype object. Ignoring the question of whether this should be `np.bool_` or `np.dtype(np.bool_)`, it's currently neither, and changing it will break users relying on `np.bool(True) is True`.
That's not to say this isn't a sensible thing for the specification to have, it's just something that numpy can't conform to without breaking code.
This is what it currently says (https://data-apis.github.io/array-api/latest/API_specification/data_types.ht...):

    Data types (“dtypes”) are objects that can be used as dtype specifiers in functions and methods (e.g., zeros((2, 3), dtype=float32)). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.

So basically, np.bool just needs to be something that can be used as a dtype. The names of the dtype objects don't have any requirements on them. A library could have float64 == 'f8', for example. It isn't written there presently, but really the only thing that needs to work for the dtype objects is == comparison (or at least, it will be impossible for the test suite to test dtype behavior if a.dtype == float64 doesn't work). So np.bool == builtins.bool is actually fine.

My concern here was that the discussion was about deprecating np.bool, meaning it would be removed from the namespace, which goes against what is currently in the spec.

Aaron Meurer
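As a rough illustration of that minimal requirement (usable as a dtype specifier, and comparable with `==`), here is how it looks with NumPy itself. This is only a sketch of the two properties mentioned above, not a statement of what the spec's test suite actually checks.

```python
import numpy as np

# A dtype object used as a specifier, as in the spec's zeros((2, 3), dtype=float32).
a = np.zeros((2, 3), dtype=np.float32)
matches_float32 = a.dtype == np.float32
print(matches_float32)

# NumPy's dtype comparison also accepts a string spelling such as 'f8'.
matches_f8 = np.dtype(np.float64) == "f8"
print(matches_f8)

# The builtin bool works both as a specifier and in the comparison.
b = np.zeros(2, dtype=bool)
matches_builtin_bool = b.dtype == bool
print(matches_builtin_bool)
```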
While it would be great if `np.bool_` could be spelt `np.bool`, I really don't think we can make that change without a long deprecation first (if at all).
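The identity check Eric mentions is the crux, so here is a small sketch of it. It uses the builtin `bool` and `np.bool_` directly, so it shows the two behaviours side by side regardless of which one the name `np.bool` refers to.

```python
import numpy as np

py_true = bool(True)      # what code relying on the builtin alias gets
np_true = np.bool_(True)  # what a NumPy boolean scalar type returns

builtin_is_singleton = py_true is True
numpy_is_singleton = np_true is True

print(builtin_is_singleton)  # the builtin returns the True singleton
print(numpy_is_singleton)    # the NumPy scalar is a distinct object
print(np_true == True)       # although it still compares equal
```

Code written as `np.bool(x) is True` therefore changes behaviour if the name is rebound from the builtin to the NumPy scalar type.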
Eric
On Thu, 10 Dec 2020 at 20:00, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeurer@gmail.com> wrote:
On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
> On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
> >
> > Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.
>
> We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
>
> I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
I'd agree with that.
That is true, `int` is probably the most confusing, since it is not at all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently).
So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
Not sure what you mean with focus, focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.
I meant limiting the current deprecation to `np.int`, maybe `np.long`, and a "carefully chosen" set. To be honest, I don't mind either way, so any stronger opinion will tip the scale for me personally (my default currently is to update the release notes to recommend the more descriptive names).
There are probably more doc updates that would be nice, I will suggest updating a separate issue for that.
On Fri, Dec 11, 2020 at 1:12 PM Aaron Meurer <asmeurer@gmail.com> wrote:
On Fri, Dec 11, 2020 at 1:47 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
I don't fully understand this argument - `np.bool` is already not the
boolean dtype. Either:
The spec does deviate from what NumPy currently does in some places. If we wanted to just copy NumPy exactly, there wouldn't be a need for a specification.
I wouldn't take that as a premise. Specifying a subset of the vast existing NumPy API would be quite a valuable specification in its own right. I find the motivation for deviation laid out in the Purpose and Scope <https://data-apis.github.io/array-api/latest/purpose_and_scope.html#introduc...> section to be reasonably convincing that deviation might be needed *somewhere*. The question then is, is *this* deviation supporting that stated motivation, or is it taking the opportunity of a redesign to rationalize the names more to our current tastes? Given the mode of adopting the standard (a separate subpackage), that's a reasonable choice to make, but let's be clear about the motivation.
I submit that keeping the name `bool_` does not make it any harder for other array APIs to adopt the standard. It's just that few people would choose that name if they were designing a greenfield API.
--
Robert Kern
On Thu, Dec 10, 2020 at 9:00 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeurer@gmail.com> wrote:
On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
>
> Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.
We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
I'd agree with that.
That is true, `int` is probably the most confusing, since it is not at all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently).
So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
Not sure what you mean with focus, focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.
I meant limiting the current deprecation to `np.int`, maybe `np.long`, and a "carefully chosen" set.
Just deprecating `np.int` may make sense. That will already raise awareness, and leaving `np.float` as-is may prevent a lot of churn. And we could then still deprecate `np.float` later. I also don't feel strongly about `float` either way though.
I'm not sure why you'd specifically touch `long`, it's not really relevant and it's not a builtin.
Cheers, Ralf
To be honest, I don't mind either way, so any stronger opinion will tip the scale for me personally (my default currently is to update the release notes to recommend the more descriptive names).
There are probably more doc updates that would be nice, I will suggest updating a separate issue for that.
On Fri, 2020-12-11 at 11:35 +0100, Ralf Gommers wrote:
On Thu, Dec 10, 2020 at 9:00 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
<snip>
Just deprecating `np.int` may make sense. That will already raise awareness, and leaving `np.float` as-is may prevent a lot of churn. And we could then still deprecate `np.float` later. I also don't feel strongly about `float` either way though.
I'm not sure why you'd specifically touch `long`, it's not really relevant and it's not a builtin.
`np.long is np.int is int` as it was a builtin on Python 2. But it looks like a C-long. In `dtype=` usage it actually ends up being a C-long (but it might even be nice to consider modifying the default `int` on Windows at some point. At that point the "long" alias would be very confusing). OTOH, right now the only way to spell C-long is with `np.int_` which doesn't help.
Cheers,
Sebastian
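For readers who want to check what a C long is on their own machine, here is a small sketch comparing the platform's C `long` (via `ctypes`) with NumPy's dtype for it and with the dtype that `dtype=int` resolves to. It uses the one-character code `"l"` for C long as the spelling. Whether the third number matches the first two depends on the platform and the NumPy version, so it is printed rather than asserted.

```python
import ctypes

import numpy as np

# Width of the platform's C long, independent of NumPy.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# NumPy's dtype for C long, spelled with its character code.
np_long_bits = np.dtype("l").itemsize * 8

# What `dtype=int` resolves to on this installation.
default_int_bits = np.dtype(int).itemsize * 8

print(c_long_bits, np_long_bits, default_int_bits)
```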
Cheers, Ralf
I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as

    dtype=bool  # or np.bool
    dtype=np.float64
    dtype=np.int64
    dtype=np.complex128

The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc) made sense 15 years ago, but with the user base of today - the majority of whom will not know C fluently or at all - also don't make too much sense.
The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so shouldn't be recommended and probably deserves a warning in the docs.
I kinda disagree with this. I want to have a way to say, give me an array of the same type as the default NumPy type (for either ints or floats). This will prevent casting back and forth as different arrays are combined. In other words, as long as NumPy itself flips back and forth (depending on locale), I think users will in many cases want to flip back and forth with it? Juan.
On Sat, 2020-12-12 at 12:34 +1100, Juan Nunez-Iglesias wrote:
I kinda disagree with this. I want to have a way to say, give me an array of the same type as the default NumPy type (for either ints or floats). This will prevent casting back and forth as different arrays are combined. In other words, as long as NumPy itself flips back and forth (depending on locale), I think users will in many cases want to flip back and forth with it?
But "default" in NumPy really doesn't mean a whole lot? I can think of three places where "defaults" exist:

1. `np.array([1])` will default to a C-long (as will `np.uint8(1) + 1`)
2. Sum and product upcast to C-long (and pretty much only those):

       np.sum(np.arange(10, dtype=np.int8))
       np.product(np.arange(10, dtype=np.int8))

3. NumPy uses `np.intp` for all indexing operations internally, and for many functions which return integers related to indexing (e.g. `np.nonzero()`). [1]

The first two points have no logic at all besides: Windows thinks long is always 32bit and others think long is 64bit on 64bit systems. The last point does have some logic.

Generally, the only reason to stick to a certain type would be that mixing types can be slower (using a non `intp` to index or doing math with a mix of 32bit and 64bit integers). From a library perspective, I wonder how often you actually expect a "default integer" input, as opposed to 32bit or 64bit depending on the whims of the user; or `intp` because it is "indexing related".

It would be interesting to see if we can change the default at some point. It might also be tricky: There may be quite a bit of code expecting `long` (e.g. Cython extensions or `scipy.special` may or may not notice such a change).

Cheers,

Sebastian

[1] intp is technically intptr_t in C, while indexing only requires an ssize_t I think. That probably matters on no currently supported systems, but systems where it matters do exist (OpenVMS is one that just came up, and one we may support in the future).
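The three places listed above can be inspected directly. The sketch below records the dtype produced in each case. It uses `np.prod` rather than `np.product`, on the assumption that the shorter spelling is the one available across NumPy versions, and it leaves out the `np.uint8(1) + 1` case because scalar promotion rules are version dependent.

```python
import numpy as np

small = np.arange(10, dtype=np.int8)

# 1. Array creation from Python integers picks the default integer.
created_dtype = np.array([1]).dtype

# 2. Sum and product of a small integer type upcast to the default integer.
sum_dtype = np.sum(small).dtype
prod_dtype = np.prod(small).dtype

# 3. Indexing-related results use intp.
index_dtype = np.nonzero(small)[0].dtype

print(created_dtype, sum_dtype, prod_dtype, index_dtype)
```

On a platform where the default integer and `intp` differ in size, the first three and the last will print different names.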
Juan.
On 13 Dec 2020, at 6:25 am, Sebastian Berg <sebastian@sipsolutions.net> wrote:
But "default" in NumPy really doesn't mean a whole lot? I can think of three places where "defaults" exists:
Huh? There are platform-specific defaults for literally every array creation function in NumPy?

    In [1]: np.array([4, 9]).dtype
    Out[1]: dtype('int64')

    In [2]: np.array([3., 0.]).dtype
    Out[2]: dtype('float64')

    In [3]: np.arange(5).dtype
    Out[3]: dtype('int64')

    In [4]: np.arange(5.).dtype
    Out[4]: dtype('float64')

    In [5]: np.empty(5).dtype
    Out[5]: dtype('float64')

    In [6]: np.zeros(5).dtype
    Out[6]: dtype('float64')

    In [7]: np.full(5, 5).dtype
    Out[7]: dtype('int64')

    In [8]: np.full(5, 5.).dtype
    Out[8]: dtype('float64')

The list goes on…

And, indeed, mixing types can cause implicit casting, and thus both slowness and unexpected type promotion, which brings with it its own bugs… Again, I think it is valuable to have syntax to express `np.zeros(…, dtype=<whatever-dtype-np.array(…)-would-give-for-my-data>)`.

Juan.
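One way to spell "whatever dtype `np.array(…)` would give for my data" is to ask NumPy for it and pass the answer along. This is only a sketch of the idea under discussion; `data` is a made-up input, and the builtin spellings at the end are the shorthand this thread is debating.

```python
import numpy as np

data = [4, 9]  # hypothetical user data

# Let NumPy decide the dtype for the data, then reuse that decision.
inferred = np.asarray(data).dtype
out = np.zeros(5, dtype=inferred)

# The builtin types resolve to the same platform defaults.
ints = np.zeros(5, dtype=int)
floats = np.zeros(5, dtype=float)

print(out.dtype, ints.dtype, floats.dtype)
```

Combining `out` with `np.asarray(data)` then involves no casting, which is the property Juan is after.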
On Sun, 2020-12-13 at 19:00 +1100, Juan Nunez-Iglesias wrote:
On 13 Dec 2020, at 6:25 am, Sebastian Berg < sebastian@sipsolutions.net> wrote:
But "default" in NumPy really doesn't mean a whole lot? I can think of three places where "defaults" exists:
Huh? There are platform-specific defaults for literally every array creation function in NumPy?
In [1]: np.array([4, 9]).dtype Out[1]: dtype('int64')
<snip>
The list goes on…
I should have been more clear about this and my opinion on it:

1. The whole list comes down to my point 1: when confronted with a Python integer, NumPy will typically use a C-long [1]. Additionally, `dtype=int` is always the same as long: `np.dtype(int) == np.dtype("long")`.

   The reason why I see that as a single point, is that it is defined in a single place in C [1]. (The `np.dtype(int)` is a second place.)

2. I agree with Ralf that this is "random". On the same computer you can easily get a wrong result for the identical code because you boot into windows instead of linux [2]. `long` is not a good default! It is 32bit on windows and 64bit on (64bit) linux! That should confuse the majority of our users (and probably many who are aware of C integer types). Good defaults are awesome, but I just can't see how `long` is a good default. There were good reasons for it on Python 2, but that is not relevant anymore.

3. I think that `intp` would be a much saner default for most code. It gives a system dependent result, but two points are in its favor:

   * NumPy generates `intp` in quite a lot of places
   * It is always safe (and fast) to index arrays with `intp`
And, indeed, mixing types can cause implicit casting, and thus both slowness and unexpected type promotion, which brings with it its own bugs… Again, I think it is valuable to have syntax to express `np.zeros(…, dtype=<whatever-dtype-np.array(…)-would-give-for-my-data>)`.
Yes, it is valuable, but I am unsure we should advise to use it...

Cheers,

Sebastian

[1] Currently defined here: https://github.com/numpy/numpy/blob/7a42940e610b77cee2f98eb88aed5e66ef6d8c2a... Which will use `long` normally, but `long long` (64bit) if that fails and even `unsigned long long` if *that* fails also.

[2] I would not be surprised if there are quite a few libraries with bugs for very large arrays, that are simply not found yet, because nobody tried to run the code on very large arrays on a windows workstation yet.
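The two points in favour of `intp` can be checked with a few lines. This sketch uses `np.argsort` and `np.nonzero` as examples of functions that generate index arrays; the input values are arbitrary.

```python
import numpy as np

values = np.array([30, 10, 20], dtype=np.int16)

# Functions that return indices produce intp arrays.
order = np.argsort(values)
hits = np.nonzero(values > 15)[0]
print(order.dtype == np.intp, hits.dtype == np.intp)

# An explicitly intp-typed index array can be used for indexing as-is.
idx = np.array([2, 0], dtype=np.intp)
picked = values[idx]
print(picked)
```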
Juan.
On Sun, Dec 13, 2020 at 7:29 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Sun, 2020-12-13 at 19:00 +1100, Juan Nunez-Iglesias wrote:
On 13 Dec 2020, at 6:25 am, Sebastian Berg < sebastian@sipsolutions.net> wrote:
But "default" in NumPy really doesn't mean a whole lot? I can think of three places where "defaults" exists:
Huh? There are platform-specific defaults for literally every array creation function in NumPy?
In [1]: np.array([4, 9]).dtype Out[1]: dtype('int64')
<snip>
The list goes on…
I should have been more clear about this and my opinion on it:
1. The whole list comes down to my point 1: when confronted with a Python integer, NumPy will typically use a C-long [1]. Additionally, `dtype=int` is always the same as long: `np.dtype(int) == np.dtype("long")`.
The reason why I see that as a single point, is that it is defined in a single place in C [1]. (The `np.dtype(int)` is a second place.)
2. I agree with Ralf that this is "random". On the same computer you can easily get a wrong result for the identical code because you boot into windows instead of linux [2]. `long` is not a good default! It is 32bit on windows and 64bit on (64bit) linux! That should confuse the majority of our users (and probably many who are aware of C integer types). Good defaults are awesome, but I just can't see how `long` is a good default. There were good reasons for it on Python 2, but that is not relevant anymore.
3. I think that `intp` would be a much saner default for most code. It gives a system dependent result, but two points are in its favor:
* NumPy generates `intp` in quite a lot of places
* It is always safe (and fast) to index arrays with `intp`
And, indeed, mixing types can cause implicit casting, and thus both slowness and unexpected type promotion, which brings with it its own bugs… Again, I think it is valuable to have syntax to express `np.zeros(…, dtype=<whatever-dtype-np.array(…)-would-give-for-my-data>)`.
Yes, it is valuable, but I am unsure we should advise to use it...
Agreed, it should be possible for people who know that's what they want, but an "always int64" default would be way better. Before we had 32-bit CI, I developed on 32-bit Linux on purpose, and found multiple newly-introduced bugs in NumPy and SciPy each release cycle. Risking correctness issues like overflows is far worse than possible sub-optimal performance.
For that same reason, float96/float128 are very annoying. Users don't realize that those aren't portable.
Cheers, Ralf
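The kind of overflow at stake is easy to reproduce by forcing the 32-bit case explicitly, which is what a 32-bit default integer would give implicitly. This is a minimal sketch with hand-picked values; the dtypes are spelled out so the result does not depend on the platform.

```python
import numpy as np

# Largest value a signed 32-bit integer can hold.
big = np.array([2**31 - 1], dtype=np.int32)

# Staying in 32 bits: the addition silently wraps around.
wrapped = big + np.int32(1)

# Widening to 64 bits first: there is room for the result.
safe = big.astype(np.int64) + np.int64(1)

print(wrapped.dtype, wrapped[0])
print(safe.dtype, safe[0])
```

The same source code can therefore give either answer when the integer width is left to the platform default.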
Cheers,
Sebastian
[1] Currently defined here:
https://github.com/numpy/numpy/blob/7a42940e610b77cee2f98eb88aed5e66ef6d8c2a... Which will use `long` normally, but `long long` (64bit) if that fails and even `unsigned long long` if *that* fails also.
[2] I would not be surprised if there are quite a few libraries with bugs for very large arrays, that are simply not found yet, because nobody tried to run the code on very large arrays on a windows workstation yet.
Juan.
participants (10)
* Aaron Meurer
* Andras Deak
* Charles R Harris
* Eric Wieser
* Juan Nunez-Iglesias
* Mark Harfouche
* Ralf Gommers
* Robert Kern
* Sebastian Berg
* Stephan Hoyer