[Numpy-discussion] np.{bool,float,int} deprecation

Fri Dec 11 13:10:31 EST 2020

On Fri, Dec 11, 2020 at 1:47 AM Eric Wieser <wieser.eric+numpy at gmail.com> wrote:
>
> >  you might want to discuss this with us at the array API standard
> > https://github.com/data-apis/array-api (which is currently in RFC
> > stage). The spec uses bool as the name for the boolean dtype.
>
> I don't fully understand this argument - `np.bool` is already not the boolean dtype. Either:

The spec does deviate from what NumPy currently does in some places.
If we wanted to just copy NumPy exactly, there wouldn't be a need for
a specification.

>
> * The spec is suggesting that `pkg.bool` be some arbitrary object that can be passed into a dtype argument and will produce a boolean array.
>   If this is the case, the spec could also just require that `dtype=builtins.bool` have this behavior.
> * The spec is suggesting that `pkg.bool` is some rich dtype object.
>   Ignoring the question of whether this should be `np.bool_` or `np.dtype(np.bool_)`, it's currently neither, and changing it will break users relying on `np.bool(True) is True`.
>   That's not to say this isn't a sensible thing for the specification to have, it's just something that numpy can't conform to without breaking code.
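To make Eric's distinction concrete, here is a small sketch using long-standing NumPy names (`np.bool_`, `np.dtype`). `np.bool` itself is deliberately left out, because whether it exists, and what it refers to, depends on the NumPy version.

```python
import numpy as np

# Passing the builtin bool as a dtype already yields a boolean array.
a = np.zeros((2, 3), dtype=bool)
print(a.dtype)  # bool

# That dtype compares equal to NumPy's boolean scalar type and dtype object.
assert a.dtype == np.bool_
assert a.dtype == np.dtype(np.bool_)

# A NumPy boolean scalar, however, is a different object from Python's True,
# which is why identity checks like `x is True` break if the builtin is
# swapped out for the NumPy type.
x = np.bool_(True)
print(type(x) is bool)  # False
print(x is True)        # False
print(bool(x) is True)  # True: converting back gives the Python singleton
```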

This is what it currently says
(https://data-apis.github.io/array-api/latest/API_specification/data_types.html)

Data types (“dtypes”) are objects that can be used as dtype specifiers
in functions and methods (e.g., zeros((2, 3), dtype=float32) ). A
conforming implementation may add methods or attributes to data type
objects; however, these methods and attributes are not included in
this specification.

So basically, np.bool just needs to be something that can be used as a
dtype. The objects bound to the dtype names don't have any requirements
on them. A library could have float64 == 'f8', for example. It isn't
written there presently, but really the only thing that needs to work
for the dtype objects is == comparison (or at least, it will be
impossible for the test suite to test dtype behavior if
a.dtype == float64 doesn't work).
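As a rough illustration of the `==` requirement, using NumPy's own behaviour rather than anything the spec mandates, NumPy dtype objects already compare equal to several spellings of the same type:

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.float64)

# A conformance test could check dtypes only via ==; NumPy accepts
# scalar types, builtin types and string codes on the right-hand side.
assert a.dtype == np.float64
assert a.dtype == "float64"
assert a.dtype == "f8"
assert a.dtype == float  # the builtin float maps to float64

b = np.zeros(3, dtype=bool)
assert b.dtype == bool      # builtin bool
assert b.dtype == np.bool_  # NumPy's boolean scalar type
```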

So np.bool == builtins.bool is actually fine. My concern here was that
the discussion was about deprecating np.bool, meaning it would be
removed from the namespace, which goes against what is currently in
the spec.

Aaron Meurer

>
> While it would be great if `np.bool_` could be spelt `np.bool`, I really don't think we can make that change without a long deprecation first (if at all).
>
> Eric
>
> On Thu, 10 Dec 2020 at 20:00, Sebastian Berg <sebastian at sipsolutions.net> wrote:
>>
>> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
>> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
>> > sebastian at sipsolutions.net>
>> > wrote:
>> >
>> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
>> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeurer at gmail.com> wrote:
>> > > >
>> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <sebastian at sipsolutions.net> wrote:
>> > > > > >
>> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
>> > > > > > > Regarding np.bool specifically, if you want to deprecate this,
>> > > > > > > you might want to discuss this with us at the array API standard
>> > > > > > > https://github.com/data-apis/array-api (which is currently in
>> > > > > > > RFC stage). The spec uses bool as the name for the boolean dtype.
>> > > > > > >
>> > > > > > > Would it make sense for NumPy to change np.bool to just be the
>> > > > > > > boolean dtype object? Unlike int and float, there is no
>> > > > > > > ambiguity with bool, and NumPy clearly doesn't have any issues
>> > > > > > > with shadowing builtin names in its namespace.
>> > > > > >
>> > > > > > We could keep the Python alias around (which for `dtype=` is the
>> > > > > > same as `np.bool_`).
>> > > > > >
>> > > > > > I am not sure I like the idea of immediately shadowing the builtin.
>> > > > > > That is a switch we can avoid flipping (without warning);
>> > > > > > `np.bool_` and `bool` are fairly different beasts? [1]
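A short sketch of how `np.bool_` and the builtin `bool` differ, using only long-standing NumPy behaviour:

```python
import numpy as np

# Python's bool is a subclass of int; NumPy's bool_ is not.
assert issubclass(bool, int)
assert not issubclass(np.bool_, int)
assert issubclass(np.bool_, np.generic)

# Adding Python bools gives an int ...
print(True + True)  # 2

# ... while adding boolean arrays stays boolean (element-wise logical OR).
t = np.array([True, True])
u = np.array([True, False])
print(t + u)          # [ True  True]
print((t + u).dtype)  # bool
```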
>> > > > >
>> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
>> > > > > that are incompatible with existing ones. It's not something I
>> > > > > would have done personally, but it's been this way for a long time.
>> > > > >
>> > > >
>> > > > It may be defensible to keep np.bool as an alias for Python's bool
>> > > > even when we remove the other aliases.
>> > >
>> >
>> > I'd agree with that.
>> >
>> >
>> > > That is true, `int` is probably the most confusing, since it is not
>> > > at all compatible with a Python integer, but rather the "default"
>> > > integer (which happens to be the same as C `long` currently).
>> > >
>> > > So we could focus on `np.int`, `np.long`. I am a bit unsure whether
>> > > you would prefer that or are mainly pointing out the possibility?
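To illustrate Sebastian's point that `int` as a dtype means NumPy's default integer rather than a Python integer: the default's width depends on platform and NumPy version (NumPy 2.0, for example, made it 64 bits on 64-bit Windows), and fixed-width NumPy integers wrap around where Python ints do not. A minimal sketch:

```python
import numpy as np

# dtype=int selects NumPy's default integer, whose width is platform and
# version dependent; it is not an arbitrary-precision integer.
default_int = np.dtype(int)
print(default_int, default_int.itemsize)
assert default_int == np.dtype(np.int_)

# Python ints never overflow ...
big = 2**63 - 1
assert big + 1 == 2**63

# ... but a fixed-width NumPy integer array wraps around silently.
arr = np.array([big], dtype=np.int64)
print(arr + 1)  # [-9223372036854775808]
```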
>> > >
>> >
>> > Not sure what you mean by focus; focus on describing it in the release
>> > notes? Deprecating `np.int` seems like the most beneficial part of this
>> > whole exercise.
>> >
>>
>> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
>> and a "carefully chosen" set.
>> To be honest, I don't mind either way, so any stronger opinion will tip
>> the scale for me personally (my default currently is to update the
>> release notes to recommend the more descriptive names).
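Limiting a deprecation to a chosen set of aliases can be done with a module-level `__getattr__` (PEP 562), which runs only when normal attribute lookup fails. The sketch below uses a hypothetical stand-in module `fakenp` and a hypothetical `_DEPRECATED_ALIASES` table; it shows the general mechanism only and is not NumPy's actual implementation.

```python
import types
import warnings

# Hypothetical stand-in for a package namespace (e.g. its __init__.py).
fakenp = types.ModuleType("fakenp")

# Hypothetical table: deprecated alias -> (object returned, suggested spelling).
_DEPRECATED_ALIASES = {
    "int": (int, "use `int` or a sized type such as `int64`"),
    "long": (int, "use `int` or a sized type such as `int64`"),
}


def _module_getattr(name):
    """PEP 562 hook: only called when normal attribute lookup fails."""
    try:
        obj, hint = _DEPRECATED_ALIASES[name]
    except KeyError:
        raise AttributeError(
            f"module 'fakenp' has no attribute {name!r}") from None
    warnings.warn(
        f"fakenp.{name} is a deprecated alias for the builtin "
        f"{obj.__name__}; {hint}",
        DeprecationWarning,
        stacklevel=2,
    )
    return obj


# Storing the function as __getattr__ in the module namespace activates it.
fakenp.__getattr__ = _module_getattr

# Usage: the alias still resolves, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert fakenp.int is int
print(caught[0].category.__name__, caught[0].message)
```

Keeping the aliases in one table makes it easy to grow or shrink the "carefully chosen" set without touching the lookup logic.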
>>
>> There are probably more doc updates that would be nice; I will suggest
>> opening a separate issue for that.
>>
>>
>> > > Right now, my main take-away from the discussion is that it would be
>> > > good to clarify the release notes a bit more.
>> > >
>> > > Using `float` for a dtype seems fine to me, but I prefer mentioning
>> > > `np.float64` over `np.float_`.
>> > > For integers, I wonder if we should also suggest `np.int64`, even if
>> > > (or because) the default integer on many systems is currently
>> > > `np.int_`?
>> > >
>> >
>> > I agree. I think we should recommend sane, descriptive names that do the
>> > right thing. So ideally we'd have people spell their dtype specifiers as
>> >   dtype=bool  # or np.bool
>> >   dtype=np.float64
>> >   dtype=np.int64
>> >   dtype=np.complex128
>> > The names with underscores at the end make little sense from a UX
>> > perspective. And the C equivalents (single/double/etc.) made sense 15
>> > years ago, but they don't make much sense for today's user base, the
>> > majority of whom will not know C fluently or at all.
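The descriptive spellings Ralf lists can be checked directly. The sketch below also confirms that the C-style names (`np.single`, `np.double`, `np.cdouble`) are aliases for the sized types on common platforms:

```python
import numpy as np

# Descriptive, fixed-width dtype specifiers and their element sizes in bytes.
specs = {"bool": bool, "float64": np.float64,
         "int64": np.int64, "complex128": np.complex128}
for name, spec in specs.items():
    arr = np.zeros(2, dtype=spec)
    print(f"{name:>10}: dtype={arr.dtype}, itemsize={arr.itemsize}")

# The C-style names resolve to the same sized types.
assert np.dtype(np.single) == np.dtype(np.float32)
assert np.dtype(np.double) == np.dtype(np.float64)
assert np.dtype(np.cdouble) == np.dtype(np.complex128)
```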
>> >
>> > The `dtype=int` or `dtype=np.int_` behaviour flip-flopping between 32
>> > and 64 bits is likely to be a pitfall much more often than it is what
>> > the user actually needs, so it shouldn't be recommended and probably
>> > deserves a warning in the docs.
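A sketch of the 32- versus 64-bit pitfall: the same arithmetic gives different answers depending on integer width. The helper `bytes_needed` is hypothetical and uses explicit `np.int32`/`np.int64` so the result is reproducible; on a platform whose default integer is 32 bits (such as 64-bit Windows under NumPy 1.x), `dtype=int` would behave like the `int32` case.

```python
import numpy as np


def bytes_needed(n_items, item_size, dtype):
    """Hypothetical helper: n_items * item_size computed in the given dtype."""
    n = np.array([n_items], dtype=dtype)
    return int((n * item_size)[0])


# 2**20 items of 4096 bytes each is 2**32 bytes, which does not fit in int32.
print(bytes_needed(2**20, 4096, np.int64))  # 4294967296
print(bytes_needed(2**20, 4096, np.int32))  # 0 (wrapped around silently)
```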
>>
>> Right, there is one slight complication: `np.intp` is often a great
>> integer dtype to use, because it is the integer that NumPy uses for all
>> things related to indexing and array sizes.
>> (I would be happy to dig out my PR making `np.intp` the default NumPy
>> integer.)
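A quick check of the point that `np.intp` is NumPy's indexing integer: it is pointer-sized, and index-returning functions such as `np.argsort` and `np.nonzero` produce `intp` arrays.

```python
import struct

import numpy as np

# np.intp is pointer-sized: 8 bytes on 64-bit builds, 4 on 32-bit builds.
assert np.dtype(np.intp).itemsize == struct.calcsize("P")

a = np.array([3.0, 1.0, 2.0])
order = np.argsort(a)         # positions that would sort `a`
(idx,) = np.nonzero(a > 1.5)  # positions where the condition holds
assert order.dtype == np.intp
assert idx.dtype == np.intp
print(order, idx)
```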
>>
>> Cheers,
>>
>> Sebastian
>>
>>
>> >
>> > Cheers,
>> > Ralf
>> >
>> >
>> > >
>> > > >
>> > > > np.int_ and np.float_ have fixed precision, which makes them somewhat
>> > > > different from the builtin types. NumPy has a whole bunch of
>> > > > different precisions for integers and floats, so this distinction
>> > > > matters.
>> > > >
>> > > > In contrast, there is only one boolean dtype in NumPy, which matches
>> > > > Python's bool. So we wouldn't have to worry, for example, about
>> > > > whether a user has requested a specific precision explicitly. This
>> > > > comes up in issues like type promotion, where libraries like JAX and
>> > > > PyTorch have special-case logic for most Python types vs NumPy dtypes
>> > > > (but booleans are the same for both):
>> > > > https://jax.readthedocs.io/en/latest/type_promotion.html
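A rough NumPy-only illustration of Stephan's point (JAX's and PyTorch's own rules are on the page linked above and are not reproduced here). For floats, promotion has to decide how much weight a Python scalar carries compared with an explicit NumPy dtype; booleans have a single width, so there is nothing to decide. The results below hold under both the NumPy 1.x value-based rules and the NumPy 2 (NEP 50) rules.

```python
import numpy as np

f32 = np.ones(3, dtype=np.float32)

# A Python float does not upcast a float32 array ...
assert (f32 * 2.0).dtype == np.float32
# ... but an explicit float64 array does.
assert (f32 * np.ones(3, dtype=np.float64)).dtype == np.float64

mask = np.array([True, False, True])

# Booleans have a single width, so Python and NumPy bools promote alike.
assert (mask & True).dtype == np.bool_
assert (mask & np.bool_(True)).dtype == np.bool_
assert (mask | np.array([False, False, True])).dtype == np.bool_
```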
>> > >
>> > >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>