[Numpy-discussion] np.{bool,float,int} deprecation

Sebastian Berg sebastian at sipsolutions.net
Thu Dec 10 14:59:37 EST 2020


On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
> sebastian at sipsolutions.net>
> wrote:
> 
> > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <asmeurer at gmail.com>
> > > wrote:
> > > 
> > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> > > > <sebastian at sipsolutions.net> wrote:
> > > > > 
> > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > Regarding np.bool specifically, if you want to deprecate
> > > > > > this,
> > > > > > you
> > > > > > might want to discuss this with us at the array API
> > > > > > standard
> > > > > > https://github.com/data-apis/array-api (which is currently
> > > > > > in
> > > > > > RFC
> > > > > > stage). The spec uses bool as the name for the boolean
> > > > > > dtype.
> > > > > > 
> > > > > > Would it make sense for NumPy to change np.bool to just be
> > > > > > the
> > > > > > boolean
> > > > > > dtype object? Unlike int and float, there is no ambiguity
> > > > > > with
> > > > > > bool,
> > > > > > and NumPy clearly doesn't have any issues with shadowing
> > > > > > builtin
> > > > > > names
> > > > > > in its namespace.
> > > > > 
> > > > > We could keep the Python alias around (which for `dtype=` is
> > > > > the
> > > > > same
> > > > > as `np.bool_`).
> > > > > 
> > > > > I am not sure I like the idea of immediately shadowing the
> > > > > builtin.
> > > > > That is a switch we can avoid flipping (without warning);
> > > > > `np.bool_`
> > > > > and `bool` are fairly different beasts? [1]
> > > > 
> > > > NumPy already shadows a lot of builtins, in many cases, in ways
> > > > that
> > > > are incompatible with existing ones. It's not something I would
> > > > have
> > > > done personally, but it's been this way for a long time.
> > > > 
> > > 
> > > It may be defensible to keep np.bool as an alias for Python's
> > > bool
> > > even when we remove the other aliases.
> > 
> 
> I'd agree with that.
> 
> 
> > That is true, `int` is probably the most confusing, since it is not at
> > all compatible with a Python integer, but rather the "default" integer
> > (which happens to be the same as C `long` currently).
> > 
> > So we could focus on `np.int`, `np.long`.  I am a bit unsure
> > whether
> > you would prefer that or are mainly pointing out the possibility?
> > 
> 
> Not sure what you mean with focus, focus on describing in the release
> notes? Deprecating `np.int` seems like the most beneficial part of
> this
> whole exercise.
> 

I meant limiting the current deprecation to `np.int`, maybe `np.long`,
and a "carefully chosen" set.
To be honest, I don't mind either way, so any stronger opinion will tip
the scale for me personally (my default currently is to update the
release notes to recommend the more descriptive names).
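
For illustration, here is a minimal sketch of what "more descriptive names"
could look like in user code. The variable names are made up for the
example, and the size of the default integer it prints depends on the
platform and NumPy version:

```python
import numpy as np

# Fixed-width names say exactly what you get, on every platform.
x = np.zeros(3, dtype=np.float64)
n = np.zeros(3, dtype=np.int64)
z = np.zeros(3, dtype=np.complex128)
m = np.zeros(3, dtype=bool)  # the builtin is fine as a dtype specifier

# The builtin `int` is accepted too, but it maps to NumPy's default
# integer, whose width is platform dependent rather than fixed.
default_int = np.dtype(int)

print(x.dtype, n.dtype, z.dtype, m.dtype)
print("default integer:", default_int, "-", default_int.itemsize * 8, "bits")
```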

There are probably more doc updates that would be nice; I suggest
tracking those in a separate issue.


> > Right now, my main take-away from the discussion is that it would be
> > good to clarify the release notes a bit more.
> > 
> > Using `float` for a dtype seems fine to me, but I prefer mentioning
> > `np.float64` over `np.float_`.
> > For integers, I wonder if we should also suggest `np.int64`, even if
> > (or because) the default integer on many systems is currently
> > `np.int_`?
> > 
> 
> I agree. I think we should recommend sane, descriptive names that do
> the
> right thing. So ideally we'd have people spell their dtype specifiers
> as
>   dtype=bool  # or np.bool
>   dtype=np.float64
>   dtype=np.int64
>   dtype=np.complex128
> The names with underscores at the end make little sense from a UX
> perspective. And the C equivalents (single/double/etc) made sense 15
> years
> ago, but with the user base of today - the majority of whom will not
> know C
> fluently or at all - also don't make too much sense.
> 
> The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and
> 64
> bits is likely to be a pitfall much more often than it is what the
> user
> actually needs, so shouldn't be recommended and probably deserves a
> warning
> in the docs.

Right, there is one slight wrinkle: `np.intp` is often a great integer
dtype to use, because it is the integer that NumPy uses for all things
related to indexing and array sizes.
(I would be happy to dig out my PR making `np.intp` the default NumPy
integer.)
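
A quick way to see this, as a rough sketch (the arrays and variable names
are just examples; whether `np.intp` and the default integer have the same
width depends on the platform and NumPy version):

```python
import numpy as np

# Functions that return indices hand back np.intp arrays.
order = np.argsort(np.array([3.0, 1.0, 2.0]))
hits = np.nonzero(np.array([0, 5, 0, 7]))[0]
print(order.dtype == np.intp, hits.dtype == np.intp)

# Compare the width of the indexing integer with the default integer.
intp_bits = np.dtype(np.intp).itemsize * 8
default_bits = np.dtype(np.int_).itemsize * 8
print("intp:", intp_bits, "bits; default integer:", default_bits, "bits")
```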

Cheers,

Sebastian


> 
> Cheers,
> Ralf
> 
> 
> > 
> > > 
> > > np.int_ and np.float_ have fixed precision, which makes them
> > > somewhat
> > > different from the builtin types. NumPy has a whole bunch of
> > > different
> > > precisions for integer and floats, so this distinction matters.
> > > 
> > > In contrast, there is only one boolean dtype in NumPy, which
> > > matches
> > > Python's bool. So we wouldn't have to worry, for example, about
> > > whether a
> > > user has requested a specific precision explicitly. This comes up
> > > in
> > > issues
> > > like type-promotion where libraries like JAX and PyTorch have
> > > special
> > > case
> > > logic for most Python types vs NumPy dtypes (but booleans are the
> > > same for
> > > both):
> > > https://jax.readthedocs.io/en/latest/type_promotion.html
> > 
> > 
