<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 10, 2020 at 9:00 PM Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net">sebastian@sipsolutions.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:<br>
> On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <<br>
> <a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>><br>
> wrote:<br>
> <br>
> > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:<br>
> > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <<a href="mailto:asmeurer@gmail.com" target="_blank">asmeurer@gmail.com</a>><br>
> > > wrote:<br>
> > > <br>
> > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg<br>
> > > > <<a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>> wrote:<br>
> > > > > <br>
> > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:<br>
> > > > > > Regarding np.bool specifically, if you want to deprecate<br>
> > > > > > this,<br>
> > > > > > you<br>
> > > > > > might want to discuss this with us at the array API<br>
> > > > > > standard<br>
> > > > > > <a href="https://github.com/data-apis/array-api" rel="noreferrer" target="_blank">https://github.com/data-apis/array-api</a> (which is currently<br>
> > > > > > in<br>
> > > > > > RFC<br>
> > > > > > stage). The spec uses bool as the name for the boolean<br>
> > > > > > dtype.<br>
> > > > > > <br>
> > > > > > Would it make sense for NumPy to change np.bool to just be<br>
> > > > > > the<br>
> > > > > > boolean<br>
> > > > > > dtype object? Unlike int and float, there is no ambiguity<br>
> > > > > > with<br>
> > > > > > bool,<br>
> > > > > > and NumPy clearly doesn't have any issues with shadowing<br>
> > > > > > builtin<br>
> > > > > > names<br>
> > > > > > in its namespace.<br>
> > > > > <br>
> > > > > We could keep the Python alias around (which for `dtype=` is<br>
> > > > > the<br>
> > > > > same<br>
> > > > > as `np.bool_`).<br>
> > > > > <br>
> > > > > I am not sure I like the idea of immediately shadowing the<br>
> > > > > builtin.<br>
> > > > > That is a switch we can avoid flipping (without warning);<br>
> > > > > `np.bool_`<br>
> > > > > and `bool` are fairly different beasts? [1]<br>
> > > > <br>
> > > > NumPy already shadows a lot of builtins, in many cases, in ways<br>
> > > > that<br>
> > > > are incompatible with existing ones. It's not something I would<br>
> > > > have<br>
> > > > done personally, but it's been this way for a long time.<br>
> > > > <br>
> > > <br>
> > > It may be defensible to keep np.bool as an alias for Python's<br>
> > > bool<br>
> > > even when we remove the other aliases.<br>
> > <br>
> <br>
> I'd agree with that.<br>
> <br>
> <br>
> > That is true, `int` is probably the most confusing, since it is not<br>
> > at<br>
&gt; &gt; all compatible with a Python integer, but rather the "default"<br>
> > integer<br>
> > (which happens to be the same as C `long` currently).<br>
> > <br>
&gt; &gt; So we could focus on `np.int`, `np.long`. I am a bit unsure<br>
> > whether<br>
> > you would prefer that or are mainly pointing out the possibility?<br>
> > <br>
> <br>
&gt; Not sure what you mean by "focus". Focus on describing it in the release<br>
&gt; notes? Deprecating `np.int` seems like the most beneficial part of<br>
> this<br>
> whole exercise.<br>
> <br>
<br>
I meant limiting the current deprecation to `np.int`, maybe `np.long`,<br>
and a "carefully chosen" set.<br></blockquote><div><br></div><div>Deprecating only `np.int` may make sense. That alone will raise awareness, and leaving `np.float` as-is may prevent a lot of churn. We could still deprecate `np.float` later. I don't feel strongly about `float` either way, though.<br></div><div><br></div><div>I'm not sure why you'd specifically touch `long`; it's not really relevant and it's not a builtin.<br></div><div><br></div><div>Cheers,<br></div><div>Ralf<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
To be honest, I don't mind either way, so any stronger opinion will tip<br>
the scale for me personally (my default currently is to update the<br>
release notes to recommend the more descriptive names).<br>
<br>
There are probably more doc updates that would be nice, I will suggest<br>
updating a separate issue for that.<br>
<br>
<br>
&gt; &gt; Right now, my main take-away from the discussion is that it would be<br>
> > good to clarify the release notes a bit more.<br>
> > <br>
> > Using `float` for a dtype seems fine to me, but I prefer mentioning<br>
> > `np.float64` over `np.float_`.<br>
&gt; &gt; For integers, I wonder if we should also suggest `np.int64`, even<br>
&gt; &gt; if (or precisely because) the default integer on many systems is<br>
&gt; &gt; currently `np.int_`?<br>
> > <br>
> <br>
> I agree. I think we should recommend sane, descriptive names that do<br>
> the<br>
> right thing. So ideally we'd have people spell their dtype specifiers<br>
> as<br>
> dtype=bool # or np.bool<br>
> dtype=np.float64<br>
> dtype=np.int64<br>
> dtype=np.complex128<br>
> The names with underscores at the end make little sense from a UX<br>
> perspective. And the C equivalents (single/double/etc) made sense 15<br>
> years<br>
> ago, but with the user base of today - the majority of whom will not<br>
> know C<br>
> fluently or at all - also don't make too much sense.<br>
> <br>
> The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and<br>
> 64<br>
> bits is likely to be a pitfall much more often than it is what the<br>
> user<br>
> actually needs, so shouldn't be recommended and probably deserves a<br>
> warning<br>
> in the docs.<br>
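<br>
A minimal sketch of the point above, assuming only that NumPy is installed (the variable names are illustrative): `dtype=int` resolves to NumPy's default integer, whose width depends on the platform and NumPy version, while the explicit-width names always mean the same thing.<br>
<pre>
```python
import numpy as np

# `int` used as a dtype means NumPy's default integer; its width is
# platform dependent, so we only report it here rather than assume it.
default_int = np.dtype(int)
print("dtype=int resolves to", default_int.name,
      "with", default_int.itemsize * 8, "bits")

# Explicit-width names give the same element size everywhere.
a = np.zeros(3, dtype=np.int64)       # always 8 bytes per element
b = np.zeros(3, dtype=np.float64)     # always 8 bytes per element
c = np.zeros(3, dtype=np.complex128)  # always 16 bytes per element
d = np.zeros(3, dtype=bool)           # NumPy's single boolean dtype

print(a.dtype.itemsize, b.dtype.itemsize, c.dtype.itemsize, d.dtype.name)
```
</pre>
Spelling out `np.int64` keeps the element size independent of where the code runs, which is the reason for recommending it over `int` or `np.int_`.<br>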
<br>
Right, there is one slight wrinkle: `np.intp` is often a great<br>
integer dtype to use, because it is the integer that NumPy uses for all<br>
things related to indexing and array sizes.<br>
(I would be happy to dig out my PR making `np.intp` the default NumPy<br>
integer.)<br>
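<br>
As a rough illustration of why `np.intp` is tied to indexing (a sketch only, with an arbitrary example array): index-producing functions such as `np.argsort` return arrays of dtype `np.intp`.<br>
<pre>
```python
import numpy as np

values = np.array([30, 10, 20])

# argsort returns indices into `values`; NumPy uses np.intp for them.
order = np.argsort(values)
print(order.dtype == np.dtype(np.intp))

# The width of np.intp follows the platform, so it is reported, not assumed.
print("np.intp has", np.dtype(np.intp).itemsize * 8, "bits")

# Index arrays can be fed straight back into indexing.
print(values[order])
```
</pre>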
<br>
Cheers,<br>
<br>
Sebastian<br>
<br>
<br>
> <br>
> Cheers,<br>
> Ralf<br>
> <br>
> <br>
> > <br>
> > > <br>
> > > np.int_ and np.float_ have fixed precision, which makes them<br>
> > > somewhat<br>
> > > different from the builtin types. NumPy has a whole bunch of<br>
> > > different<br>
> > > precisions for integer and floats, so this distinction matters.<br>
> > > <br>
> > > In contrast, there is only one boolean dtype in NumPy, which<br>
> > > matches<br>
> > > Python's bool. So we wouldn't have to worry, for example, about<br>
> > > whether a<br>
> > > user has requested a specific precision explicitly. This comes up<br>
> > > in<br>
> > > issues<br>
> > > like type-promotion where libraries like JAX and PyTorch have<br>
> > > special<br>
> > > case<br>
> > > logic for most Python types vs NumPy dtypes (but booleans are the<br>
> > > same for<br>
> > > both):<br>
> > > <a href="https://jax.readthedocs.io/en/latest/type_promotion.html" rel="noreferrer" target="_blank">https://jax.readthedocs.io/en/latest/type_promotion.html</a><br>
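<br>
A small sketch of the distinction drawn above, assuming a standard NumPy install (the names `big` and `wrapped` are illustrative): Python's `int` grows as needed, a fixed-width NumPy integer wraps around on overflow, and the builtin `bool` maps to NumPy's only boolean dtype.<br>
<pre>
```python
import numpy as np

big = 2**62

# Python integers have arbitrary precision: the product is exactly 2**64.
python_result = big * 4
print(python_result)

# A 64-bit NumPy integer has fixed precision: the same product does not
# fit, and array arithmetic wraps around modulo 2**64.
wrapped = np.array([big], dtype=np.int64) * 4
print(wrapped[0])

# There is only one boolean dtype, so the builtin `bool` and `np.bool_`
# name the same thing when used as a dtype.
print(np.dtype(bool) == np.dtype(np.bool_))
```
</pre>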
> > <br>
> > <br>
> _______________________________________________<br>
> NumPy-Discussion mailing list<br>
> <a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>
> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>
<br>
</blockquote></div></div>