<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net">sebastian@sipsolutions.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:<br>
> On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <<a href="mailto:asmeurer@gmail.com" target="_blank">asmeurer@gmail.com</a>><br>
> wrote:<br>
> <br>
> > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg<br>
> > <<a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>> wrote:<br>
> > > <br>
> > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:<br>
> > > > Regarding np.bool specifically, if you want to deprecate this,<br>
> > > > you<br>
> > > > might want to discuss this with us at the array API standard<br>
> > > > <a href="https://github.com/data-apis/array-api" rel="noreferrer" target="_blank">https://github.com/data-apis/array-api</a> (which is currently in<br>
> > > > RFC<br>
> > > > stage). The spec uses bool as the name for the boolean dtype.<br>
> > > > <br>
> > > > Would it make sense for NumPy to change np.bool to just be the<br>
> > > > boolean<br>
> > > > dtype object? Unlike int and float, there is no ambiguity with<br>
> > > > bool,<br>
> > > > and NumPy clearly doesn't have any issues with shadowing<br>
> > > > builtin<br>
> > > > names<br>
> > > > in its namespace.<br>
> > > <br>
> > > We could keep the Python alias around (which for `dtype=` is the<br>
> > > same<br>
> > > as `np.bool_`).<br>
> > > <br>
> > > I am not sure I like the idea of immediately shadowing the<br>
> > > builtin.<br>
> > > That is a switch we can avoid flipping (without warning);<br>
> > > `np.bool_`<br>
> > > and `bool` are fairly different beasts? [1]<br>
> > <br>
> > NumPy already shadows a lot of builtins, in many cases, in ways<br>
> > that<br>
> > are incompatible with existing ones. It's not something I would<br>
> > have<br>
> > done personally, but it's been this way for a long time.<br>
> > <br>
> <br>
> It may be defensible to keep np.bool as an alias for Python's bool<br>
> even when we remove the other aliases.<br></blockquote><div><br></div><div>I'd agree with that.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
That is true; `int` is probably the most confusing, since it is not at<br>
all compatible with a Python integer, but rather the "default" integer<br>
(which happens to be the same as C `long` currently).<br>
<br>
So we could focus on `np.int`, `np.long`. I am a bit unsure whether<br>
you would prefer that or are mainly pointing out the possibility?<br></blockquote><div><br></div><div>Not sure what you mean by "focus": focus on describing them in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Right now, my main take-away from the discussion is that it would be<br>
good to clarify the release notes a bit more.<br>
<br>
Using `float` for a dtype seems fine to me, but I prefer mentioning<br>
`np.float64` over `np.float_`.<br>
For integers, I wonder if we should also suggest `np.int64`, even if (or<br>
precisely because) the default integer on many systems is currently<br>
`np.int_`?<br></blockquote><div><br></div><div>I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as:</div><div> dtype=bool # or np.bool</div><div> dtype=np.float64<br></div><div> dtype=np.int64</div><div> dtype=np.complex128</div><div>The names with underscores at the end make little sense from a UX perspective. The C equivalents (single/double/etc.) made sense 15 years ago, but given today's user base, the majority of whom do not know C fluently or at all, they no longer make much sense either.</div><div><br></div><div>The `dtype=int` or `dtype=np.int_` behaviour of flip-flopping between 32 and 64 bits depending on the platform is likely to be a pitfall much more often than it is what the user actually needs, so it shouldn't be recommended and probably deserves a warning in the docs.<br></div><div><br></div><div>Cheers,<br></div><div>Ralf<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> <br>
> np.int_ and np.float_ have fixed precision, which makes them somewhat<br>
> different from the builtin types. NumPy has a whole bunch of<br>
> different<br>
> precisions for integers and floats, so this distinction matters.<br>
> <br>
> In contrast, there is only one boolean dtype in NumPy, which matches<br>
> Python's bool. So we wouldn't have to worry, for example, about<br>
> whether a<br>
> user has requested a specific precision explicitly. This comes up in<br>
> issues<br>
> like type-promotion where libraries like JAX and PyTorch have special<br>
> case<br>
> logic for most Python types vs NumPy dtypes (but booleans are the<br>
> same for<br>
> both):<br>
> <a href="https://jax.readthedocs.io/en/latest/type_promotion.html" rel="noreferrer" target="_blank">https://jax.readthedocs.io/en/latest/type_promotion.html</a><br><br>
</blockquote></div></div>
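<div>A minimal sketch of the dtype spellings recommended above and of the `dtype=int` pitfall, assuming only that NumPy is importable as `np`. The helper name `default_int_bits` is made up for illustration and is not part of NumPy:</div>
<pre>
```python
import numpy as np

# Explicit, fixed-width spellings resolve to the same dtype on every platform.
recommended = {
    "bool": np.dtype(bool),
    "float64": np.dtype(np.float64),
    "int64": np.dtype(np.int64),
    "complex128": np.dtype(np.complex128),
}
for name, dt in recommended.items():
    print(f"{name:12s} {dt.itemsize * 8:4d} bits")

# The Python builtins are accepted as dtype specifiers: bool maps to np.bool_,
# float to np.float64, and int to np.int_ (the "default" integer).
assert np.dtype(bool) == np.dtype(np.bool_)
assert np.dtype(float) == np.dtype(np.float64)
assert np.dtype(int) == np.dtype(np.int_)


def default_int_bits():
    """Hypothetical helper: width in bits of the default integer dtype."""
    return np.dtype(int).itemsize * 8


# The width reported here depends on the platform and NumPy version
# (32 or 64), which is why an explicit np.int64 is the safer spelling.
print("dtype=int is", default_int_bits(), "bits here")
```
</pre>
<div>On 64-bit Linux this typically reports 64 bits, while on Windows with NumPy 1.x it reports 32, so code that relies on `dtype=int` can behave differently across machines.</div>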