<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net">sebastian@sipsolutions.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:<br>

> On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <<a href="mailto:asmeurer@gmail.com" target="_blank">asmeurer@gmail.com</a>><br>

> wrote:<br>

> <br>

> > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg<br>

> > <<a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>> wrote:<br>

> > > <br>

> > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:<br>

> > > > Regarding np.bool specifically, if you want to deprecate this,<br>

> > > > you<br>

> > > > might want to discuss this with us at the array API standard<br>

> > > > <a href="https://github.com/data-apis/array-api" rel="noreferrer" target="_blank">https://github.com/data-apis/array-api</a> (which is currently in<br>

> > > > RFC<br>

> > > > stage). The spec uses bool as the name for the boolean dtype.<br>

> > > > <br>

> > > > Would it make sense for NumPy to change np.bool to just be the<br>

> > > > boolean<br>

> > > > dtype object? Unlike int and float, there is no ambiguity with<br>

> > > > bool,<br>

> > > > and NumPy clearly doesn't have any issues with shadowing<br>

> > > > builtin<br>

> > > > names<br>

> > > > in its namespace.<br>

> > > <br>

> > > We could keep the Python alias around (which for `dtype=` is the<br>

> > > same<br>

> > > as `np.bool_`).<br>

> > > <br>

> > > I am not sure I like the idea of immediately shadowing the<br>

> > > builtin.<br>

> > > That is a switch we can avoid flipping (without warning);<br>

> > > `np.bool_`<br>

> > > and `bool` are fairly different beasts? [1]<br>

> > <br>

> > NumPy already shadows a lot of builtins, in many cases, in ways<br>

> > that<br>

> > are incompatible with existing ones. It's not something I would<br>

> > have<br>

> > done personally, but it's been this way for a long time.<br>

> > <br>

> <br>

> It may be defensible to keep np.bool as an alias for Python's bool<br>

> even when we remove the other aliases.<br></blockquote><div><br></div><div>I'd agree with that.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

That is true, `int` is probably the most confusing, since it is not at<br>

all compatible to a Python integer, but rather the "default" integer<br>

(which happens to be the same as C `long` currently).<br>

<br>

So we could focus on `np.int`, `np.long`.  I am a bit unsure whether<br>

you would prefer that or are mainly pointing out the possibility?<br></blockquote><div><br></div><div>I'm not sure what you mean by "focus": focusing on those in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Right now, my main take-away from the discussion is that it would be<br>

good to clarify the release notes a bit more.<br>

<br>

Using `float` for a dtype seems fine to me, but I prefer mentioning<br>

`np.float64` over `np.float_`.<br>

For integers, I wonder if we should also suggest `np.int64`, even – or<br>

because – if the default integer on many systems is currently<br>

`np.int_`?<br></blockquote><div><br></div><div>I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as:</div><div>  dtype=bool  # or np.bool</div><div>  dtype=np.float64<br></div><div>  dtype=np.int64</div><div>  dtype=np.complex128</div><div>The names with underscores at the end make little sense from a UX perspective. The C equivalents (single/double/etc.) made sense 15 years ago, but most of today's users don't know C fluently, or at all, so those names no longer make much sense either.</div><div><br></div><div>The behaviour of `dtype=int` and `dtype=np.int_` flipping between 32 and 64 bits depending on the platform is far more likely to be a pitfall than what the user actually needs, so it shouldn't be recommended and probably deserves a warning in the docs.<br></div><div><br></div><div>Cheers,<br></div><div>Ralf<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> <br>

> np.int_ and np.float_ have fixed precision, which makes them somewhat<br>

> different from the builtin types. NumPy has a whole bunch of<br>

> different<br>

> precisions for integer and floats, so this distinction matters.<br>

> <br>

> In contrast, there is only one boolean dtype in NumPy, which matches<br>

> Python's bool. So we wouldn't have to worry, for example, about<br>

> whether a<br>

> user has requested a specific precision explicitly. This comes up in<br>

> issues<br>

> like type-promotion where libraries like JAX and PyTorch have special<br>

> case<br>

> logic for most Python types vs NumPy dtypes (but booleans are the<br>

> same for<br>

> both):<br>

> <a href="https://jax.readthedocs.io/en/latest/type_promotion.html" rel="noreferrer" target="_blank">https://jax.readthedocs.io/en/latest/type_promotion.html</a><br><br>

</blockquote></div></div>
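<div dir="ltr"><div>A minimal sketch of the distinction discussed in this thread, assuming a NumPy version where `np.bool_`, `np.int_`, `np.float64`, `np.int64` and `np.complex128` are all available (the variable names are only for illustration). It checks that the builtin `bool` maps to the single boolean dtype, that `dtype=int` means the platform-dependent default integer, and that the explicitly sized names have a fixed width:</div>
<pre>
```python
import numpy as np

# Builtin Python types passed as dtype= map to NumPy's default dtypes.
bool_dtype = np.dtype(bool)    # the single boolean dtype
int_dtype = np.dtype(int)      # the "default" integer, platform dependent
float_dtype = np.dtype(float)  # double precision

# There is only one boolean dtype, so bool and np.bool_ always agree.
assert bool_dtype == np.dtype(np.bool_)

# dtype=int and dtype=np.int_ are the same thing, but the width is not
# fixed: it is 4 or 8 bytes depending on the platform and NumPy version.
assert int_dtype == np.dtype(np.int_)
assert int_dtype.itemsize in (4, 8)

# The builtin float corresponds to np.float64.
assert float_dtype == np.dtype(np.float64)

# Explicitly sized names always have the same width (in bytes).
sized = {
    "float64": np.dtype(np.float64).itemsize,
    "int64": np.dtype(np.int64).itemsize,
    "complex128": np.dtype(np.complex128).itemsize,
}

print("default int width in bits:", int_dtype.itemsize * 8)
print(sized)
```
</pre>
<div>On 64-bit Linux the default integer is 64 bits wide, while NumPy releases before 2.0 on Windows use 32 bits; the explicitly sized names don't vary.</div></div>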