<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Dec 13, 2020 at 7:29 PM Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net">sebastian@sipsolutions.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Sun, 2020-12-13 at 19:00 +1100, Juan Nunez-Iglesias wrote:<br>

> <br>

> <br>

> > On 13 Dec 2020, at 6:25 am, Sebastian Berg <<br>

> > <a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>> wrote:<br>

> > <br>

> > But "default" in NumPy really doesn't mean a whole lot?  I can<br>

> > think of<br>

> > three places where "defaults" exist:<br>

> <br>

> Huh? There are platform-specific defaults for literally every array<br>

> creation function in NumPy?<br>

> <br>

> In [1]: np.array([4, 9]).dtype<br>

> Out[1]: dtype('int64')<br>

<snip><br>

> The list goes on…<br>

> <br>

<br>

I should have been clearer about this and my opinion on it:<br>

<br>

1. The whole list comes down to my point 1: when confronted with a<br>

Python integer, NumPy will typically use a C-long [1].<br>

Additionally, `dtype=int` is always the same as long:<br>

`np.dtype(int) == np.dtype("long")`.<br>

<br>

I see that as a single point because it is defined in a single<br>

place in C [1].  (The `np.dtype(int)` is a second place.)<br>
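
<br>

A minimal sketch of point 1, assuming only that NumPy is importable as `np`. The printed dtype and item size depend on the platform and the NumPy version:<br>

<pre>
```python
import numpy as np

# dtype NumPy picks when it is handed small Python integers
default_int = np.array([4, 9]).dtype

# dtype=int resolves to the same default integer type
same_as_int = default_int == np.dtype(int)

print(default_int, np.dtype(int), same_as_int)
print("itemsize in bytes:", default_int.itemsize)
```
</pre>

The item size is the part that differs between machines.<br>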

<br>

<br>

2. I agree with Ralf that this is "random". On the same computer you<br>

can easily get a wrong result from identical code because you booted<br>

into Windows instead of Linux [2]. `long` is not a good default! It is<br>

32-bit on Windows and 64-bit on (64-bit) Linux! That should confuse the<br>

majority of our users (and probably many who are aware of C integer<br>

types).<br>

Good defaults are awesome, but I just can't see how `long` is a good<br>

default.  There were good reasons for it on Python 2, but that is not<br>

relevant anymore.<br>
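
<br>

To illustrate the kind of wrong result meant here, a small sketch that forces the integer width explicitly, so it behaves the same on every platform. The values are made up for illustration; the `int32` case stands in for a platform where the default integer is 32 bits wide:<br>

<pre>
```python
import numpy as np

values = [2_000_000_000, 2_000_000_000]

# 32-bit accumulation: the true sum (4e9) does not fit and silently wraps.
sum32 = np.array(values, dtype=np.int32).sum(dtype=np.int32)

# 64-bit accumulation: the result is correct.
sum64 = np.array(values, dtype=np.int64).sum(dtype=np.int64)

print(sum32)  # -294967296
print(sum64)  # 4000000000
```
</pre>

No error or warning is raised in the 32-bit case, which is why such bugs can go unnoticed.<br>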

<br>

<br>

3. I think that `intp` would be a much saner default for most code. It<br>

gives a system-dependent result, but two points are in its favor:<br>

<br>

   * NumPy generates `intp` in quite a lot of places<br>

   * It is always safe (and fast) to index arrays with `intp`<br>
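
<br>

A short sketch of the two bullets above, using `np.argsort` and `np.nonzero` as examples of functions that return index arrays (the array contents are arbitrary):<br>

<pre>
```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])

# Both functions return index arrays of dtype intp.
order = np.argsort(a)
hits = np.nonzero(a > 1.5)[0]
print(order.dtype == np.intp, hits.dtype == np.intp)

# intp arrays can be used directly for indexing.
print(a[order])
```
</pre>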

<br>

<br>

> And, indeed, mixing types can cause implicit casting, and thus both<br>

> slowness and unexpected type promotion, which brings with it its own<br>

> bugs… Again, I think it is valuable to have syntax to express<br>

> `np.zeros(…, dtype=<whatever-dtype-np.array(…)-would-give-for-my-<br>

> data>)`.<br>

<br>

Yes, it is valuable, but I am unsure we should advise using it...<br></blockquote><div><br></div><div>Agreed, it should be possible for people who know that's what they want, but an "always int64" default would be way better. Before we had 32-bit CI, I developed on 32-bit Linux on purpose, and found multiple newly introduced bugs in NumPy and SciPy each release cycle. Risking correctness issues like overflows is far worse than possible sub-optimal performance.</div><div><br></div><div>For that same reason, float96/float128 are very annoying. Users don't realize that those aren't portable.</div><div><br></div><div>Cheers,<br></div><div>Ralf</div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Cheers,<br>

<br>

Sebastian<br>

<br>

<br>

<br>

[1] Currently defined here:<br>

<a href="https://github.com/numpy/numpy/blob/7a42940e610b77cee2f98eb88aed5e66ef6d8c2a/numpy/core/src/multiarray/abstractdtypes.c#L16-L45" rel="noreferrer" target="_blank">https://github.com/numpy/numpy/blob/7a42940e610b77cee2f98eb88aed5e66ef6d8c2a/numpy/core/src/multiarray/abstractdtypes.c#L16-L45</a><br>

This uses `long` normally, but `long long` (64-bit) if that fails,<br>

and even `unsigned long long` if *that* also fails.<br>
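
<br>

The fallback chain can be observed from Python. A rough sketch, with values chosen only to cross the 32-bit and signed 64-bit limits; the dtype of the first result depends on the platform and the NumPy version:<br>

<pre>
```python
import numpy as np

# Fits in any C long: NumPy uses its default integer type.
small = np.array(1).dtype

# Needs more than 32 bits: a 64-bit signed type is used even where
# the default integer is only 32 bits wide.
medium = np.array(2**40).dtype

# Too large for a signed 64-bit integer: falls back to unsigned 64-bit.
large = np.array(2**63).dtype

print(small, medium, large)
```
</pre>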

<br>

<br>

[2] I would not be surprised if there are quite a few libraries with<br>

bugs for very large arrays that simply have not been found yet, because<br>

nobody has tried to run the code on very large arrays on a Windows<br>

workstation.<br>

<br>

<br>

<br>

> Juan.<br>

> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

<br>


</blockquote></div></div>