[Numpy-discussion] np.{bool,float,int} deprecation

Sun Dec 13 15:31:58 EST 2020

On Sun, Dec 13, 2020 at 7:29 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Sun, 2020-12-13 at 19:00 +1100, Juan Nunez-Iglesias wrote:
> >
> >
> > > On 13 Dec 2020, at 6:25 am, Sebastian Berg <
> > > sebastian at sipsolutions.net> wrote:
> > >
> > > But "default" in NumPy really doesn't mean a whole lot?  I can
> > > think of three places where "defaults" exist:
> >
> > Huh? There are platform-specific defaults for literally every array
> > creation function in NumPy?
> >
> > In [1]: np.array([4, 9]).dtype
> > Out[1]: dtype('int64')
> <snip>
> > The list goes on…
> >
>
> I should have been clearer about this and my opinion on it:
>
> 1. The whole list comes down to my point 1: when confronted with a
> Python integer, NumPy will typically use a C `long` [1].
> Additionally, `dtype=int` is always the same as `long`:
> `np.dtype(int) == np.dtype("long")`.
>
> The reason I see that as a single point is that it is defined in a
> single place in C [1].  (`np.dtype(int)` is a second place.)
>
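A rough sketch of point 1, in case it helps. The printed dtypes depend on the platform and the NumPy version, so nothing here is a guaranteed output; `"l"` is the type code for C `long`, and the fallback to wider types follows the description in [1].

```python
import numpy as np

# What NumPy picks for plain Python integers when no dtype is given.
default_int = np.array([4, 9]).dtype

# Compare against the candidates discussed in this thread.
print("default:      ", default_int)
print("np.dtype(int):", np.dtype(int))
print("C long:       ", np.dtype("l"))
print("intp:         ", np.dtype(np.intp))

# Values that do not fit the default fall back to wider types (see [1]).
big = np.array([2**40]).dtype
huge = np.array([2**63]).dtype
print("2**40 ->", big, "  2**63 ->", huge)

# Spelling out the width removes the platform dependence.
explicit = np.array([4, 9], dtype=np.int64)
print("explicit:     ", explicit.dtype)
```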
>
> 2. I agree with Ralf that this is "random". On the same computer you
> can easily get a wrong result for identical code because you boot
> into Windows instead of Linux [2]. `long` is not a good default! It is
> 32-bit on Windows and 64-bit on (64-bit) Linux! That should confuse the
> majority of our users (and probably many who are aware of C integer
> types).
> Good defaults are awesome, but I just can't see how `long` is a good
> default.  There were good reasons for it on Python 2, but those are
> no longer relevant.
>
>
> 3. I think that `intp` would be a much saner default for most code. It
> gives a system-dependent result, but two points are in its favor:
>
>    * NumPy generates `intp` in quite a lot of places
>    * It is always safe (and fast) to index arrays with `intp`
>
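To illustrate the two bullets, a small sketch with made-up array contents: index-producing functions such as `np.argsort` and `np.nonzero` return `intp`, and their results can be used directly as indices.

```python
import numpy as np

data = np.array([30.0, 10.0, 20.0])  # made-up example values

order = np.argsort(data)         # indices that would sort `data`
(hits,) = np.nonzero(data > 15)  # indices where the condition holds

# Both index arrays come back as intp, whatever its width is here.
print(order.dtype, hits.dtype, np.dtype(np.intp))

# Indexing with intp arrays needs no conversion.
sorted_data = data[order]
selected = data[hits]
print(sorted_data, selected)
```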
>
> > And, indeed, mixing types can cause implicit casting, and thus both
> > slowness and unexpected type promotion, which brings with it its own
> > bugs… Again, I think it is valuable to have syntax to express
> > `np.zeros(…, dtype=<whatever-dtype-np.array(…)-would-give-for-my-
> > data>)`.
>
> Yes, it is valuable, but I am unsure whether we should advise people
> to use it...
>

Agreed, it should be possible for people who know that's what they want,
but an "always int64" default would be way better. Before we had 32-bit CI,
I developed on 32-bit Linux on purpose, and found multiple newly introduced
bugs in NumPy and SciPy each release cycle. Risking correctness issues like
overflows is far worse than possibly sub-optimal performance.
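
To make the overflow risk concrete, here is a minimal sketch with made-up values: the same computation done at 32 and at 64 bits. Integer array arithmetic wraps around silently, so the 32-bit result is simply wrong instead of raising an error.

```python
import numpy as np

values = [50_000, 50_000]  # made-up values; 50_000**2 exceeds the int32 range

wide = np.asarray(values, dtype=np.int64)
narrow = np.asarray(values, dtype=np.int32)

# Sum of squares, accumulated at the same width as the inputs.
wide_total = (wide * wide).sum(dtype=np.int64)
narrow_total = (narrow * narrow).sum(dtype=np.int32)  # wraps, no exception

print("int64:", wide_total)
print("int32:", narrow_total)
```

Passing `dtype=np.int64` explicitly, as in the first case, gives the same answer on every platform.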

For that same reason, float96/float128 are very annoying. Users don't
realize that those aren't portable.
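
A quick way to see this on a given machine (just a sketch; the output varies by platform, which is the point):

```python
import numpy as np

ld = np.dtype(np.longdouble)

# At most one of these aliases exists on a given platform.
has_float96 = hasattr(np, "float96")
has_float128 = hasattr(np, "float128")

print("longdouble is", ld.name, "with", ld.itemsize, "bytes")
print("np.float96 available: ", has_float96)
print("np.float128 available:", has_float128)

# float64, by contrast, is the same everywhere.
portable = np.dtype(np.float64)
print(portable.name, portable.itemsize)
```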

Cheers,
Ralf

> Cheers,
>
> Sebastian
>
>
>
> [1] Currently defined here:
>
> https://github.com/numpy/numpy/blob/7a42940e610b77cee2f98eb88aed5e66ef6d8c2a/numpy/core/src/multiarray/abstractdtypes.c#L16-L45
> Which will use `long` normally, but `long long` (64-bit) if that fails
> and even `unsigned long long` if *that* also fails.
>
>
> [2] I would not be surprised if there are quite a few libraries with
> bugs for very large arrays that simply have not been found yet, because
> nobody has tried to run the code on very large arrays on a Windows
> workstation.
>
>
>
> > Juan.
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion