[Numpy-discussion] np.{bool,float,int} deprecation

Sat Dec 12 14:25:26 EST 2020

On Sat, 2020-12-12 at 12:34 +1100, Juan Nunez-Iglesias wrote:
> 
> > I agree. I think we should recommend sane, descriptive names that
> > do the right thing. So ideally we'd have people spell their dtype
> > specifiers as
> >   dtype=bool  # or np.bool
> >   dtype=np.float64
> >   dtype=np.int64
> >   dtype=np.complex128
> > The names with underscores at the end make little sense from a UX
> > perspective. And the C equivalents (single/double/etc) made sense
> > 15 years ago, but with the user base of today - the majority of
> > whom will not know C fluently or at all - also don't make too much
> > sense.
> > 
> > The `dtype=int` or `dtype=np.int_` behaviour flopping between 32
> > and 64 bits is likely to be a pitfall much more often than it is
> > what the user actually needs, so shouldn't be recommended and
> > probably deserves a warning in the docs.
> 
> I kinda disagree with this. I want to have a way to say, give me an
> array of the same type as the default NumPy type (for either ints or
> floats). This will prevent casting back and forth as different arrays
> are combined. In other words, as long as NumPy itself flips back and
> forth (depending on locale), I think users will in many cases want to
> flip back and forth with it?

But does "default" really mean a whole lot in NumPy? I can think of
three places where "defaults" exist:

1. `np.array([1])` will default to a C-long (as will `np.uint8(1) + 1`)

2. Sum and product upcast to C-long (and pretty much only those):

    np.sum(np.arange(10, dtype=np.int8))
    np.product(np.arange(10, dtype=np.int8))

3. NumPy uses `np.intp` for all indexing operations internally, and
   many functions which return integers related to indexing (e.g.
   `np.nonzero()`) return `np.intp` as well. [1]

The first two points have no logic behind them beyond this: Windows
treats long as always 32-bit, while other platforms make long 64-bit on
64-bit systems. The last point does have some logic.
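One way to see what a given platform does is to compare the size of the C `long` (reported here through the standard-library `ctypes` module) with NumPy's `np.int_` and `np.intp`. This is a minimal sketch. With the NumPy releases current at the time of this thread, `np.int_` matched the C `long`, so it reported 32 bits on Windows and 64 bits on 64-bit Linux or macOS. Newer releases may differ. The function name `integer_sizes` is hypothetical.

```python
import ctypes
import numpy as np

def integer_sizes():
    """Hypothetical helper: bit widths of C long, np.int_ and np.intp here."""
    return {
        "C long": ctypes.sizeof(ctypes.c_long) * 8,
        "np.int_": np.dtype(np.int_).itemsize * 8,
        "np.intp": np.dtype(np.intp).itemsize * 8,
    }

for name, bits in integer_sizes().items():
    print(f"{name:8s} {bits} bits")
```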

Generally, the only reason to stick to a certain type would be that
mixing types can be slower (indexing with a non-`intp` array, or doing
math with a mix of 32-bit and 64-bit integers).
From a library perspective, I wonder how often you actually expect a
"default integer" input, as opposed to 32-bit or 64-bit depending on the
whims of the user, or `intp` because the value is "indexing related".
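The cost of mixing types comes from implicit conversions. In this minimal sketch, mixed 32-bit and 64-bit arithmetic promotes to 64 bits (casting the 32-bit operand). It also shows how a library that does not care about the "default integer" can leave the user's data dtype alone and convert index arguments to `np.intp` once, up front. The helper `take_rows` is hypothetical and exists only for illustration.

```python
import numpy as np

a32 = np.arange(5, dtype=np.int32)
a64 = np.arange(5, dtype=np.int64)

# Mixing 32-bit and 64-bit integers promotes the result to int64,
# which means a32 has to be cast during the operation.
mixed = a32 + a64

def take_rows(data, indices):
    """Hypothetical library helper: keep the user's data dtype,
    but convert the indices to np.intp once before indexing."""
    data = np.asarray(data)
    idx = np.asarray(indices).astype(np.intp, copy=False)
    return data[idx]

rows = take_rows(a32, np.array([4, 0, 2], dtype=np.int32))
print(mixed.dtype, rows, rows.dtype)
```

The data keep whatever width the user chose. Only the indices are normalized to the type NumPy uses internally for indexing.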

It would be interesting to see if we can change the default at some
point. It might also be tricky: there may be quite a bit of code
expecting `long` (e.g. Cython extensions or `scipy.special` may or may
not notice such a change).
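Code that genuinely depends on a C `long` does not have to rely on the default integer staying put. An example would be a hypothetical Cython routine typed as `long[:]`. Such code can ask for the C type explicitly. This minimal sketch uses NumPy's `'l'` type code, which denotes the C `long`, and uses `np.int64` where a fixed width is what is really meant. The helper names `as_c_long` and `as_int64` are hypothetical.

```python
import ctypes
import numpy as np

def as_c_long(values):
    """Hypothetical helper: return an array whose items are C longs,
    whatever NumPy's default integer happens to be."""
    return np.asarray(values, dtype=np.dtype("l"))

def as_int64(values):
    """Hypothetical helper: request a fixed 64-bit width explicitly."""
    return np.asarray(values, dtype=np.int64)

x = as_c_long([1, 2, 3])
y = as_int64([1, 2, 3])
print(x.dtype, x.itemsize, "bytes;", y.dtype)
```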

Cheers,

Sebastian

[1] intp is technically intptr_t in C, while indexing only requires an
ssize_t, I think. That probably matters on no currently supported
systems, but systems where it matters do exist (OpenVMS is one that just
came up, and one we may support in the future).
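On mainstream platforms the distinction in this footnote is invisible. This minimal sketch compares the pointer size (what `intptr_t` tracks), the size of `ssize_t`, and `np.intp` using `ctypes`. On the systems the footnote has in mind, the three sizes are expected to agree.

```python
import ctypes
import numpy as np

pointer_bytes = ctypes.sizeof(ctypes.c_void_p)  # size of a pointer (intptr_t)
ssize_bytes = ctypes.sizeof(ctypes.c_ssize_t)   # size of ssize_t
intp_bytes = np.dtype(np.intp).itemsize         # size of NumPy's np.intp

print(f"pointer: {pointer_bytes}, ssize_t: {ssize_bytes}, np.intp: {intp_bytes}")
if pointer_bytes != ssize_bytes:
    print("intptr_t and ssize_t differ in size on this platform")
```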

> 
> Juan.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
