[Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

Wed Apr 8 15:37:08 EDT 2020

Sorry to have fallen off the NumPy grid for a bit, but:

On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > But, backward compatibility aside, could we have ONLY Scalars?
>

> Well, it is hard to write functions that work on N-Dimensions (where N
> can be 0), if the 0-D array does not exist. You can get away with
> scalars in most cases, because they pretend to be arrays in most cases
> (aside from mutability).
>

> But I am pretty sure we have a bunch of cases that need
> `res = np.asarray(res)` simply because `res` is N-D but could then be
> silently converted to a scalar. E.g. see
> https://github.com/numpy/numpy/issues/13105 for an issue about this
> (although it does not actually list any specific problems).
>
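A minimal sketch of the silent conversion described above, assuming current NumPy behavior where a ufunc applied to a 0-D array hands back a scalar. `double_it` is a hypothetical helper, not something from the thread:

```python
import numpy as np

def double_it(x):
    # Hypothetical N-D function; N may be 0.
    res = np.multiply(x, 2)
    # For 0-D input the ufunc returns a NumPy scalar, not an array,
    # so the result is converted back explicitly.
    return np.asarray(res)

zero_d = np.array(3.0)
raw = np.multiply(zero_d, 2)   # silently a scalar, no longer an ndarray
fixed = double_it(zero_d)      # 0-D ndarray again

print(type(raw).__name__, type(fixed).__name__, fixed.ndim)
```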

I'm not sure this is insolvable (again, backwards compatibility aside) --
after all, one of the key issues is that it's undetermined what the rank
of array(a_scalar) should be. 0-d is the only unambiguous answer, but
then it's not really an array in the usual sense anyway. So in theory, we
could disallow that conversion unless a rank is specified.
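For illustration, a short sketch of the ambiguity, using the `ndmin` argument of `np.array` as one existing way to state the rank explicitly (the variable names are only for this example):

```python
import numpy as np

a_scalar = 3.0

default = np.array(a_scalar)           # rank not stated: result is 0-d
one_d = np.array(a_scalar, ndmin=1)    # rank stated: shape (1,)
two_d = np.array(a_scalar, ndmin=2)    # rank stated: shape (1, 1)

print(default.shape, one_d.shape, two_d.shape)
```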

At the end of the day, there has to be some endpoint on how far you can
reduce the rank of an array and have it work -- why not have 1 be the lower
limit?

-CHB

> - Sebastian
>
>
> > There is certainly a need for more numpy-like scalars: more than the
> > built-in data types, and some handy attributes and methods, like
> > .dtype, .itemsize, etc. But could we make an enhanced scalar that had
> > everything we actually need from a zero-d array?
> >
> > The key point would be mutability -- but do we really need mutable
> > scalars? I can't think of any time I've needed that, when I couldn't
> > have used a 1-d array of length 1.
> >
> > Is there a use case for zero-d arrays that could not be met with an
> > enhanced scalar?
> >
> > -CHB
> >
> > On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <allanhaldane at gmail.com>
> > wrote:
> >
> > > I have some thoughts on scalars from playing with ndarray ducktypes
> > > (__array_function__), e.g. a MaskedArray ndarray-ducktype, for which I
> > > wanted an associated "MaskedScalar" type.
> > >
> > > In summary, the way scalars currently work makes ducktyping
> > > (duck-scalars) difficult:
> > >
> > >   * numpy scalar types are not subclassable, so my duck-scalars
> > >     aren't subclasses of numpy scalars and aren't in the type
> > >     hierarchy
> > >   * even if scalars were subclassable, I would have to subclass each
> > >     scalar datatype individually to make masked versions
> > >   * lots of code checks `isinstance(var, np.float64)`, which breaks
> > >     for my duck-scalars
> > >   * it was difficult to distinguish between a duck-scalar and a
> > >     duck-0d array. The method I used in the end seems hacky.
> > >
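A rough sketch of the `isinstance` problem from the list above. The `MaskedScalar` class here is a hypothetical stand-in written for this example, not the implementation mentioned in the thread, and the dtype-based check is only one possible alternative:

```python
import numpy as np

class MaskedScalar:
    # Hypothetical duck-scalar: carries a value, a mask flag and a dtype,
    # but is not a subclass of any NumPy scalar type.
    def __init__(self, value, masked=False, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self.value = self.dtype.type(value)
        self.masked = masked

def is_float64_by_type(var):
    # The common check; only true for real np.float64 instances.
    return isinstance(var, np.float64)

def is_float64_by_dtype(var):
    # A dtype-based check that also accepts duck-scalars.
    dt = getattr(var, "dtype", None)
    return dt is not None and dt == np.dtype(np.float64)

duck = MaskedScalar(1.5)
print(is_float64_by_type(duck), is_float64_by_dtype(duck))
```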
> > > This has led to some daydreams about how scalars should work, and
> > > also led me to read through your NEPs 40/41 with specific focus on
> > > what you said about scalars. I was about to post there until I saw
> > > this discussion. I agree with what you said in the NEPs about not
> > > making scalars be dtype instances.
> > >
> > > Here is what ducktypes led me to:
> > >
> > > If we are able to do something like define a `np.numpy_scalar` type
> > > covering all numpy scalars, which has a `.dtype` attribute like you
> > > describe in the NEPs, then that would seem to solve the ducktype
> > > problems above. Ducktype implementors would need to make a
> > > "duck-scalar" type in parallel to their "duck-ndarray" type, but I
> > > found that to be pretty easy using an abstract class in my
> > > MaskedArray ducktype, since the MaskedArray and MaskedScalar share a
> > > lot of behavior.
> > >
> > > A numpy_scalar type would also help solve some object-array problems
> > > if the object scalars are wrapped in the `np.numpy_scalar` type. A
> > > long time ago I started to try to fix up various funny/strange
> > > behaviors of object datatypes, but there are lots of special cases,
> > > and the main problem was that the returned objects (e.g. from
> > > indexing) were not numpy types and did not support numpy attributes
> > > or indexing. Wrapping the returned object in `np.numpy_scalar` might
> > > add an extra slight annoyance to people who want to unwrap the
> > > object, but I think it would make object arrays less buggy and make
> > > code using object arrays easier to reason about and debug.
> > >
> > > Finally, a few random votes/comments based on the other emails on
> > > the list:
> > >
> > > I think scalars have a place in numpy (rather than just reusing 0d
> > > arrays), since there is a clear use in having hashable, immutable
> > > scalars. Structured scalars should probably be immutable.
> > >
> > > I agree with your suggestion that scalars should not be indexable.
> > > Thus, my duck-scalars (and proposed numpy_scalar) would not be
> > > indexable. However, I think they should encode their datatype through
> > > a .dtype attribute like ndarrays, rather than by inheritance.
> > >
> > > Also, something to think about is that currently numpy scalars
> > > satisfy the property `isinstance(np.float64(1), float)`, i.e. they
> > > are within the Python numerical type hierarchy. 0d arrays do not have
> > > this property. My proposal above would break this. I'm not sure what
> > > to think about whether this is a good property to maintain or not.
> > >
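The property mentioned above can be checked directly; a minimal sketch with current NumPy:

```python
import numpy as np

scalar = np.float64(1)
zero_d = np.array(1.0)

# np.float64 subclasses Python's float, so the scalar passes the check.
scalar_is_float = isinstance(scalar, float)
# A 0d array is an ndarray and is not part of Python's numeric types.
zero_d_is_float = isinstance(zero_d, float)

print(scalar_is_float, zero_d_is_float)
```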
> > > Cheers,
> > > Allan
> > >
> > >
> > >
> > > On 2/21/20 8:37 PM, Sebastian Berg wrote:
> > > > Hi all,
> > > >
> > > > When we create new datatypes, we have the option to make new
> > > > choices for the new datatypes [0] (not the existing ones).
> > > >
> > > > The question is: Should every NumPy datatype have a scalar
> > > > associated and should operations like indexing return a scalar or
> > > > a 0-D array?
> > > >
> > > > This is in my opinion a complex, almost philosophical, question,
> > > > and we do not have to settle anything for a long time. But, if we
> > > > do not decide a direction before we have many new datatypes the
> > > > decision will make itself...
> > > > So happy about any ideas, even if it's just a gut feeling :).
> > > >
> > > > There are various points. I would like to mostly ignore the
> > > > technical ones, but I am listing them anyway here:
> > > >
> > > >   * Scalars are faster (although that can likely be optimized)
> > > >
> > > >   * Scalars have a lower memory footprint
> > > >
> > > >   * The current implementation incurs a technical debt in NumPy.
> > > >     (I do not think that is a general issue, though. We could
> > > >     automatically create scalars for each new datatype probably.)
> > > >
> > > > Advantages of having no scalars:
> > > >
> > > >   * No need to keep track of scalars to preserve them in ufuncs,
> > > >     or in libraries using `np.asarray`: do they need
> > > >     `np.asarray_or_scalar`? (Or decide they always return arrays,
> > > >     although ufuncs may not.)
> > > >
> > > >   * Seems simpler in many ways: you always know the output will be
> > > >     an array if it has to do with NumPy.
> > > >
> > > > Advantages of having scalars:
> > > >
> > > >   * Scalars are immutable and we are used to them from Python.
> > > >     A 0-D array cannot be used as a dictionary key consistently
> > > >     [1].
> > > >
> > > >     I.e. without scalars as first-class citizens `dict[arr1d[0]]`
> > > >     cannot work, `dict[arr1d[0].item()]` may (if `.item()` is
> > > >     defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy
> > > >     to work. [2]
> > > >
> > > >   * Object arrays as we have them now make sense, `arr1d[0]` can
> > > >     reasonably return a Python object. I.e. arrays feel more like
> > > >     containers if you can take elements out easily.
> > > >
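A small sketch of the dictionary-key point, assuming current NumPy, where indexing a 1-D array yields a hashable scalar while a 0-D array is not hashable. The `lookup` dictionary is made up for the example, and `.frozen()` is not shown because it does not exist today:

```python
import numpy as np

arr1d = np.array([10, 20, 30])
lookup = {10: "ten"}

scalar = arr1d[0]                   # a NumPy scalar today: hashable
by_scalar = lookup[scalar]
by_item = lookup[arr1d[0].item()]   # plain Python int

zero_d = arr1d[0, ...]              # 0-D array view of the same element
try:
    lookup[zero_d]
    zero_d_hashable = True
except TypeError:
    # ndarray is unhashable, so it cannot serve as a key
    zero_d_hashable = False

print(by_scalar, by_item, zero_d_hashable)
```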
> > > > Could go both ways:
> > > >
> > > >   * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the
> > > >     array without scalars. With scalars `arr1d[0, ...]` clarifies
> > > >     the meaning. (In principle it is good to never use `arr2d[0]`
> > > >     to get a 1D slice, probably more so if scalars exist.)
> > > >
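The two spellings can be compared with current NumPy, where `arr1d[0]` gives a scalar and `arr1d[0, ...]` gives a 0-D view. A minimal sketch:

```python
import numpy as np

arr1d = np.array([1, 2, 3])

scalar = arr1d[0]      # scalar copy of the element
scalar += 1            # rebinds the name; arr1d is untouched
after_scalar = arr1d.copy()

view = arr1d[0, ...]   # 0-D array sharing memory with arr1d
view += 1              # in-place add writes through to arr1d
after_view = arr1d.copy()

print(after_scalar, after_view)
```

In a world without scalars, the first spelling would behave like the second.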
> > > > Note: array-scalars (the current NumPy scalars) are not useful in
> > > > my opinion [3]. A scalar should not be indexed or have a shape. I
> > > > do not believe in scalars pretending to be arrays.
> > > >
> > > > I personally tend towards liking scalars. If Python were a
> > > > language where the array (array-programming) concept was ingrained
> > > > into the language itself, I would lean the other way. But users
> > > > are used to scalars, and they "put" scalars into arrays. Array
> > > > objects are in some ways strange in Python, and I feel not having
> > > > scalars detaches them further.
> > > >
> > > > Having scalars, however, also means we should preserve them. I
> > > > feel in principle that is actually fairly straightforward. E.g.
> > > > for ufuncs:
> > > >
> > > >    * np.add(scalar, scalar) -> scalar
> > > >    * np.add.reduce(arr, axis=None) -> scalar
> > > >    * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
> > > >    * np.add.reduce(scalar, axis=()) -> array
> > > >
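A small sketch checking what current NumPy returns for the first three kinds of call, using a 2-D input for the partial reduction. The 1-D `axis` case and the `axis=()` case are left out, since the list above describes proposed rules rather than behavior verified here:

```python
import numpy as np

scalar = np.float64(2.0)
arr2d = np.arange(6.0).reshape(2, 3)

s_plus_s = np.add(scalar, scalar)         # scalar in, scalar out
total = np.add.reduce(arr2d, axis=None)   # full reduction gives a scalar
rows = np.add.reduce(arr2d, axis=1)       # partial reduction gives an array

print(type(s_plus_s).__name__, type(total).__name__, type(rows).__name__)
```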
> > > > Of course libraries that do `np.asarray` would/could basically
> > > > choose to not preserve scalars: Their signature is defined as
> > > > taking strictly array input.
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > [0] At best this can be a vision to decide which way they may
> > > > evolve.
> > > >
> > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is
> > > > arguably strange. E.g. Quantity defines hash correctly, but does
> > > > not fully ensure immutability for 0-D Quantities. Ensuring
> > > > immutability in a world where "views" are a central concept
> > > > requires a read-only copy.
> > > >
> > > > [2] Arguably `.item()` would always return a scalar, but it would
> > > > be a second-class citizen. (Although if it returns a scalar, at
> > > > least we already have a scalar implementation.)
> > > >
> > > > [3] They are necessary due to technical debt for NumPy datatypes
> > > > though.
> > > >
> > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > >

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov