[Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?
Sebastian Berg
sebastian at sipsolutions.net
Mon Mar 23 16:30:56 EDT 2020
On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> I've always found the duality of zero-d arrays and scalars confusing,
> and I'm sure I'm not alone.
>
> Having both is just plain weird.
I guess so, it is a tricky situation, and I do not really have an
answer.
>
> But, backward compatibility aside, could we have ONLY Scalars?
>
> When we index into an array, the dimensionality is reduced by one, so
> indexing into a 1D array has to get us something: but the zero-d
> array is a really weird object -- do we really need it?
>
Well, it is hard to write functions that work on N-Dimensions (where N
can be 0), if the 0-D array does not exist. You can get away with
scalars in most cases, because they pretend to be arrays in most cases
(aside from mutability).
But I am pretty sure we have a bunch of cases that need
`res = np.asarray(res)` simply because `res` is N-D (possibly 0-D) and
could otherwise be silently converted to a scalar. E.g. see
https://github.com/numpy/numpy/issues/13105 for an issue about this
(although it does not actually list any specific problems).
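
A minimal sketch of the pattern described above, not taken from the linked issue. The two helper names are hypothetical and exist only for illustration; the point is that arithmetic on a 0-D array hands back a NumPy scalar, so an N-D function has to re-wrap its result to keep returning arrays:

```python
import numpy as np

def double_unwrapped(x):
    # Hypothetical helper: works for any N, but a 0-D input
    # comes back as a NumPy scalar rather than a 0-D array.
    x = np.asarray(x)
    return x * 2

def double_wrapped(x):
    # Same hypothetical helper, with the re-wrapping step added.
    x = np.asarray(x)
    res = x * 2
    return np.asarray(res)  # always an ndarray, also for N == 0

zero_d = np.array(3.0)
a = double_unwrapped(zero_d)  # NumPy scalar (np.float64)
b = double_wrapped(zero_d)    # 0-D ndarray
```

Without the final `np.asarray`, code that later does something like `res[...] = value` or checks `isinstance(res, np.ndarray)` behaves differently for N == 0 than for N >= 1.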
- Sebastian
> There is certainly a need for more numpy-like scalars: more than the
> built-in data types, and some handy attributes and methods, like
> .dtype, .itemsize, etc. But could we make an enhanced scalar that had
> everything we actually need from a zero-d array?
>
> The key point would be mutability -- but do we really need mutable
> scalars? I can't think of any time I've needed that, when I couldn't
> have used a 1-d array of length 1.
>
> Is there a use case for zero-d arrays that could not be met with an
> enhanced scalar?
>
> -CHB
>
> On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <
> allanhaldane at gmail.com>
> wrote:
>
> > I have some thoughts on scalars from playing with ndarray ducktypes
> > (__array_function__), eg a MaskedArray ndarray-ducktype, for which I
> > wanted an associated "MaskedScalar" type.
> >
> > In summary, the way scalars currently work makes ducktyping
> > (duck-scalars) difficult:
> >
> > * numpy scalar types are not subclassable, so my duck-scalars
> >   aren't subclasses of numpy scalars and aren't in the type hierarchy
> > * even if scalars were subclassable, I would have to subclass each
> >   scalar datatype individually to make masked versions
> > * lots of code checks `isinstance(var, np.float64)` which breaks
> >   for my duck-scalars
> > * it was difficult to distinguish between a duck-scalar and a
> >   duck-0d array. The method I used in the end seems hacky.
> >
> > This has led to some daydreams about how scalars should work, and
> > also
> > led me last to read through your NEPs 40/41 with specific focus on
> > what
> > you said about scalars, and was about to post there until I saw
> > this
> > discussion. I agree with what you said in the NEPs about not making
> > scalars be dtype instances.
> >
> > Here is what ducktypes led me to:
> >
> > If we are able to do something like define a `np.numpy_scalar` type
> > covering all numpy scalars, which has a `.dtype` attribute like you
> > describe in the NEPs, then that would seem to solve the ducktype
> > problems above. Ducktype implementors would need to make a
> > "duck-scalar" type in parallel to their "duck-ndarray" type, but I
> > found that to be pretty easy using an abstract class in my
> > MaskedArray ducktype, since the MaskedArray and MaskedScalar share a
> > lot of behavior.
> >
> > A numpy_scalar type would also help solve some object-array
> > problems if
> > the object scalars are wrapped in the np_scalar type. A long time
> > ago I
> > started to try to fix up various funny/strange behaviors of object
> > datatypes, but there are lots of special cases, and the main
> > problem was
> > that the returned objects (eg from indexing) were not numpy types
> > and
> > did not support numpy attributes or indexing. Wrapping the returned
> > object in `np.numpy_scalar` might add an extra slight annoyance to
> > people who want to unwrap the object, but I think it would make
> > object
> > arrays less buggy and make code using object arrays easier to
> > reason
> > about and debug.
> >
> > Finally, a few random votes/comments based on the other emails on
> > the list:
> >
> > I think scalars have a place in numpy (rather than just reusing 0d
> > arrays), since there is a clear use in having hashable, immutable
> > scalars. Structured scalars should probably be immutable.
> >
> > I agree with your suggestion that scalars should not be indexable.
> > Thus,
> > my duck-scalars (and proposed numpy_scalar) would not be indexable.
> > However, I think they should encode their datatype through a .dtype
> > attribute like ndarrays, rather than by inheritance.
> >
> > Also, something to think about is that currently numpy scalars
> > satisfy
> > the property `isinstance(np.float64(1), float)`, i.e. they are
> > within the
> > python numerical type hierarchy. 0d arrays do not have this
> > property. My
> > proposal above would break this. I'm not sure what to think about
> > whether this is a good property to maintain or not.
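
A small check of the property mentioned above, as a sketch rather than a statement about what should be preserved. `np.float32` is included only for contrast and is not part of the original point:

```python
import numpy as np

# np.float64 subclasses Python's float, so it sits inside the
# Python numerical type hierarchy.
scalar_is_float = isinstance(np.float64(1), float)

# A 0-D array holding the same value is an ndarray, not a float.
zero_d_is_float = isinstance(np.array(1.0), float)

# For contrast: not every NumPy scalar type has this property.
float32_is_float = isinstance(np.float32(1), float)
```

Here `scalar_is_float` is `True`, while `zero_d_is_float` and `float32_is_float` are both `False`, so the property already holds only for some scalar types.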
> >
> > Cheers,
> > Allan
> >
> >
> >
> > On 2/21/20 8:37 PM, Sebastian Berg wrote:
> > > Hi all,
> > >
> > > When we create new datatypes, we have the option to make new
> > > choices
> > > for the new datatypes [0] (not the existing ones).
> > >
> > > The question is: Should every NumPy datatype have a scalar
> > > associated
> > > and should operations like indexing return a scalar or a 0-D
> > > array?
> > >
> > > This is in my opinion a complex, almost philosophical, question,
> > > and we
> > > do not have to settle anything for a long time. But, if we do not
> > > decide a direction before we have many new datatypes the decision
> > > will
> > > make itself...
> > > So happy about any ideas, even if it's just a gut feeling :).
> > >
> > > There are various points. I would like to mostly ignore the
> > > technical
> > > ones, but I am listing them anyway here:
> > >
> > > * Scalars are faster (although that can likely be optimized)
> > >
> > > * Scalars have a lower memory footprint
> > >
> > > * The current implementation incurs a technical debt in NumPy.
> > > (I do not think that is a general issue, though. We could
> > > automatically create scalars for each new datatype probably.)
> > >
> > > Advantages of having no scalars:
> > >
> > > * No need to keep track of scalars to preserve them in ufuncs,
> > > or
> > > libraries using `np.asarray`, do they need
> > > `np.asarray_or_scalar`?
> > > (or decide they return always arrays, although ufuncs may
> > > not)
> > >
> > > * Seems simpler in many ways, you always know the output will
> > > be an
> > > array if it has to do with NumPy.
> > >
> > > Advantages of having scalars:
> > >
> > > * Scalars are immutable and we are used to them from Python.
> > > A 0-D array cannot be used as a dictionary key consistently
> > > [1].
> > >
> > > I.e. without scalars as first class citizens `dict[arr1d[0]]`
> > > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is
> > > defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy
> > > to work. [2]
> > >
> > > * Object arrays as we have them now make sense, `arr1d[0]` can
> > > reasonably return a Python object. I.e. arrays feel more like
> > > containers if you can take elements out easily.
> > >
> > > Could go both ways:
> > >
> > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the
> > > array
> > > without scalars. With scalars `arr1d[0, ...]` clarifies the
> > > meaning. (In principle it is good to never use `arr2d[0]` to
> > > get a 1D slice, probably more-so if scalars exist.)
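
The hashing and scalar-math points above can be sketched with today's NumPy, where `arr1d[0]` returns a scalar and `arr1d[0, ...]` returns a 0-D array view. This only illustrates current behaviour, not the proposed behaviour for new datatypes:

```python
import numpy as np

arr1d = np.array([1.0, 2.0, 3.0])

# With scalars: indexing copies the value out, so += only rebinds the name.
scalar = arr1d[0]
scalar += 1
first_after_scalar_math = float(arr1d[0])   # array is unchanged

# With a 0-D array view: += writes through to the original array.
view = arr1d[0, ...]
view += 1
first_after_view_math = float(arr1d[0])     # array was modified

# Scalars are hashable and can be dictionary keys; 0-D arrays are not.
lookup = {arr1d[1]: "second element"}
try:
    {view: "first element"}
    view_is_hashable = True
except TypeError:
    view_is_hashable = False

# .item() converts the 0-D array to a hashable Python object.
lookup[view.item()] = "first element"
```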
> > >
> > > Note: array-scalars (the current NumPy scalars) are not useful in
> > > my
> > > opinion [3]. A scalar should not be indexed or have a shape. I do
> > > not
> > > believe in scalars pretending to be arrays.
> > >
> > > I personally tend towards liking scalars. If Python was a
> > > language
> > > where the array (array-programming) concept was ingrained into
> > > the
> > > language itself, I would lean the other way. But users are used
> > > to
> > > scalars, and they "put" scalars into arrays. Array objects are in
> > > some
> > > ways strange in Python, and I feel not having scalars detaches
> > > them
> > > further.
> > >
> > > Having scalars, however, also means we should preserve them. I
> > > feel in principle that is actually fairly straightforward. E.g.
> > > for ufuncs:
> > >
> > > * np.add(scalar, scalar) -> scalar
> > > * np.add.reduce(arr, axis=None) -> scalar
> > > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
> > > * np.add.reduce(scalar, axis=()) -> array
> > >
> > > Of course libraries that do `np.asarray` would/could basically
> > > choose to not preserve scalars: Their signature is defined as
> > > taking strictly array input.
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > >
> > > [0] At best this can be a vision to decide which way they may
> > > evolve.
> > >
> > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is
> > > arguably strange. E.g. Quantity defines hash correctly, but does
> > > not fully ensure immutability for 0-D Quantities. Ensuring
> > > immutability in a world where "views" are a central concept
> > > requires a read-only copy.
> > >
> > > [2] Arguably `.item()` would always return a scalar, but it would
> > > be a
> > > second class citizen. (Although if it returns a scalar, at least
> > > we
> > > already have a scalar implementation.)
> > >
> > > [3] They are necessary due to technical debt for NumPy datatypes
> > > though.
> > >
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion