[Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

Sebastian Berg sebastian at sipsolutions.net
Wed Apr 8 16:16:13 EDT 2020


On Wed, 2020-04-08 at 12:37 -0700, Chris Barker wrote:
> sorry to have fallen off the numpy grid for a bit, but:
> 
> On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <
> sebastian at sipsolutions.net>
> wrote:
> 
> > On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > > But, backward compatibility aside, could we have ONLY Scalars?
> > Well, it is hard to write functions that work in N dimensions
> > (where N can be 0) if the 0-D array does not exist. You can get
> > away with scalars in most cases, because they mostly pretend to be
> > arrays (aside from mutability).
> > 
> > But I am pretty sure we have a bunch of cases that need
> > `res = np.asarray(res)` simply because `res` is N-D but could then
> > be
> > silently converted to a scalar. E.g. see
> > https://github.com/numpy/numpy/issues/13105 for an issue about this
> > (although it does not actually list any specific problems).
> > 
> 
> I'm not sure this is insolvable (again, backwards compatibility
> aside) --
> after all, one of the key issues is that it's undetermined what the
> rank
> should be of: array(a_scalar) -- 0-d is the only unambiguous answer,
> but
> then it's not really an array in the usual sense anyway. So, in
> theory, we could disallow that conversion unless a rank is
> specified.

So as a (silly) example, the following does not generalize to 0d, even
though it should:

import numpy as np

def weird_normalize_by_trace_inplace(stacked_matrices):
    """Divides matrices by their trace but retains the sign
    (works in-place, and thus e.g. not for integer arrays).

    Parameters
    ----------
    stacked_matrices : (..., N, N) ndarray
    """
    assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]

    trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)
    # Conditional in-place step: breaks when ``trace`` is 0-d (a scalar).
    trace[trace < 0] *= -1
    stacked_matrices /= trace[..., np.newaxis, np.newaxis]

Sure, that function does not make sense and you could rewrite it, but
the fact is that in that function you want to conditionally modify
`trace` in place, and when `trace` comes back 0-d (a scalar) the
"conditional" modification breaks down.
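
For concreteness, here is a minimal, hypothetical sketch of that failure
mode using a single 2x2 matrix, together with the `np.asarray` guard
mentioned earlier in the thread:

import numpy as np

# Stacked case: np.trace returns a 1-D array, so the conditional
# in-place step works.
stacked = np.stack([2.0 * np.eye(2), -3.0 * np.eye(2)])
trace = np.trace(stacked, axis1=-2, axis2=-1)   # array([ 4., -6.])
trace[trace < 0] *= -1                          # fine: trace is a mutable array

# Single-matrix case: np.trace returns a NumPy scalar, which is
# immutable and not mask-assignable, so the same step fails.
single = -3.0 * np.eye(2)
trace = np.trace(single, axis1=-2, axis2=-1)    # np.float64(-6.0), not an ndarray
try:
    trace[trace < 0] *= -1
except (TypeError, IndexError) as exc:
    print("0-d case breaks:", exc)

# The guard mentioned earlier: force an array first.  A 0-d array is
# mutable, so the boolean-mask assignment goes through.
trace = np.asarray(np.trace(single, axis1=-2, axis2=-1))
trace[trace < 0] *= -1
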

- Sebastian


> 
> at the end of the day, there has to be some endpoint on how far you
> can
> reduce the rank of an array and have it work -- why not have 1 be the
> lower
> limit?
> 
> -CHB
> 
> 
> 
> 
> 
> 
> 
> > - Sebastian
> > 
> > 
> > > There is certainly a need for more numpy-like scalars: more than
> > > the built-in data types, and some handy attributes and methods,
> > > like dtype, .itemsize, etc. But could we make an enhanced scalar
> > > that had everything we actually need from a zero-d array?
> > > 
> > > The key point would be mutability -- but do we really need
> > > mutable
> > > scalars?
> > > I can't think of any time I've needed that, when I couldn't have
> > > used
> > > a 1-d
> > > array of length 1.
> > > 
> > > Is there a use case for zero-d arrays that could not be met with
> > > an
> > > enhanced scalar?
> > > 
> > > -CHB
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <
> > > allanhaldane at gmail.com>
> > > wrote:
> > > 
> > > > I have some thoughts on scalars from playing with ndarray
> > > > ducktypes
> > > > (__array_function__), e.g. a MaskedArray ndarray-ducktype, for
> > > > which
> > > > I
> > > > wanted an associated "MaskedScalar" type.
> > > > 
> > > > In summary, the way scalars currently work makes ducktyping
> > > > (duck-scalars) difficult:
> > > > 
> > > >   * numpy scalar types are not subclassable, so my duck-scalars
> > > > aren't
> > > >     subclasses of numpy scalars and aren't in the type
> > > > hierarchy
> > > >   * even if scalars were subclassable, I would have to subclass
> > > > each
> > > >     scalar datatype individually to make masked versions
> > > >   * lots of code checks `isinstance(var, np.float64)`, which
> > > > breaks
> > > >     for my duck-scalars
> > > >   * it was difficult to distinguish between a duck-scalar and a
> > > > duck-0d
> > > >     array. The method I used in the end seems hacky.
> > > > 
> > > > This has led to some daydreams about how scalars should work,
> > > > and also led me to read through your NEPs 40/41 with a specific
> > > > focus on what you said about scalars; I was about to post there
> > > > until I saw this discussion. I agree with what you said in the
> > > > NEPs about not making scalars be dtype instances.
> > > > 
> > > > Here is what ducktypes led me to:
> > > > 
> > > > If we are able to do something like define a `np.numpy_scalar`
> > > > type
> > > > covering all numpy scalars, which has a `.dtype` attribute like
> > > > you
> > > > describe in the NEPs, then that would seem to solve the
> > > > ducktype
> > > > problems above. Ducktype implementors would need to make a
> > > > "duck-
> > > > scalar"
> > > > type in parallel to their "duck-ndarray" type, but I found that
> > > > to
> > > > be
> > > > pretty easy using an abstract class in my MaskedArray ducktype,
> > > > since
> > > > the MaskedArray and MaskedScalar share a lot of behavior.
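
A rough, purely illustrative sketch of that shared-abstract-base pattern
(the class and method names below are hypothetical, not Allan's actual
ducktype code):

import numpy as np
from abc import ABC

class MaskedBase(ABC):
    """Behavior shared by the duck-array and the duck-scalar."""

    def __init__(self, data, mask=False):
        self._data = np.asarray(data)
        self._mask = np.broadcast_to(mask, self._data.shape)

    @property
    def dtype(self):
        # Both kinds expose their datatype via .dtype rather than by
        # inheriting from a concrete scalar type.
        return self._data.dtype


class MaskedArray(MaskedBase):
    """Duck ndarray: indexable."""

    def __getitem__(self, index):
        data = self._data[index]
        mask = self._mask[index]
        if np.ndim(data) == 0:
            # A 0-d result becomes the duck-scalar, mirroring how
            # indexing an ndarray hands back a NumPy scalar.
            return MaskedScalar(data, mask)
        return MaskedArray(data, mask)


class MaskedScalar(MaskedBase):
    """Duck scalar: not indexable, conceptually immutable."""

    def item(self):
        return None if self._mask else self._data.item()

With something like this, `MaskedArray([1.0, 2.0])[0]` comes back as a
`MaskedScalar` rather than as a bare 0-d duck-array, which is roughly
the distinction described above; a third-party
`isinstance(x, np.float64)` check would of course still not recognize it.
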
> > > > 
> > > > A numpy_scalar type would also help solve some object-array
> > > > problems if
> > > > the object scalars are wrapped in the np_scalar type. A long
> > > > time
> > > > ago I
> > > > started to try to fix up various funny/strange behaviors of
> > > > object
> > > > datatypes, but there are lots of special cases, and the main
> > > > problem was
> > > > that the returned objects (e.g. from indexing) were not numpy
> > > > types
> > > > and
> > > > did not support numpy attributes or indexing. Wrapping the
> > > > returned
> > > > object in `np.numpy_scalar` might add an extra slight annoyance
> > > > to
> > > > people who want to unwrap the object, but I think it would make
> > > > object
> > > > arrays less buggy and make code using object arrays easier to
> > > > reason
> > > > about and debug.
> > > > 
> > > > Finally, a few random votes/comments based on the other emails
> > > > on
> > > > the list:
> > > > 
> > > > I think scalars have a place in numpy (rather than just reusing
> > > > 0d
> > > > arrays), since there is a clear use in having hashable,
> > > > immutable
> > > > scalars. Structured scalars should probably be immutable.
> > > > 
> > > > I agree with your suggestion that scalars should not be
> > > > indexable.
> > > > Thus,
> > > > my duck-scalars (and proposed numpy_scalar) would not be
> > > > indexable.
> > > > However, I think they should encode their datatype through a
> > > > .dtype attribute like ndarrays, rather than by inheritance.
> > > > 
> > > > Also, something to think about is that currently numpy scalars
> > > > satisfy
> > > > the property `isinstance(np.float64(1), float)`, i.e. they are
> > > > within the
> > > > python numerical type hierarchy. 0d arrays do not have this
> > > > property. My
> > > > proposal above would break this. I'm not sure what to think
> > > > about
> > > > whether this is a good property to maintain or not.
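
For reference, that property is easy to check against current NumPy
(illustrative only):

import numpy as np

isinstance(np.float64(1), float)  # True: np.float64 subclasses Python's float
isinstance(np.array(1.0), float)  # False: a 0-d array sits outside that hierarchy
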
> > > > 
> > > > Cheers,
> > > > Allan
> > > > 
> > > > 
> > > > 
> > > > On 2/21/20 8:37 PM, Sebastian Berg wrote:
> > > > > Hi all,
> > > > > 
> > > > > When we create new datatypes, we have the option to make new
> > > > > choices
> > > > > for the new datatypes [0] (not the existing ones).
> > > > > 
> > > > > The question is: Should every NumPy datatype have a scalar
> > > > > associated
> > > > > and should operations like indexing return a scalar or a 0-D
> > > > > array?
> > > > > 
> > > > > This is in my opinion a complex, almost philosophical,
> > > > > question,
> > > > > and we
> > > > > do not have to settle anything for a long time. But if we do
> > > > > not decide a direction before we have many new datatypes, the
> > > > > decision will make itself...
> > > > > So I am happy about any ideas, even if it's just a gut
> > > > > feeling :).
> > > > > 
> > > > > There are various points. I would like to mostly ignore the
> > > > > technical
> > > > > ones, but I am listing them anyway here:
> > > > > 
> > > > >   * Scalars are faster (although that can likely be
> > > > >     optimized)
> > > > > 
> > > > >   * Scalars have a lower memory footprint
> > > > > 
> > > > >   * The current implementation incurs a technical debt in
> > > > > NumPy.
> > > > >     (I do not think that is a general issue, though. We could
> > > > >     automatically create scalars for each new datatype
> > > > > probably.)
> > > > > 
> > > > > Advantages of having no scalars:
> > > > > 
> > > > >   * No need to keep track of scalars to preserve them in
> > > > >     ufuncs or in libraries using `np.asarray` -- would those
> > > > >     libraries need an `np.asarray_or_scalar`? (Or decide to
> > > > >     always return arrays, although ufuncs may not?)
> > > > > 
> > > > >   * Seems simpler in many ways: you always know the output
> > > > >     will be an array if it has to do with NumPy.
> > > > > 
> > > > > Advantages of having scalars:
> > > > > 
> > > > >   * Scalars are immutable and we are used to them from
> > > > > Python.
> > > > >     A 0-D array cannot be used as a dictionary key
> > > > > consistently
> > > > > [1].
> > > > > 
> > > > >     I.e. without scalars as first-class citizens,
> > > > >     `dict[arr1d[0]]` cannot work; `dict[arr1d[0].item()]` may
> > > > >     work (if `.item()` is defined), and e.g.
> > > > >     `dict[arr1d[0].frozen()]` could make a copy to work. [2]
> > > > >     (A short demonstration follows this list.)
> > > > > 
> > > > >   * Object arrays as we have them now make sense, `arr1d[0]`
> > > > > can
> > > > >     reasonably return a Python object. I.e. arrays feel more
> > > > > like
> > > > >     containers if you can take elements out easily.
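
A quick, illustrative demonstration of the dict-key point from the first
bullet above, using current NumPy behavior:

import numpy as np

arr1d = np.array([1.0, 2.0, 3.0])

d = {arr1d[0]: "ok"}        # works: arr1d[0] is an immutable np.float64 scalar

try:
    d[np.array(1.0)] = "?"  # a mutable 0-d array defines no hash
except TypeError as exc:
    print(exc)              # unhashable type: 'numpy.ndarray'
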
> > > > > 
> > > > > Could go both ways:
> > > > > 
> > > > >   * Scalar math: without scalars, `scalar = arr1d[0];
> > > > >     scalar += 1` modifies the array. With scalars,
> > > > >     `arr1d[0, ...]` clarifies the meaning. (In principle it
> > > > >     is good to never use `arr2d[0]` to get a 1D slice,
> > > > >     probably more so if scalars exist.)
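
Illustrating that bullet with today's behavior, where indexing returns a
scalar copy and `arr1d[0, ...]` gives a 0-d view (hypothetical snippet):

import numpy as np

arr1d = np.array([1.0, 2.0])

scalar = arr1d[0]     # np.float64, a copy
scalar += 1           # rebinds the scalar; arr1d is untouched
print(arr1d)          # [1. 2.]

view = arr1d[0, ...]  # 0-d ndarray, a view into arr1d
view += 1             # modifies arr1d in place
print(arr1d)          # [2. 2.]
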
> > > > > 
> > > > > Note: array-scalars (the current NumPy scalars) are not
> > > > > useful in
> > > > > my
> > > > > opinion [3]. A scalar should not be indexed or have a shape.
> > > > > I do
> > > > > not
> > > > > believe in scalars pretending to be arrays.
> > > > > 
> > > > > I personally tend towards liking scalars.  If Python was a
> > > > > language
> > > > > where the array (array-programming) concept was ingrained
> > > > > into
> > > > > the
> > > > > language itself, I would lean the other way. But users are
> > > > > used
> > > > > to
> > > > > scalars, and they "put" scalars into arrays. Array objects
> > > > > are in
> > > > > some
> > > > > ways strange in Python, and I feel not having scalars
> > > > > detaches
> > > > > them
> > > > > further.
> > > > > 
> > > > > Having scalars, however, also means we should preserve them.
> > > > > I feel that in principle this is actually fairly
> > > > > straightforward. E.g. for ufuncs:
> > > > > 
> > > > >    * np.add(scalar, scalar) -> scalar
> > > > >    * np.add.reduce(arr, axis=None) -> scalar
> > > > >    * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
> > > > >    * np.add.reduce(scalar, axis=()) -> array
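
For orientation, the first two of these already match what current NumPy
returns; a quick, purely illustrative check:

import numpy as np

arr = np.arange(3.0)

type(np.add(np.float64(1.0), np.float64(2.0)))  # np.float64: scalar in, scalar out
type(np.add.reduce(arr, axis=None))             # np.float64: full reduction gives a scalar
type(np.add.reduce(arr.reshape(1, 3), axis=1))  # np.ndarray: a dimension is left over
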
> > > > > 
> > > > > Of course libraries that do `np.asarray` would/could basically
> > > > > choose not to preserve scalars: their signature is defined as
> > > > > taking strictly array input.
> > > > > 
> > > > > Cheers,
> > > > > 
> > > > > Sebastian
> > > > > 
> > > > > 
> > > > > [0] At best this can be a vision to decide which way they may
> > > > > evolve.
> > > > > 
> > > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is
> > > > > arguably strange. E.g. Quantity defines hash correctly, but
> > > > > does not fully ensure immutability for 0-D Quantities.
> > > > > Ensuring immutability in a world where "views" are a central
> > > > > concept requires a read-only copy.
> > > > > 
> > > > > [2] Arguably `.item()` would always return a scalar, but it
> > > > > would
> > > > > be a
> > > > > second class citizen. (Although if it returns a scalar, at
> > > > > least
> > > > > we
> > > > > already have a scalar implementation.)
> > > > > 
> > > > > [3] They are necessary due to technical debt for NumPy
> > > > > datatypes
> > > > > though.
> > > > > 
> > > > > 


