[Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

Hameer Abbasi einstein.edison at gmail.com
Sun Feb 23 05:04:10 EST 2020


Hi, Sebastian,

On 22.02.20, 02:37, "NumPy-Discussion on behalf of Sebastian Berg" <numpy-discussion-bounces+hameerabbasi=yahoo.com at python.org on behalf of sebastian at sipsolutions.net> wrote:

    Hi all,
    
    When we create new datatypes, we have the option to make new choices
    for the new datatypes [0] (not the existing ones).
    
    The question is: Should every NumPy datatype have an associated
    scalar, and should operations like indexing return a scalar or a
    0-D array?
    
    This is in my opinion a complex, almost philosophical, question, and
    we do not have to settle anything for a long time. But if we do not
    decide on a direction before we have many new datatypes, the
    decision will make itself...
    So I am happy about any ideas, even if it's just a gut feeling :).
    
    There are various points. I would like to mostly ignore the technical
    ones, but I am listing them anyway here:
    
      * Scalars are faster (although that can likely be optimized)
    
      * Scalars have a lower memory footprint
    
      * The current implementation incurs a technical debt in NumPy.
        (I do not think that is a general issue, though. We could
        probably create scalars automatically for each new datatype.)
    
    Advantages of having no scalars:
    
      * No need to keep track of scalars to preserve them in ufuncs or
        in libraries using `np.asarray`. Would those libraries need an
        `np.asarray_or_scalar`? (Or would they decide to always return
        arrays, although ufuncs may not?) See the sketch after this
        list.
    
      * Seems simpler in many ways: you always know the output will be
        an array if it has to do with NumPy.
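    For illustration, a minimal sketch of today's behavior (note that
    `np.asarray_or_scalar` above is hypothetical, not an existing
    function):

        import numpy as np

        s = np.float64(1.5)        # a NumPy scalar
        a = np.asarray(s)          # np.asarray does not preserve scalars
        print(type(s))             # <class 'numpy.float64'>
        print(type(a), a.shape)    # <class 'numpy.ndarray'> ()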
    
    Advantages of having scalars:
    
      * Scalars are immutable and we are used to them from Python.
        A 0-D array cannot be used as a dictionary key consistently [1].

        I.e. without scalars as first-class citizens, `dict[arr1d[0]]`
        cannot work; `dict[arr1d[0].item()]` may (if `.item()` is
        defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy
        to work. [2] (See the first sketch after this list.)
    
      * Object arrays as we have them now make sense: `arr1d[0]` can
        reasonably return a Python object. I.e. arrays feel more like
        containers if you can take elements out easily. (See the second
        sketch after this list.)
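    A minimal sketch of the dictionary-key point, using today's
    behavior:

        import numpy as np

        arr1d = np.array([1.0, 2.0])
        d = {arr1d[0]: "ok"}        # works today: indexing returns a
                                    # hashable scalar
        try:
            d[np.array(1.0)] = "?"  # a true 0-D array is not hashable
        except TypeError as e:
            print(e)                # unhashable type: 'numpy.ndarray'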
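    And a sketch of the object-array point:

        import numpy as np

        arr = np.array([{"a": 1}, "text"], dtype=object)
        elem = arr[0]               # the original Python dict comes back
        print(type(elem))           # <class 'dict'>, not a 0-D array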
    
    Could go both ways:
    
      * Scalar math: without scalars, `scalar = arr1d[0]; scalar += 1`
        modifies the array. With scalars, `arr1d[0, ...]` clarifies the
        intent. (In principle it is good to never use `arr2d[0]` to
        get a 1-D slice, probably more so if scalars exist; see the
        sketch after this list.)
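    A minimal sketch of the difference, with today's semantics:

        import numpy as np

        arr1d = np.arange(3.0)
        scalar = arr1d[0]           # today: a scalar, detached from arr1d
        scalar += 1                 # does not modify the array
        print(arr1d)                # [0. 1. 2.]

        view = arr1d[0, ...]        # a 0-D view into the array
        view += 1                   # modifies arr1d in place
        print(arr1d)                # [1. 1. 2.]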

From a usability perspective, one could argue that if the dimension of the array being indexed is known and the user is not advanced, the expected behavior is that of scalars, not 0-D arrays. If, however, the input dimension is unknown, then the behavior switch at 0-D and the need for an extra ellipsis to ensure array-ness make things confusing for regular users. I am fine with the current behavior of indexing, as anything else would likely be a large backwards-compatibility break.
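To make the switch concrete (a quick illustration of current behavior, not a proposal):

    import numpy as np

    a = np.zeros((2, 2))
    print(type(a[0]))       # <class 'numpy.ndarray'>, shape (2,)
    print(type(a[0, 0]))    # <class 'numpy.float64'> -- the return type
                            # switches once indexing reaches 0-D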

    
    Note: array-scalars (the current NumPy scalars) are not useful in my
    opinion [3]. A scalar should not be indexed or have a shape. I do not
    believe in scalars pretending to be arrays.
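    To see what that "pretending" looks like today (a small
    illustration; none of this is proposed behavior):

        import numpy as np

        s = np.float64(1.0)
        print(s.shape, s.ndim, s.size)   # () 0 1 -- array-like
                                         # attributes on a scalar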
    
    I personally tend towards liking scalars.  If Python were a language
    where the array (array-programming) concept was ingrained into the
    language itself, I would lean the other way. But users are used to
    scalars, and they "put" scalars into arrays. Array objects are in some
    ways strange in Python, and I feel not having scalars detaches them
    further.
    
    Having scalars, however, also means we should preserve them. I feel
    that in principle this is fairly straightforward. E.g. for ufuncs:
    
       * np.add(scalar, scalar) -> scalar
       * np.add.reduce(arr, axis=None) -> scalar
       * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
       * np.add.reduce(scalar, axis=()) -> array

I love this idea.
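For reference, the first two of these rules already match today's behavior; a quick, non-exhaustive sketch:

    import numpy as np

    arr = np.arange(6.0).reshape(2, 3)
    print(type(np.add(1.0, 2.0)))               # numpy.float64 (scalar)
    print(type(np.add.reduce(arr, axis=None)))  # numpy.float64 (scalar)
    print(type(np.add.reduce(arr, axis=1)))     # numpy.ndarray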
    
    Of course, libraries that do `np.asarray` would/could basically
    choose not to preserve scalars: their signature is defined as taking
    strictly array input.
    
    Cheers,
    
    Sebastian
    
    
    [0] At best this can be a vision to decide which way they may evolve.
    
    [1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is
    arguably strange. Quantity, by contrast, defines hash correctly but
    does not fully ensure immutability for 0-D Quantities. Ensuring
    immutability in a world where "views" are a central concept requires
    a write-only copy.
    
    [2] Arguably, `.item()` would always return a scalar, but it would
    be a second-class citizen. (Although if it returns a scalar, at
    least we already have a scalar implementation.)
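    For reference, a small sketch of today's `.item()` behavior:

        import numpy as np

        x = np.array([1.5])[0]      # numpy.float64 today
        print(type(x.item()))       # <class 'float'> -- a plain Python
                                    # scalar, outside NumPy's type system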
    
    [3] They are necessary due to technical debt for the existing NumPy
    datatypes, though.