On Sat, Feb 22, 2020 at 9:34 AM <josef.pktd@gmail.com> wrote:
Not having a hashable tuple conversion would be a strong limitation: a
tuple of scalars can be a dict key, a tuple of 0-D arrays cannot.

a = tuple(np.arange(5))                    # tuple of NumPy scalars
{a: 5}                                     # works, hashable

versus

a = tuple(np.array(i) for i in range(5))   # tuple of 0-D arrays
{a: 5}                                     # TypeError: unhashable type

There is also the question of which scalar to extract: `.item()`
versus `[()]`.
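
For a 0-D array the two give back different types (a quick
illustration with current NumPy):

>>> import numpy as np
>>> y = np.array(1.5)
>>> type(y.item())   # Python float -- leaves the NumPy type system
<class 'float'>
>>> type(y[()])      # NumPy scalar -- keeps the dtype
<class 'numpy.float64'>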

This pattern was used in the old days in scipy.stats, and I just saw
https://github.com/scipy/scipy/pull/11165#issuecomment-589952838

Aside: AFAIR, I also use 0-dim arrays to ensure that I have a NumPy
dtype and not, e.g., some equivalent Python type.

Josef

On Sat, Feb 22, 2020 at 9:28 AM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
Hi Sebastian,

Just to clarify the difference:

>>> x = np.float64(42)
>>> y = np.array(42, dtype=float)

Here `x` is a scalar and `y` is a 0D array, correct?
If that's the case, not having the former would be very confusing for
users (at least, it would be for me, FWIW).
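
To make the difference concrete, a quick check of current behavior:

>>> type(x), type(y)
(<class 'numpy.float64'>, <class 'numpy.ndarray'>)
>>> hash(x)          # NumPy scalars are hashable
42
>>> hash(y)          # 0-D arrays are not
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'numpy.ndarray'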

If anything, I think it'd be cleaner to not have the latter, and only
have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it
is probably way too late to even think about it anyway.

Cheers,

Evgeni

On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg
<sebastian@sipsolutions.net> wrote:
>
> Hi all,
>
> When we create new datatypes, we have the option to make new choices
> for the new datatypes [0] (not the existing ones).
>
> The question is: Should every NumPy datatype have an associated
> scalar, and should operations like indexing return a scalar or a 0-D
> array?
>
> This is, in my opinion, a complex, almost philosophical question, and
> we do not have to settle anything for a long time. But if we do not
> decide on a direction before we have many new datatypes, the decision
> will make itself...
> So I am happy about any ideas, even if it's just a gut feeling :).
>
> There are various points. I would like to mostly ignore the technical
> ones, but I am listing them anyway here:
>
>   * Scalars are faster (although that can likely be optimized)
>
>   * Scalars have a lower memory footprint
>
>   * The current implementation incurs technical debt in NumPy.
>     (I do not think that is a general issue, though. We could
>     probably create scalars for each new datatype automatically.)
>
> Advantages of having no scalars:
>
>   * No need to keep track of scalars in order to preserve them in
>     ufuncs. Libraries using `np.asarray` would not need an
>     `np.asarray_or_scalar` (or to decide that they always return
>     arrays, even though ufuncs may not).
>
>   * Seems simpler in many ways: you always know the output will be
>     an array if it has anything to do with NumPy.
>
> Advantages of having scalars:
>
>   * Scalars are immutable and we are used to them from Python.
>     A 0-D array cannot be used as a dictionary key consistently [1].
>
>     I.e. without scalars as first-class citizens, `dict[arr1d[0]]`
>     cannot work; `dict[arr1d[0].item()]` may (if `.item()` is
>     defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy to
>     work. [2]
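>
>     For illustration: with current NumPy, where indexing returns a
>     scalar, the dict-key case works today, while a 0-D array key does
>     not (`arr1d[0, ...]` already returns a 0-D array):
>
>         >>> import numpy as np
>         >>> arr1d = np.arange(3)
>         >>> d = {arr1d[0]: "zero"}   # scalar key, hashable
>         >>> d[0]
>         'zero'
>         >>> {arr1d[0, ...]: 1}       # 0-D array key
>         TypeError: unhashable type: 'numpy.ndarray'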
>
>   * Object arrays as we have them now make sense: `arr1d[0]` can
>     reasonably return a Python object. I.e. arrays feel more like
>     containers if you can take elements out easily.
>
> Could go both ways:
>
>   * Scalar math: without scalars, `scalar = arr1d[0]; scalar += 1`
>     modifies the array. With scalars, `arr1d[0, ...]` clarifies the
>     meaning. (In principle it is good to never use `arr2d[0]` to
>     get a 1D slice, probably more so if scalars exist.)
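>
>     For example, `arr1d[0, ...]` already returns a 0-D view today, so
>     in-place math writes back into the array, while `arr1d[0]` (a
>     scalar copy) does not:
>
>         >>> arr1d = np.arange(3)
>         >>> v = arr1d[0, ...]   # 0-D view into arr1d
>         >>> v += 1              # writes back into arr1d
>         >>> arr1d
>         array([1, 1, 2])
>         >>> s = arr1d[0]        # scalar, a copy of the value
>         >>> s += 1              # arr1d is unchanged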
>
> Note: array-scalars (the current NumPy scalars) are not useful in my
> opinion [3]. A scalar should not be indexed or have a shape. I do not
> believe in scalars pretending to be arrays.
>
> I personally tend towards liking scalars. If Python were a language
> where the array (array-programming) concept was ingrained into the
> language itself, I would lean the other way. But users are used to
> scalars, and they "put" scalars into arrays. Array objects are in some
> ways strange in Python, and I feel not having scalars detaches them
> further.
>
> Having scalars, however, also means we should preserve them. I feel
> that in principle this is actually fairly straightforward. E.g. for
> ufuncs:
>
>    * np.add(scalar, scalar) -> scalar
>    * np.add.reduce(arr, axis=None) -> scalar
>    * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
>    * np.add.reduce(scalar, axis=()) -> array
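>
>    For comparison, current NumPy already behaves along these lines
>    for the first two cases:
>
>        >>> type(np.add(np.float64(1), np.float64(2)))
>        <class 'numpy.float64'>
>        >>> type(np.add.reduce(np.arange(3, dtype=np.int64)))
>        <class 'numpy.int64'>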
>
> Of course, libraries that do `np.asarray` would/could basically
> choose not to preserve scalars: their signature is defined as taking
> strictly array input.
>
> Cheers,
>
> Sebastian
>
>
> [0] At best this can be a vision to decide which way they may evolve.
>
> [1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is arguably
> strange. Quantity defines hash correctly, but does not fully ensure
> immutability for 0-D Quantities. Ensuring immutability in a world
> where "views" are a central concept requires a read-only copy.
>
> [2] Arguably, `.item()` would always return a scalar, but it would be
> a second-class citizen. (Although if it returns a scalar, at least we
> already have a scalar implementation.)
>
> [3] They are necessary due to technical debt for NumPy datatypes
> though.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion