New DTypes: Are scalars a central concept in NumPy or not?
Hi all,
When we create new datatypes, we have the option to make new choices for the new datatypes [0] (not the existing ones).
The question is: Should every NumPy datatype have an associated scalar, and should operations like indexing return a scalar or a 0-D array?
This is, in my opinion, a complex, almost philosophical question, and we do not have to settle anything for a long time. But if we do not decide on a direction before we have many new datatypes, the decision will make itself... So I am happy about any ideas, even if it's just a gut feeling :).
There are various points. I would like to mostly ignore the technical ones, but I am listing them here anyway:
* Scalars are faster (although that can likely be optimized)
* Scalars have a lower memory footprint
* The current implementation incurs technical debt in NumPy. (I do not think that is a general issue, though. We could probably create scalars automatically for each new datatype.)
Advantages of having no scalars:
* No need to keep track of scalars to preserve them in ufuncs or in libraries using `np.asarray`. Would those libraries need an `np.asarray_or_scalar`? (Or they decide to always return arrays, although ufuncs may not.)
* Seems simpler in many ways: you always know the output will be an array if it has to do with NumPy.
Advantages of having scalars:
* Scalars are immutable and we are used to them from Python. A 0-D array cannot be used as a dictionary key consistently [1]. I.e. without scalars as first-class citizens, `dict[arr1d[0]]` cannot work; `dict[arr1d[0].item()]` may (if `.item()` is defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
* Object arrays as we have them now make sense: `arr1d[0]` can reasonably return a Python object. I.e. arrays feel more like containers if you can take elements out easily.
Could go both ways:
* Scalar math: `scalar = arr1d[0]; scalar += 1` modifies the array without scalars. With scalars, `arr1d[0, ...]` clarifies the meaning. (In principle it is good to never use `arr2d[0]` to get a 1-D slice, probably more so if scalars exist.)
Note: array-scalars (the current NumPy scalars) are not useful in my opinion [3]. A scalar should not be indexed or have a shape. I do not believe in scalars pretending to be arrays.
I personally tend towards liking scalars. If Python were a language where the array (array-programming) concept was ingrained into the language itself, I would lean the other way. But users are used to scalars, and they "put" scalars into arrays. Array objects are in some ways strange in Python, and I feel that not having scalars detaches them further.
Having scalars, however, also means we should preserve them. I feel that in principle this is actually fairly straightforward. E.g. for ufuncs:
* np.add(scalar, scalar) -> scalar
* np.add.reduce(arr, axis=None) -> scalar
* np.add.reduce(arr, axis=1) -> array (even if arr is 1-D)
* np.add.reduce(scalar, axis=()) -> array
Of course, libraries that call `np.asarray` would/could basically choose not to preserve scalars: their signature is defined as taking strictly array input.
Cheers,
Sebastian
[0] At best this can be a vision to decide which way they may evolve.
[1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is arguably strange. E.g. Quantity defines hash correctly, but does not fully ensure immutability for 0-D Quantities. Ensuring immutability in a world where "views" are a central concept requires a read-only copy.
[2] Arguably `.item()` would always return a scalar, but it would be a second-class citizen. (Although if it returns a scalar, at least we already have a scalar implementation.)
[3] They are necessary due to technical debt for NumPy datatypes, though.
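For reference, a minimal sketch of the indexing, hashing, and in-place points from the message above, using the existing float64 dtype. It only illustrates how current NumPy behaves and assumes nothing about how new datatypes would or should behave; the variable names are illustrative.

```python
import numpy as np

arr1d = np.array([1.0, 2.0, 3.0])

# Integer indexing currently returns an immutable, hashable scalar.
scalar = arr1d[0]
lookup = {scalar: "first element"}   # usable as a dictionary key

scalar += 1                          # rebinds the name; arr1d is untouched
print(scalar, arr1d[0])              # 2.0 1.0

# Adding an Ellipsis returns a 0-D array that is a view into arr1d.
view0d = arr1d[0, ...]
print(type(view0d).__name__, view0d.shape)   # ndarray ()

view0d += 1                          # in-place: writes through to arr1d
print(arr1d[0])                      # 2.0

# A 0-D array is mutable and therefore not hashable.
try:
    {view0d: "first element"}
    hashable = True
except TypeError as exc:
    hashable = False
    print("unhashable:", exc)
```

The `scalar += 1` line is the "could go both ways" case: with a scalar it leaves the array alone, while the same statement on the 0-D view changes `arr1d`.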
I personally have always found it weird and annoying to deal with 0-D arrays, so +1 for scalars!*
Juan
*: admittedly, I have almost no grasp of the underlying NumPy implementation complexities, but I will happily take Sebastian's word that scalars can be consistent with the library.
On Fri, 21 Feb 2020, at 7:37 PM, Sebastian Berg wrote:
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Hi Sebastian,
Just to clarify the difference:
    x = np.float64(42)
    y = np.array(42, dtype=float)
Here `x` is a scalar and `y` is a 0-D array, correct? If that's the case, not having the former would be very confusing for users (at least, that would be very confusing to me, FWIW).
If anything, I think it'd be cleaner to not have the latter, and only have either scalars or 1-D arrays (i.e., N-D arrays with N>=1), but it is probably way too late to even think about it anyway.
Cheers,
Evgeni
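To make the distinction concrete, here is a short sketch of how the two objects differ in current NumPy. It only inspects existing behaviour and is not a statement about the proposed new datatypes.

```python
import numpy as np

x = np.float64(42)               # scalar
y = np.array(42, dtype=float)    # 0-D array

print(type(x).__name__, type(y).__name__)   # float64 ndarray
print(x.shape, y.shape, y.ndim)             # () () 0

# The scalar subclasses Python's float; the 0-D array does not.
print(isinstance(x, float), isinstance(y, float))   # True False

# Ufuncs currently hand back a scalar even for 0-D array input.
result = np.add(y, y)
print(type(result).__name__)                # float64
```

Both objects report an empty shape, which is the "scalars pretending to be arrays" behaviour discussed earlier in the thread.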
not having a hashable tuple conversion would be a strong limitation
a = tuple(np.arange(5))
versus
a = tuple([np.array(i) for i in range(5)])
{a:5}
Josef
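A runnable version of the comparison above, as a sketch against current NumPy: iterating a 1-D array yields scalars, so the first tuple can be hashed, while a tuple of 0-D arrays cannot.

```python
import numpy as np

# Iterating a 1-D array yields scalars, so the tuple is hashable.
a = tuple(np.arange(5))
d = {a: 5}
print(d[(0, 1, 2, 3, 4)])    # 5

# A tuple of 0-D arrays cannot be hashed, so it cannot be a dict key.
b = tuple([np.array(i) for i in range(5)])
try:
    {b: 5}
    b_hashable = True
except TypeError as exc:
    b_hashable = False
    print("TypeError:", exc)
```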
also there is the question of which scalar:
.item() versus [()]
This was used in the old times in scipy.stats, and I just saw https://github.com/scipy/scipy/pull/11165#issuecomment-589952838
aside: AFAIR, I use 0-dim arrays also to ensure that I have a numpy dtype and not, e.g., some equivalent python type
Josef
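A small sketch of the two extraction spellings on a 0-D array in current NumPy: `.item()` gives a plain Python object, while `[()]` gives a NumPy scalar that keeps the dtype. The last lines illustrate the aside about wrapping a value in a 0-dim array to guarantee a NumPy dtype.

```python
import numpy as np

a = np.array(1.5)            # 0-D array

py_value = a.item()          # plain Python float
np_value = a[()]             # NumPy scalar, keeps the dtype

print(type(py_value).__name__)   # float
print(type(np_value).__name__)   # float64

# Wrapping a Python value in a 0-D array guarantees a NumPy dtype.
wrapped = np.asarray(3)
print(wrapped.dtype.kind, wrapped.ndim)   # i 0
```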
0-dim as mutable pseudo-scalar:
    >>> a = np.asarray(5)
    >>> a, id(a)
    (array(5), 844574884528)
    >>> a[()] = 1
    >>> a, id(a)
    (array(1), 844574884528)
maybe I never used that. In a recent similar case, I could use just a 1-d list or array to work around python's muting or mutability behavior
Josef
Off the cuff, my intuition is that dtypes will want to be able to define how scalar indexing works, and let it return objects other than arrays. So e.g.:
* some dtypes might just return a zero-d array
* some dtypes might want to return some arbitrary domain-appropriate type, like a datetime dtype might want to return datetime.datetime objects (like how dtype(object) works now)
* some dtypes might want to go to all the trouble to define immutable duck-array "scalar" types (like how dtype(float) and friends work now)
But I don't think we need to give that last case any special privileges in the dtype system. For example, I don't think we need to mandate that everyone who defines their own dtype MUST also implement a custom duck-array type to act as the scalars, or build a whole complex system to auto-generate such types given an arbitrary user-defined dtype.
-n
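The three cases listed above can already be observed with existing dtypes. The sketch below uses only current NumPy behaviour as an analogy; the array names are illustrative, and nothing here is an API for user-defined dtypes.

```python
import datetime
import numpy as np

floats = np.array([1.0, 2.0])
objects = np.array(
    [datetime.datetime(2020, 2, 22), datetime.datetime(2020, 2, 23)],
    dtype=object,
)

# dtype(float): indexing returns an immutable duck-array "scalar".
f0 = floats[0]
print(type(f0).__name__, f0.shape)     # float64 ()

# dtype(object): indexing returns the stored Python object itself.
o0 = objects[0]
print(type(o0).__name__)               # datetime

# Any dtype: indexing with an Ellipsis returns a zero-d array.
z0 = floats[0, ...]
print(type(z0).__name__, z0.ndim)      # ndarray 0
```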
Nathaniel J. Smith -- https://vorpus.org
On Sat, 2020-02-22 at 13:28 -0800, Nathaniel Smith wrote:
> Off the cuff, my intuition is that dtypes will want to be able to define how scalar indexing works, and let it return objects other than arrays. So e.g.:
> - some dtypes might just return a zero-d array
> - some dtypes might want to return some arbitrary domain-appropriate type, like a datetime dtype might want to return datetime.datetime objects (like how dtype(object) works now)
> - some dtypes might want to go to all the trouble to define immutable duck-array "scalar" types (like how dtype(float) and friends work now)
Right, my assumption is that whatever we suggest is going to be what most will choose, so we have the chance to move in a certain direction and set a standard. This is to make code which may or may not deal with 0D arrays more reliable (more below).
> But I don't think we need to give that last case any special privileges in the dtype system. For example, I don't think we need to mandate that everyone who defines their own dtype MUST also implement a custom duck-array type to act as the scalars, or build a whole complex system to auto-generate such types given an arbitrary user-defined dtype.
(Note that "autogenerating" would be nothing more than a read-only 0D array, which does not implement indexing.)
There are also categoricals, for which the type may just be "object" in practice (you could define it more closely, but it seems unlikely to be useful). And for simple numerical types, if we go the `.item()` path, it is arguably fine if the type is just a Python type.
Maybe the crux of the problem is actually that in general `np.asarray(arr1d[0])` does not round-trip for the current object dtype, and only partially for a categorical as above. As such that is fine, but right now it is hard to tell when you will have a scalar and when a 0D array.
Maybe it is better to talk about a potentially new `np.pyobject[type]` datatype (i.e. an object datatype with all elements having the same Python type). Currently writing generic code with the object dtype is tricky, because we randomly return the object instead of arrays. What would be the preference for such a specific dtype?
* arr1d[0] -> scalar or array?
* np.add(scalar, scalar) -> scalar or array?
* np.add.reduce(arr) -> scalar or array?
I think the `np.add` case we can decide fairly independently. The main thing is the indexing. Would we want to force a `.item()` call or not? Forcing `.item()` is in many ways simpler, I am unsure whether it would be inconvenient often.
And, maybe the answer is just that for datatypes that do not round-trip easily, `.item()` is probably preferable, and for datatypes that do round-trip, scalars are fine.
- Sebastian
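The round-trip problem for the object dtype can be illustrated with a minimal sketch against current NumPy (the array contents and variable names are made up for illustration). Indexing an object array returns the stored Python object itself, so `np.asarray` on the result re-interprets it, whereas a float element comes back as a 0-D array of the same dtype:

```python
import numpy as np

# Object array holding arbitrary Python objects.
obj = np.empty(2, dtype=object)
obj[0] = [1, 2]
obj[1] = "text"

elem = obj[0]                 # the stored list itself, not a NumPy type
back = np.asarray(elem)       # re-interpreted as a 1-D integer array
print(type(elem).__name__, back.shape, back.dtype.kind)

# A float element round-trips to a 0-D array with the original dtype.
flt = np.array([1.5, 2.5])
back_f = np.asarray(flt[0])
print(back_f.shape, back_f.dtype)

# Indexing with an Ellipsis keeps the element boxed in a 0-D object array.
obj0d = obj[0, ...]
print(obj0d.shape, obj0d.dtype, obj0d.item() is elem)
```

Forcing `.item()` for such dtypes would correspond to always getting something like `obj0d` from indexing and unwrapping it explicitly.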
Hi, Sebastian,
On 22.02.20, 02:37, "NumPyDiscussion on behalf of Sebastian Berg"
I have some thoughts on scalars from playing with ndarray duck-types (__array_function__), e.g. a MaskedArray ndarray-ducktype, for which I wanted an associated "MaskedScalar" type.
In summary, the way scalars currently work makes duck-typing (duck-scalars) difficult:
* numpy scalar types are not subclassable, so my duck-scalars aren't subclasses of numpy scalars and aren't in the type hierarchy
* even if scalars were subclassable, I would have to subclass each scalar datatype individually to make masked versions
* lots of code checks `isinstance(var, np.float64)` which breaks for my duck-scalars
* it was difficult to distinguish between a duck-scalar and a duck-0d array. The method I used in the end seems hacky.
This has led to some daydreams about how scalars should work, and also led me to read through your NEPs 40/41 with specific focus on what you said about scalars, and I was about to post there until I saw this discussion. I agree with what you said in the NEPs about not making scalars be dtype instances.
Here is what duck-types led me to:
If we are able to do something like define a `np.numpy_scalar` type covering all numpy scalars, which has a `.dtype` attribute like you describe in the NEPs, then that would seem to solve the duck-type problems above. Duck-type implementors would need to make a "duck-scalar" type in parallel to their "duck-ndarray" type, but I found that to be pretty easy using an abstract class in my MaskedArray duck-type, since the MaskedArray and MaskedScalar share a lot of behavior.
A numpy_scalar type would also help solve some object-array problems if the object scalars are wrapped in the `np.numpy_scalar` type. A long time ago I started to try to fix up various funny/strange behaviors of object datatypes, but there are lots of special cases, and the main problem was that the returned objects (e.g. from indexing) were not numpy types and did not support numpy attributes or indexing. Wrapping the returned object in `np.numpy_scalar` might add an extra slight annoyance to people who want to unwrap the object, but I think it would make object arrays less buggy and make code using object arrays easier to reason about and debug.
Finally, a few random votes/comments based on the other emails on the list:
I think scalars have a place in numpy (rather than just reusing 0d arrays), since there is a clear use in having hashable, immutable scalars. Structured scalars should probably be immutable.
I agree with your suggestion that scalars should not be indexable. Thus, my duck-scalars (and proposed numpy_scalar) would not be indexable. However, I think they should encode their datatype through a `.dtype` attribute like ndarrays, rather than by inheritance.
Also, something to think about is that currently numpy scalars satisfy the property `isinstance(np.float64(1), float)`, i.e. they are within the Python numerical type hierarchy. 0d arrays do not have this property. My proposal above would break this. I'm not sure what to think about whether this is a good property to maintain or not.
Cheers,
Allan
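The hashability and type-hierarchy properties mentioned in Allan's message can be checked directly. This is a small sketch of current NumPy behaviour with illustrative names; note that only `np.float64` subclasses the Python `float`, other widths such as `np.float32` do not:

```python
import numpy as np

x = np.float64(1)
zero_d = np.array(1.0)

# np.float64 sits inside Python's numeric hierarchy; a 0-D array does not.
print(isinstance(x, float), isinstance(zero_d, float))
print(isinstance(np.float32(1), float))   # other float widths do not subclass float

# Scalars are hashable and hash like the equivalent Python float.
lookup = {x: "one"}
print(lookup[1.0])

# 0-D arrays are mutable and therefore unhashable.
try:
    hash(zero_d)
    zero_d_hashable = True
except TypeError:
    zero_d_hashable = False
print(zero_d_hashable)

# Both already expose the datatype through a .dtype attribute.
print(x.dtype == zero_d.dtype)
```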
I've always found the duality of zero-d arrays and scalars confusing, and I'm sure I'm not alone.
Having both is just plain weird.
But, backward compatibility aside, could we have ONLY scalars?
When we index into an array, the dimensionality is reduced by one, so indexing into a 1D array has to get us something: but the zero-d array is a really weird object -- do we really need it?
There is certainly a need for more numpy-like scalars: more than the built-in data types, and some handy attributes and methods, like .dtype, .itemsize, etc. But could we make an enhanced scalar that had everything we actually need from a zero-d array?
The key point would be mutability -- but do we really need mutable scalars? I can't think of any time I've needed that, when I couldn't have used a 1-d array of length 1.
Is there a use case for zero-d arrays that could not be met with an enhanced scalar?
-CHB
 Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 5266959 voice 7600 Sand Point Way NE (206) 5266329 fax Seattle, WA 98115 (206) 5266317 main reception Chris.Barker@noaa.gov
On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> I've always found the duality of zero-d arrays and scalars confusing, and I'm sure I'm not alone.
> Having both is just plain weird.
I guess so, it is a tricky situation, and I do not really have an answer.
> But, backward compatibility aside, could we have ONLY scalars?
> When we index into an array, the dimensionality is reduced by one, so indexing into a 1D array has to get us something: but the zero-d array is a really weird object -- do we really need it?
Well, it is hard to write functions that work on N-Dimensions (where N can be 0), if the 0-D array does not exist. You can get away with scalars in most cases, because they pretend to be arrays in most cases (aside from mutability).
But I am pretty sure we have a bunch of cases that need `res = np.asarray(res)` simply because `res` is N-D but could then be silently converted to a scalar. E.g. see https://github.com/numpy/numpy/issues/13105 for an issue about this (although it does not actually list any specific problems).
- Sebastian
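As an illustration of the `res = np.asarray(res)` pattern mentioned above, here is a hedged sketch of an N-dimensional helper. The function `clipped_norm` and its `floor` parameter are hypothetical and not taken from NumPy or the linked issue. For 1-D input the reduction silently produces a scalar, so the in-place write only works after converting back to a 0-D array:

```python
import numpy as np

def clipped_norm(x, floor=1.0):
    """Euclidean norm over the last axis, with zero norms replaced by `floor`."""
    x = np.asarray(x, dtype=float)
    res = np.sqrt((x * x).sum(axis=-1))        # NumPy scalar when x is 1-D
    res = np.asarray(res)                      # back to a (possibly 0-D) array
    res[...] = np.where(res == 0, floor, res)  # in-place write needs an array
    return res

print(clipped_norm([3.0, 4.0]))                # 0-D array result
print(clipped_norm([[3.0, 4.0], [0.0, 0.0]]))  # 1-D array result

# Without the asarray call, the same write fails on the scalar.
scalar_res = np.sqrt((np.array([3.0, 4.0]) ** 2).sum(axis=-1))
try:
    scalar_res[...] = 0.0
    scalar_assignable = True
except TypeError:
    scalar_assignable = False
print(scalar_assignable)
```

The in-place write is contrived, but it stands in for any step (an `out=` argument, masked assignment) that requires an actual array.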
sorry to have fallen off the numpy grid for a bit, but:
On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg wrote:
> On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > But, backward compatibility aside, could we have ONLY Scalars?
> Well, it is hard to write functions that work on N-Dimensions (where N can be 0), if the 0-D array does not exist. You can get away with scalars in most cases, because they pretend to be arrays in most cases (aside from mutability).
> But I am pretty sure we have a bunch of cases that need `res = np.asarray(res)` simply because `res` is N-D but could then be silently converted to a scalar. E.g. see https://github.com/numpy/numpy/issues/13105 for an issue about this (although it does not actually list any specific problems).
I'm not sure this is unsolvable (again, backwards compatibility aside) -- after all, one of the key issues is that it's undetermined what the rank should be of: array(a_scalar) -- 0-d is the only unambiguous answer, but then it's not really an array in the usual sense anyway. So in theory, we could not allow that conversion without specifying a rank.
At the end of the day, there has to be some endpoint on how far you can reduce the rank of an array and have it work -- why not have 1 be the lower limit?
-CHB
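The rank question can be made concrete with a few existing NumPy calls. This is only a sketch of how a rank can be requested explicitly today (variable names are illustrative), not a proposal for new API:

```python
import numpy as np

s = np.float64(2.0)

as_0d = np.asarray(s)           # default conversion: rank 0
as_1d = np.atleast_1d(s)        # explicit lower limit of rank 1
as_2d = np.array(s, ndmin=2)    # explicit rank via ndmin
print(as_0d.shape, as_1d.shape, as_2d.shape)

# A length-1 array can serve as a mutable "scalar box".
box = np.atleast_1d(s)
box[0] += 1.0
print(box[0], s)                # the original scalar is unchanged
```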
 Sebastian
There is certainly a need for more numpy-like scalars: more than the built-in data types, and some handy attributes and methods, like .dtype, .itemsize, etc. But could we make an enhanced scalar that had everything we actually need from a zero-d array?
The key point would be mutability, but do we really need mutable scalars? I can't think of any time I've needed that, when I couldn't have used a 1-d array of length 1.
Is there a use case for zero-d arrays that could not be met with an enhanced scalar?
CHB
On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <allanhaldane@gmail.com> wrote:
I have some thoughts on scalars from playing with ndarray duck-types (__array_function__), e.g. a MaskedArray ndarray-ducktype, for which I wanted an associated "MaskedScalar" type.
In summary, the way scalars currently work makes duck-typing (duck-scalars) difficult:
* numpy scalar types are not subclassable, so my duck-scalars aren't subclasses of numpy scalars and aren't in the type hierarchy
* even if scalars were subclassable, I would have to subclass each scalar datatype individually to make masked versions
* lots of code checks `isinstance(var, np.float64)`, which breaks for my duck-scalars
* it was difficult to distinguish between a duck-scalar and a duck-0d array. The method I used in the end seems hacky.
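The `isinstance` problem in the list above can be sketched as follows. `MaskedScalar` here is a hypothetical stand-in, not the class from the MaskedArray duck-type being described, and both helper functions are illustrative names. The point is only that a type check rejects the duck-scalar while a check on a `.dtype` attribute accepts it.

```python
import numpy as np

class MaskedScalar:
    """Hypothetical duck-scalar: a value, a mask flag, and a dtype attribute."""
    def __init__(self, value, mask=False, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self.value = self.dtype.type(value)   # store the value as a NumPy scalar
        self.mask = bool(mask)

def is_float64_by_type(x):
    # The common check; it only accepts objects in the NumPy scalar hierarchy.
    return isinstance(x, np.float64)

def is_float64_by_dtype(x):
    # Duck-typed check: anything exposing a float64 `.dtype` passes.
    dt = getattr(x, "dtype", None)
    return dt is not None and dt == np.dtype(np.float64)

duck = MaskedScalar(1.0)
print(is_float64_by_type(np.float64(1.0)), is_float64_by_type(duck))
print(is_float64_by_dtype(np.float64(1.0)), is_float64_by_dtype(duck))
```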
This has led to some daydreams about how scalars should work, and also led me last to read through your NEPs 40/41 with specific focus on what you said about scalars, and was about to post there until I saw this discussion. I agree with what you said in the NEPs about not making scalars be dtype instances.
Here is what ducktypes led me to:
If we are able to do something like define a `np.numpy_scalar` type covering all numpy scalars, which has a `.dtype` attribute like you describe in the NEPs, then that would seem to solve the duck-type problems above. Duck-type implementors would need to make a "duck-scalar" type in parallel to their "duck-ndarray" type, but I found that to be pretty easy using an abstract class in my MaskedArray duck-type, since the MaskedArray and MaskedScalar share a lot of behavior.
A numpy_scalar type would also help solve some object-array problems if the object scalars are wrapped in the `np.numpy_scalar` type. A long time ago I started to try to fix up various funny/strange behaviors of object datatypes, but there are lots of special cases, and the main problem was that the returned objects (e.g. from indexing) were not numpy types and did not support numpy attributes or indexing. Wrapping the returned object in `np.numpy_scalar` might add an extra slight annoyance to people who want to unwrap the object, but I think it would make object arrays less buggy and make code using object arrays easier to reason about and debug.
Finally, a few random votes/comments based on the other emails on the list:
I think scalars have a place in numpy (rather than just reusing 0d arrays), since there is a clear use in having hashable, immutable scalars. Structured scalars should probably be immutable.
I agree with your suggestion that scalars should not be indexable. Thus, my duck-scalars (and proposed numpy_scalar) would not be indexable. However, I think they should encode their datatype through a `.dtype` attribute like ndarrays, rather than by inheritance.
Also, something to think about is that currently numpy scalars satisfy the property `isinstance(np.float64(1), float)`, i.e. they are within the python numerical type hierarchy. 0d arrays do not have this property. My proposal above would break this. I'm not sure what to think about whether this is a good property to maintain or not.
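A quick check of that property. Note that it holds for `np.float64` specifically, since that type subclasses Python's `float`; `np.float32` is included below to show that not every NumPy scalar behaves this way.

```python
import numpy as np

checks = {
    "np.float64 scalar is a float": isinstance(np.float64(1), float),
    "np.float32 scalar is a float": isinstance(np.float32(1), float),
    "0d array is a float": isinstance(np.array(1.0), float),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```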
Cheers, Allan
On 2/21/20 8:37 PM, Sebastian Berg wrote:
Hi all,
When we create new datatypes, we have the option to make new choices for the new datatypes [0] (not the existing ones).
The question is: Should every NumPy datatype have a scalar associated and should operations like indexing return a scalar or a 0D array?
This is in my opinion a complex, almost philosophical, question, and we do not have to settle anything for a long time. But, if we do not decide a direction before we have many new datatypes the decision will make itself... So happy about any ideas, even if it's just a gut feeling :).
There are various points. I would like to mostly ignore the technical ones, but I am listing them anyway here:
* Scalars are faster (although that can likely be optimized)
* Scalars have a lower memory footprint
* The current implementation incurs a technical debt in NumPy. (I do not think that is a general issue, though. We could automatically create scalars for each new datatype probably.)
Advantages of having no scalars:
* No need to keep track of scalars to preserve them in ufuncs, or in libraries using `np.asarray`: do they need `np.asarray_or_scalar`? (Or they decide to always return arrays, although ufuncs may not.)
* Seems simpler in many ways, you always know the output will be an array if it has to do with NumPy.
Advantages of having scalars:
* Scalars are immutable and we are used to them from Python. A 0D array cannot be used as a dictionary key consistently [1].
I.e. without scalars as first-class citizens `dict[arr1d[0]]` cannot work; `dict[arr1d[0].item()]` may (if `.item()` is defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
* Object arrays as we have them now make sense, `arr1d[0]` can reasonably return a Python object. I.e. arrays feel more like containers if you can take elements out easily.
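The dictionary-key point can be tried with current NumPy, where `arr1d[0]` yields a scalar and `arr1d[0, ...]` yields a 0D array. This is only a sketch of today's behavior; `.frozen()` from the text is a hypothetical method and is not used here.

```python
import numpy as np

arr1d = np.array([1.5, 2.5])
lookup = {1.5: "first", 2.5: "second"}

scalar = arr1d[0]         # NumPy scalar: immutable and hashable
zero_d = arr1d[0, ...]    # 0D array: mutable, so it defines no hash

via_scalar = lookup[scalar]

try:
    lookup[zero_d]
    zero_d_hashable = True
except TypeError:         # ndarray is unhashable
    zero_d_hashable = False

via_item = lookup[zero_d.item()]   # .item() extracts a Python float
print(via_scalar, zero_d_hashable, via_item)
```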
Could go both ways:
* Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array without scalars. With scalars `arr1d[0, ...]` clarifies the meaning. (In principle it is good to never use `arr2d[0]` to get a 1D slice, probably more so if scalars exist.)
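With current NumPy (where scalars exist) the two spellings already behave differently, which is the distinction this bullet relies on. A small sketch:

```python
import numpy as np

arr1d = np.zeros(3)

scalar = arr1d[0]        # a scalar: a detached, immutable value
scalar += 1              # rebinds the name `scalar`; arr1d is untouched
after_scalar_math = arr1d[0]

view = arr1d[0, ...]     # a 0D array that is a view into arr1d
view += 1                # in-place add writes through to arr1d
after_view_math = arr1d[0]

print(after_scalar_math, after_view_math)
```

If indexing returned 0D arrays instead of scalars, the first form would behave like the second.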
Note: array-scalars (the current NumPy scalars) are not useful in my opinion [3]. A scalar should not be indexed or have a shape. I do not believe in scalars pretending to be arrays.
I personally tend towards liking scalars. If Python were a language where the array (array-programming) concept was ingrained into the language itself, I would lean the other way. But users are used to scalars, and they "put" scalars into arrays. Array objects are in some ways strange in Python, and I feel not having scalars detaches them further.
Having scalars, however, also means we should preserve them. I feel that in principle this is actually fairly straightforward. E.g. for ufuncs:
* np.add(scalar, scalar) -> scalar
* np.add.reduce(arr, axis=None) -> scalar
* np.add.reduce(arr, axis=-1) -> array (even if arr is 1d)
* np.add.reduce(scalar, axis=()) -> array
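For comparison, here is what current NumPy returns for the first three cases. This sketch shows existing behavior, not the proposed rules: today a 0D result comes back as a scalar in all three, so the `axis=-1` rule above would be a change. The helper `kind` is an illustrative name.

```python
import numpy as np

def kind(x):
    """Classify a result as a NumPy scalar or an ndarray (illustrative helper)."""
    return "scalar" if isinstance(x, np.generic) else "array"

scalar = np.float64(1.0)
arr = np.arange(4.0)     # 1-D input

results = {
    "np.add(scalar, scalar)": kind(np.add(scalar, scalar)),
    "np.add.reduce(arr, axis=None)": kind(np.add.reduce(arr, axis=None)),
    "np.add.reduce(arr, axis=-1)": kind(np.add.reduce(arr, axis=-1)),
}

for call, result in results.items():
    print(f"{call} -> {result}")
```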
Of course libraries that do `np.asarray` would/could basically choose to not preserve scalars: their signature is defined as taking strictly array input.
Cheers,
Sebastian
[0] At best this can be a vision to decide which way they may evolve.
[1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably strange. E.g. Quantity defines hash correctly, but does not fully ensure immutability for 0D Quantities. Ensuring immutability in a world where "views" are a central concept requires a writeonly copy.
[2] Arguably `.item()` would always return a scalar, but it would be a second class citizen. (Although if it returns a scalar, at least we already have a scalar implementation.)
[3] They are necessary due to technical debt for NumPy datatypes though.
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
On Wed, 2020-04-08 at 12:37 -0700, Chris Barker wrote:
sorry to have fallen off the numpy grid for a bit, but:
On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
But, backward compatibility aside, could we have ONLY Scalars?
Well, it is hard to write functions that work on N dimensions (where N can be 0) if the 0D array does not exist. You can get away with scalars in most cases, because they pretend to be arrays most of the time (aside from mutability).
But I am pretty sure we have a bunch of cases that need `res = np.asarray(res)` simply because `res` is ND but could then be silently converted to a scalar. E.g. see https://github.com/numpy/numpy/issues/13105 for an issue about this (although it does not actually list any specific problems).
I'm not sure this is insolvable (again, backwards compatibility aside). After all, one of the key issues is that it's undetermined what the rank of array(a_scalar) should be: 0-d is the only unambiguous answer, but then it's not really an array in the usual sense anyway. So in theory, we could disallow that conversion unless a rank is specified.
So as a (silly) example, the following does not generalize to 0d, even though it should:

    import numpy as np

    def weird_normalize_by_trace_inplace(stacked_matrices):
        """Divides matrices by their trace but retains sign
        (works in-place, and thus e.g. not for integer arrays)

        Parameters
        ----------
        stacked_matrices : (..., N, M) ndarray
        """
        assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]

        trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)
        # Flip negative traces so that dividing keeps each matrix's sign.
        trace[trace < 0] *= -1
        # Add two trailing axes so each trace divides its own matrix.
        stacked_matrices /= trace[..., np.newaxis, np.newaxis]

Sure, that function does not make sense and you could rewrite it, but the fact is that in that function you want to conditionally modify trace in-place, but trace can be 0d and the "conditional" modification breaks down.

- Sebastian
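One way to make such a function cover the single-matrix case under current NumPy is to force the trace back into an array before the masked update. This is a sketch and not code from the thread; the name `normalize_by_abs_trace_inplace` is made up. It assumes that boolean-mask assignment works on a 0D array, which is exactly what a scalar cannot offer.

```python
import numpy as np

def normalize_by_abs_trace_inplace(stacked_matrices):
    """Divide each square matrix by the absolute value of its trace, in place."""
    assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]

    # np.trace returns a scalar for 2-D input; np.asarray turns it into a
    # writable 0D array so the masked in-place update below is possible.
    trace = np.asarray(np.trace(stacked_matrices, axis1=-2, axis2=-1))
    trace[trace < 0] *= -1

    # Trailing axes let each trace broadcast against its own matrix.
    stacked_matrices /= trace[..., np.newaxis, np.newaxis]
    return stacked_matrices

single = np.array([[1.0, 2.0], [3.0, -5.0]])            # trace is -4
stack = np.array([[[2.0, 0.0], [0.0, 2.0]],             # trace is 4
                  [[-1.0, 0.0], [0.0, -1.0]]])          # trace is -2

normalize_by_abs_trace_inplace(single)
normalize_by_abs_trace_inplace(stack)
print(single)
print(stack)
```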
At the end of the day, there has to be some endpoint on how far you can reduce the rank of an array and have it work, so why not have 1 be the lower limit?
CHB
 Sebastian
On Wed, Apr 8, 2020 at 1:17 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
But, backward compatibility aside, could we have ONLY Scalars?
Well, it is hard to write functions that work on N dimensions (where N can be 0) if the 0D array does not exist.
So as a (silly) example, the following does not generalize to 0d, even though it should:

    import numpy as np

    def weird_normalize_by_trace_inplace(stacked_matrices):
        """Divides matrices by their trace but retains sign
        (works in-place, and thus e.g. not for integer arrays)

        Parameters
        ----------
        stacked_matrices : (..., N, M) ndarray
        """
        assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]

        trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)
        # Flip negative traces so that dividing keeps each matrix's sign.
        trace[trace < 0] *= -1
        # Add two trailing axes so each trace divides its own matrix.
        stacked_matrices /= trace[..., np.newaxis, np.newaxis]

Sure, that function does not make sense and you could rewrite it, but the fact is that in that function you want to conditionally modify trace in-place, but trace can be 0d and the "conditional" modification breaks down.
I guess that's what I'm getting at: there is always an endpoint to reducing the rank. A function that's designed to work on a "stack" of something doesn't have to work on a single something, when it can, instead, work on a "stack" of height one.
Isn't the trace of a matrix always a scalar? And thus the trace(s) of a stack of matrices would always be 1D? So that function should do something like:
    stacked_matrices.shape = (-1, M, M)
yes? And then it would always work.
Again, backwards compatibility, but there is a reason the np.atleast_*() functions exist: you often need to make sure your inputs have the dimensionality expected.
CHB
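The "stack of height one" idea can be sketched like this. The function name `traces_of_stack` is hypothetical. It uses `reshape(-1, M, M)` rather than `np.atleast_3d`, because `np.atleast_3d` appends the new axis at the end for 2-D input and a stack needs it at the front.

```python
import numpy as np

def traces_of_stack(matrices):
    """Return a 1-D array of traces, treating a single matrix as a stack of one."""
    m = np.asarray(matrices)
    size = m.shape[-1]
    assert m.shape[-2] == size, "matrices must be square"
    stack = m.reshape(-1, size, size)   # leading axes collapse into one stack axis
    return np.trace(stack, axis1=-2, axis2=-1)

single = np.eye(2)                      # shape (2, 2)
several = np.stack([np.eye(2)] * 3)     # shape (3, 2, 2)

print(traces_of_stack(single).shape, traces_of_stack(several).shape)
print(np.atleast_3d(single).shape)      # the new axis lands at the end here
```

With this shape normalization the traces are always 1-D, so a masked in-place update on them never meets a scalar.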
Participants (8): Allan Haldane, Chris Barker, Evgeni Burovski, Hameer Abbasi, josef.pktd@gmail.com, Juan Nunez-Iglesias, Nathaniel Smith, Sebastian Berg