Suggestion to show the shape in repr for summarized arrays
Hi All,

When the repr of an array is shown, the dtype and shape are currently listed explicitly only if they cannot be inferred directly from the list that is shown, i.e., the dtype if it is not float64 or int64, and the shape if the size of the array is zero but the shape is not the simple (0,). For instance,

```
np.empty((10,2,0), dtype="i2")
array([], shape=(10, 2, 0), dtype=int16)
```

I propose to also show the shape in the (rare) case that an array is summarized, i.e., when it has more than the default threshold of 1000 elements and elements are replaced by `...`. The logic is that in that case, too, it is no longer clear what the shape actually is, and that is useful information (e.g., when working in a notebook, which is the original use case at https://github.com/numpy/numpy/issues/27461).

I have a PR for that at https://github.com/numpy/numpy/pull/27482, which would lead to the following:

```
np.arange(1001)
array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))
```

Just to be sure: this PR causes *no* change for arrays at or below the threshold of 1000 elements, so I do not believe this change will lead to a lot of unnecessary churn for downstream packages. Indeed, between numpy and astropy (which has lots of doctests), the only changes to (doc)tests that were needed are the very few for arrays where the "threshold" is explicitly exceeded.

One irritant is that the shape is not an argument that can be passed in to an `np.array` call. While this is just as much the case for zero-sized arrays, perhaps a better solution would be to move the shape information out of the parentheses, e.g., using ``...) # shape=(...)``. I can change the PR to do that if that's the consensus.

All the best,

Marten
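The threshold behavior described above can be checked with a short sketch. It assumes default print options and only tests whether `...` appears in the repr, so it does not depend on whether the proposed `shape=` addition is present in the installed NumPy. The variable names are illustrative.

```python
import numpy as np

# Arrays with more elements than this print threshold are summarized.
threshold = np.get_printoptions()["threshold"]

# Exactly at the threshold: every element is printed.
full = repr(np.arange(threshold))
# One element past the threshold: the middle is replaced by "...".
summarized = repr(np.arange(threshold + 1))

print("..." in full)
print("..." in summarized)

# Existing behavior for a zero-size array with a non-trivial shape.
print(repr(np.empty((10, 2, 0), dtype="i2")))
```

With the default threshold of 1000, only the second array is summarized, which is the only case the proposal touches.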
I like this.

While ideally `eval(repr(an_object)) == an_object`, in practice this is already violated for large arrays, so other than doctests, this shouldn't cause too many headaches.

-CHB

On Mon, Sep 30, 2024 at 10:13 AM Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
I propose to also show the shape for the (rare) case that an array is summarized, i.e., when it has more than the default threshold of 1000 elements, and elements are replaced by `...`. The logic is that also in that case it is no longer clear what the shape actually is, which is useful information (e.g., if working in a notebook -- which is the original use case at https://github.com/numpy/numpy/issues/27461).
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
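The point about `eval(repr(...))` already failing for large arrays can be illustrated with a small check. This is only a sketch: `roundtrips` is a hypothetical helper, and the namespace passed to `eval` is an assumption about how a tool might evaluate a repr.

```python
import numpy as np

def roundtrips(a):
    """Return True if eval(repr(a)) rebuilds an array equal to `a`."""
    try:
        # Only expose `array`, the name used in NumPy's repr output.
        b = eval(repr(a), {"array": np.array})
    except Exception:
        # E.g. an unexpected keyword argument in the repr.
        return False
    return (
        isinstance(b, np.ndarray)
        and b.shape == a.shape
        and b.dtype == a.dtype
        and bool(np.all(b == a))
    )

print(roundtrips(np.arange(5)))     # small array: repr lists every element
print(roundtrips(np.arange(1001)))  # summarized array: "..." is in the repr
```

A summarized repr does not round-trip with or without a `shape=` entry, because the `...` stands in for elements that are no longer in the text.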
I like this, too.

And I think the trailing comment, `# shape=...`, is much better, as it gets the best of both worlds: users get the info, and tools which do `eval(repr(..))` only need to learn to ignore the comment. For one, numpy's own doctests will keep working with no churn, since scipy_doctest handles this already.

Evgeni

On Tue, Oct 1, 2024, 01:07 Chris Barker via NumPy-Discussion <numpy-discussion@python.org> wrote:
I like this.
While ideally `eval(repr(an_object)) == an_object`, in practice this is already violated for large arrays, so other than doctests, this shouldn't cause too many headaches.
-CHB
On Tue, Oct 1, 2024 at 9:57 AM Evgeni Burovski via NumPy-Discussion < numpy-discussion@python.org> wrote:
I like this, too.
And I think the trailing comment, `# shape=...`, is much better, as it gets the best of both worlds: users get the info, and tools which do `eval(repr(..))` only need to learn to ignore the comment. For one, numpy's own doctests will keep working with no churn, since scipy_doctest handles this already.
Given that we've already chosen to use the fake `shape=...` keyword when there is a 0-length axis, what do you think we should do, consistency-wise?

    np.empty([10, 0])
    array([], shape=(10, 0), dtype=float64)

1. Follow the precedent and use the fake `shape=...` keyword in the summarized-array case.
2. Ignore the precedent and use a following `# shape=...` comment in the summarized-array case, and leave the 0-length-axis case alone.
3. Fix the 0-length-axis case to use the following `# shape=...` comment too.

--
Robert Kern
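The practical difference between the keyword form and the trailing-comment form can be sketched as follows. The `comment_form` string is a hypothetical rendering of the `# shape=...` idea, not output produced by NumPy, and the `eval` namespace is an assumption for illustration.

```python
import numpy as np

# Names a tool would need in scope to evaluate a repr string.
ns = {"array": np.array, "float64": np.float64}

keyword_form = "array([], shape=(10, 0), dtype=float64)"    # existing repr
comment_form = "array([], dtype=float64)  # shape=(10, 0)"  # hypothetical

# `shape` is not an argument of np.array, so the keyword form fails.
try:
    eval(keyword_form, ns)
    keyword_ok = True
except TypeError:
    keyword_ok = False

# Python discards the trailing comment, so this evaluates without error.
result = eval(comment_form, ns)

print(keyword_ok)
print(result.shape)
```

The comment form evaluates, but the shape information is dropped along with the comment, so the result has shape `(0,)` rather than `(10, 0)`.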
Consistency-wise, I guess option 3 (fix the 0-length-axis case) is the best one. Whether it's worth the code churn in NumPy can go either way, so option 2 (ignore the precedent and leave 0-length-axis arrays alone) is fine too, IMO.

Evgeni
We discussed this again and will merge the current version in a few days unless there is more discussion.
Participants (5): Chris Barker, Evgeni Burovski, Marten van Kerkwijk, matti.picus@gmail.com, Robert Kern