[Numpy-discussion] a summary function to get a quick glimpse on the contents of a numpy array

Ralf Gommers ralf.gommers at gmail.com
Sat Aug 1 14:52:11 EDT 2020


On Fri, Jul 31, 2020 at 1:40 PM Peter Steinbach <p.steinbach at hzdr.de> wrote:

> Dear numpy devs and interested readers,
>
> as a day-to-day user, it occurred to me that taking a quick look at the
> contents and extents of arrays is quite doable with
> numpy. numpy offers a rich set of methods for this. However, very often I
> observe, in myself and others, that one just wants to see
> whether the values of an array have a certain min/max or mean, or how wide
> the range of values is.
>
> I hence sat down to write a summary function that returns a string of
> hand-picked summary statistics for a quick inspection. I
> propose to include it in numpy and would love to have your feedback on
> this idea before I submit a PR. Here is the core
> functionality:
>
>     Examples
>     --------
>     >>> a = np.random.normal(size=20)
>     >>> print(summary(a))
>                 min     25perc       mean      stdev     median     75perc        max
>           -2.289870  -2.265757  -0.083213   1.115033  -0.162885  -2.217532   1.639802
>     >>> a = np.reshape(a, newshape=(4,5))
>     >>> print(summary(a,axis=1))
>                 min     25perc       mean      stdev     median     75perc        max
>        0  -0.976279  -0.974090   0.293003   1.009383   0.466814  -0.969712   1.519695
>        1  -0.468854  -0.467739   0.184139   0.649378  -0.036762  -0.465510   1.303144
>        2  -2.289870  -2.276455  -0.324450   1.230031  -0.289008  -2.249625   1.111107
>        3  -1.782239  -1.777304  -0.485546   1.259598  -1.236190  -1.767434   1.639802
>
> So you see, it is merely a tiny helper function that can help practitioners
> and data scientists get quick insight into what an
> array contains.
>
> first off, here is the code:
>
> https://github.com/psteinb/numpy/blob/summary-function/numpy/lib/utils.py#L1021
>
> I put it there as I am not sure at this point whether the community would
> appreciate such a function or not. Judging from the tests,
> lib/utils.py appears to be a place for undocumented functions. So to
> resolve this and prepare a proper PR, please let me know
> where this summary function could reside!
>

This seems to be more the domain of scipy.stats and statsmodels.
Statsmodels already does a good job with this; in SciPy there's
stats.describe (
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html)
which is quite similar to what you're proposing. Could you think about
whether scipy.stats.describe does what you want, and if there's room to
improve it (perhaps add a `__repr__` and/or a `_repr_html_` for
pretty-printing)?
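
For reference, here is a minimal sketch of what stats.describe returns today
and how a pretty-printer could sit on top of it. The helper name
`format_describe` and its `digits` parameter are hypothetical, used only for
illustration; they are not part of SciPy. Note that stats.describe reports
nobs, minmax, mean, variance, skewness and kurtosis, so the quartiles and
median from the proposal are not covered by this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 5))

# DescribeResult has the fields nobs, minmax, mean, variance,
# skewness and kurtosis; with axis=1 each statistic has one entry per row.
res = stats.describe(a, axis=1)


def format_describe(res, digits=6):
    """Hypothetical helper: render a DescribeResult as a fixed-width table."""
    lo, hi = res.minmax
    cols = {
        "min": np.atleast_1d(lo),
        "mean": np.atleast_1d(res.mean),
        # describe reports the variance (ddof=1 by default), so take the root
        "stdev": np.sqrt(np.atleast_1d(res.variance)),
        "max": np.atleast_1d(hi),
    }
    width = digits + 6
    lines = [" " * 8 + "".join(f"{name:>{width}}" for name in cols)]
    for i in range(len(cols["min"])):
        cells = "".join(f"{v[i]:>{width}.{digits}f}" for v in cols.values())
        lines.append(f"{i:>8}" + cells)
    return "\n".join(lines)


print(format_describe(res))
```

Something along these lines could live in a `__repr__` on the result object,
with the number of digits and the set of columns left configurable.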

Cheers,
Ralf


> Second, please give me your thoughts on the summary function's output.
> Should the number of digits be configurable? Should the
> columns be configurable? Is it OK to honor the axis parameter which is
> found in so many numpy functions?
>
> Last but not least, let me stress that this is my first contribution
> to numpy. I love the library and would like to
> contribute something back. So bear with me if my code violates best
> practices in your community for now. I'll sink my teeth
> into the formalities of a github PR once I get support from the community
> and the core devs.
>
> I think that a summary function would be a valuable addition to numpy!
> Best,
> Peter