<div dir="ltr"><div dir="ltr">On Wed, Dec 9, 2020 at 2:24 PM Fang Zhang <<a href="mailto:fangzh@umich.edu">fangzh@umich.edu</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">By default, the __repr__ and __str__ methods of NumPy arrays summarize long arrays (i.e. omit all but a few items at the beginning and end of each dimension). This is a good thing: when debugging, programmers can call print() on arrays with millions of elements without clogging the output or using too much CPU/memory (unsurprisingly, the string representation of an array item usually takes more bytes than its binary representation).<div dir="auto"><br></div><div dir="auto">However, this mechanism does not help when an array has many short dimensions, e.g. np.arange(2 ** 20).reshape((2,) * 20). I often encounter such arrays in my work, and every once in a while I try to print one without flattening it first (usually because I didn't know the shape, or even the type, of the variable I was printing). This has caused incidents ranging from losing everything in my scrollback buffer to crashing my computer by using too much memory.</div><div dir="auto"><br></div><div dir="auto">I think it may be a good idea to change the way NumPy pretty-prints arrays with such shapes to avoid this situation. Something like "array([ 0, 1, 2, ..., 1048573, 1048574, 1048575]).reshape(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)" would be good enough for me. The trigger for such a representation could be either a fixed number of dimensions, or the pretty printer still printing more items than the threshold (1000 by default) after summarizing. 
Since the outputs of __repr__ and __str__ are meant for human eyes rather than computers, I think this should not cause too much of a compatibility problem.</div></div></blockquote><div><br></div><div>+1, this could use improvement. For high-dimensional arrays, the way NumPy prints is way too verbose.<br></div><div> </div><div>In xarray, when printing NumPy arrays we automatically decrease "edgeitems" to 2 for ndim=3 and to 1 for ndim>3:</div><div><a href="https://github.com/pydata/xarray/blob/9802411b35291a6149d850e8e573cde71a93bfbf/xarray/core/formatting.py#L439-L453">https://github.com/pydata/xarray/blob/9802411b35291a6149d850e8e573cde71a93bfbf/xarray/core/formatting.py#L439-L453</a><br></div><div><br></div><div>As a last resort, we could consider automatically limiting the maximum number of displayed lines, adding "..." for clipped lines. It is unlikely, for example, that anyone ever wants to print more than ~100 lines of text to the screen, which can easily happen for very high-dimensional arrays.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="auto"><br></div><div dir="auto">What do you all think?</div><div dir="auto"><br></div><div dir="auto">Sincerely,</div><div dir="auto">Fang Zhang</div></div>
_______________________________________________<br>
NumPy-Discussion mailing list<br>
<a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>
</blockquote></div></div>
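<div>The ideas in this thread can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not NumPy's or xarray's actual code; displayed_items, compact_repr and clip_lines are made-up names. displayed_items estimates how many elements the default repr still prints after summarization, assuming NumPy's rule that summarization applies only when a.size exceeds threshold, and that even then an axis is elided only if it is longer than 2 * edgeitems. Under that rule, lowering edgeitems (as xarray does) cannot shorten axes of length 2, which is exactly the (2,) * 20 case. compact_repr sketches the proposed flat-summary-plus-.reshape(...) fallback, and clip_lines sketches the "..." line limit.</div>

```python
import numpy as np


def displayed_items(a):
    """Estimate how many elements NumPy's default repr/str prints for `a`.

    Assumes NumPy's rule: summarization applies only when a.size > threshold,
    and even then an axis of length n is elided only if n > 2 * edgeitems,
    so each axis shows min(n, 2 * edgeitems) entries.
    """
    opts = np.get_printoptions()
    if a.size <= opts["threshold"]:
        return a.size
    shown = 1
    for n in a.shape:
        shown *= min(n, 2 * opts["edgeitems"])
    return shown


def compact_repr(a):
    """Hypothetical fallback: a summarized flat repr followed by .reshape(...),
    used only when the nested repr would still exceed `threshold` elements."""
    if a.ndim <= 1 or displayed_items(a) <= np.get_printoptions()["threshold"]:
        return repr(a)
    dims = ", ".join(str(n) for n in a.shape)
    # The flattened array has size > threshold, so its repr is summarized.
    return f"{repr(a.reshape(-1))}.reshape({dims})"


def clip_lines(text, max_lines=100):
    """Hypothetical last resort: keep the first and last lines of a long
    string and replace the middle with a single "..." line."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = max_lines // 2
    tail = max_lines - head - 1  # one line is reserved for "..."
    return "\n".join(lines[:head] + ["..."] + lines[len(lines) - tail:])


a = np.arange(2 ** 12).reshape((2,) * 12)  # 4096 elements, twelve length-2 axes
print(displayed_items(a))  # 4096: no axis is longer than 2 * edgeitems
with np.printoptions(edgeitems=1):
    print(displayed_items(a))  # still 4096: length-2 axes are never elided
print(compact_repr(a))  # flat summary followed by .reshape(2, 2, ..., 2)
print(clip_lines(repr(a), max_lines=10))
```

<div>The sketch uses the displayed-item count as the trigger; a fixed number of dimensions, the other option raised above, would only change the condition in compact_repr.</div>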