<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <span dir="ltr"><<a href="mailto:chris.barker@noaa.gov" target="_blank">chris.barker@noaa.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">a few thoughts:<span class="gmail-"><div><br></div><div>On Wed, Oct 19, 2016 at 12:08 PM, Todd <span dir="ltr"><<a href="mailto:toddrjen@gmail.com" target="_blank">toddrjen@gmail.com</a>></span> wrote:<br></div></span><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I have been thinking about how to go about having a multidimensional array <span class="gmail-m_8195056914081871669gmail-m_-1719530516994550405gmail-im">constructor</span> in python. I know that Python doesn't have a built-in multidimensional array class and won't for the foreseeable future. </div></div></blockquote><div><br></div></span><div>no but it does have buffers and memoryviews and the extended buffer protocol supports "strided" data -- i.e. multi-dimensional arrays. So it would be nice to have SOME simple ndarray object in the standard library that would wrap such buffers -- it would be nice for working with image data, interacting with numpy arrays, etc.</div><div><br></div><div>The "trick" is that once you have the container, you want some functionality -- so you add indexing and slicing -- natch. Then maybe some simple math? then.... 
eventually, you are trying to put all of numpy into the stdlib, and we already know we don't want to do that.</div><div><br></div><div>Though I still think a simple container that only supports indexing and slicing would be lovely.</div><span class="gmail-"><div><br></div><div>That all being said:</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><span style="font-family:monospace,monospace">a = [| 0, 1, 2 || 3, 4, 5 |]</span></div></div></blockquote><div><br></div></span><div>I really don't see the advantage of that over:</div><div><br></div><div>a = [[0, 1, 2],[3, 4, 5]]</div><div><br></div><div>really I don't -- and I'm a heavy numpy user, so I write a lot of those!</div><div><br></div><div>If there is a problem with the current options (and I'm not convinced there is) it's that it isn't a literal for a multidimensional array, but rather a literal for a bunch of nested lists -- the lists themselves are created, and so are all the "boxed" values in the array -- only to be pulled out and unboxed to be put in the array.</div><div><br></div></div></div></div></blockquote><div><br></div><div>But as you said, that is not a multidimensional array. We aren't comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]"; we are comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, 2],[3, 4, 5]])". 
That is a bigger difference.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>However, this is only for literals -- if your data are large, then they are not going to be in literals, but rather read from a file or something, so this is really not much of a limitation.</div></div></div></div></blockquote><div><br></div><div>Even if your original data is large, I often need smaller areas when processing, for example for broadcasting or as arguments to processing functions.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>However, if you really don't like it, then you can pass a string to a constructor function instead:</div><div><br></div><div>a = arr_from_string(" <span style="font-family:monospace,monospace">| 0, 1, 2 || 3, 4, 5 | </span>")</div><div><br></div><div>yeah, you need to type the extra quotes, but that's not much.</div></div></div></div></blockquote><div><br></div><div>Then you need an even longer function call. 
Again, that defeats the purpose of having a literal, which is to make the syntax more concise.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment.</div></div></div></div></blockquote><div><br></div><div>It is:<br><span style="font-family:monospace,monospace"><br>r_[[0, 1, 2], [3, 4, 5]]</span><br><br>But this uses indexing behind the scenes, meaning your data is created as an index and then needs to be converted to a list later. This adds considerable overhead. I just tested it and it was somewhere around 20 times slower than "np.array()" in the test.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><span style="font-family:monospace,monospace">b = [| 0, 1, 2 |<br></span></div><div><span style="font-family:monospace,monospace">     | 3, 4, 5 |]</span></div></div></blockquote><div><br></div></span><div><span style="font-family:monospace,monospace">b = [[ 0, 1, 2 ],<br></span></div><div><span style="font-family:monospace,monospace">     [ 3, 4, 5 ]]</span> </div><span class="gmail-"><div><br></div><div><br></div></span></div></div></div></blockquote><div><br></div><div>No, this is the equivalent of:<br><br></div><div><span style="font-family:monospace,monospace">b = np.array([[ 0, 1, 2 ],<br></span><div><span style="font-family:monospace,monospace">              [ 3, 4, 5 ]])<br></span> <br></div><div>The whole point of this is to avoid the "np.array" 
call.<br></div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>You can also create a 2D row array by combining the two:<br><br><span style="font-family:monospace,monospace">a = [|| 0, 1, 2 ||]</span></div></div></blockquote><div><br></div></span><div><div><font face="monospace, monospace">a = [[ 0, 1, 2 ]] or is it: [[[ 0, 1, 2 ]]]</font></div><div><br></div><div>(I can't tell, so maybe your syntax is not so clear???)</div></div></div></div></div></blockquote><div><br><br></div><div>I am not clear on where the ambiguity lies. Count the number of "|" symbols.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>For higher dimensions, you can just put more lines together:<br><br></div><div><span style="font-family:monospace,monospace">a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]</span><br></div><div><br></div><div><span style="font-family:monospace,monospace">b = [||| 0, 1, 2<br></span></div><div><span style="font-family:monospace,monospace">     || 3, 4, 5<br>    ||| 6, 7, 8<br>     || 9, 10, 11<br>    |||]</span></div></div></blockquote><div><br></div></span><div>I have no idea what that means!</div></div></div></div></blockquote><div><br><br></div><div>||| is the delimiter for the third dimension, || is the delimiter for the second dimension. 
It is like how a newline is used as a delimiter for the second dimension in CSV files. So it is equivalent to:<br></div><div><br></div><div class="gmail_quote"><span style="font-family:monospace,monospace">b = np.array([[[0, 1, 2],<br>               [3, 4, 5]],<br>              [[6, 7, 8],<br>               [9, 10, 11]]]) </span><span class="gmail-"><div><br><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="gmail-">At least in my opinion, this sort of approach
really shines when making higher-dimensional arrays. These would all be
equivalent (the | at the beginning and end are just to make it easier
to align indentation, they aren't required):<br><br><span style="font-family:monospace,monospace">a = [|||| 48,   11,  141,   13,  -60,  -37,   58,  -52,  -29,  134<br>       || -6,   96,  -66,  137,  -59, -147, -118, -104, -123,   -7<br>      ||| -103,   50,  -89,  -12,   28,  -12,  119, -131,  -73,   21<br>       || -58,  105,   25, -138, -106, -118,  -29,  -49,  -63,  -56<br>     |||| -43,  -34,  101, -115,   41,  121,    3, -117,  101, -145<br>       || 100, -128,   76,  128, -113,  -90,   52,  -91,  -72,  -15<br>      ||| 22,  -65, -118,  134,  -58,   55,  -73, -118,  -53,  -60<br>       || -85, -136,   83,  -66,  -35, -117,  -71,  115,  -56,  133<br>     ||||]</span></span></div></blockquote><div><br></div></span><div>It does seem that you are saving some typing when you have high-dim arrays, but I really don't see the readability here.</div></div></div></div></blockquote></div><div> </div></span></div><br><div>If you are used to counting braces, perhaps. But imagine someone who is just starting out. How do you describe how to determine what dimension is being split? "It is one more than the total number of sequential left braces and left parentheses" vs "it is the number of vertical lines". On top of that, having to deal with both left and right braces rather than a single delimiter adds a lot of visual noise. There is a reason we use commas rather than, say, ">,<" as a delimiter in lists: it is easier to deal with a single kind of symbol rather than three (or potentially five in the current case).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"></span><div><br></div><div><br></div><div>but anyway, the way to move this kind of thing forward is to use it as a new format in an existing lib (like numpy), by passing it as a big string. 
IF folks like it and start using it, then there is room for a conversation.</div></div></div></div></blockquote><div><br></div><div>The big problem with that is that having to wrap it as a string and pass it to a function in the numpy namespace loses much of the advantage from having a literal to begin with.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>But I doubt (and I wouldn't support) that anyone would put a literal into python for an object that doesn't exist in python...</div><div><br></div></div></div></div></blockquote><div><br></div><div>Yes, I understand that. But some projects are already doing that on their own. I think having a way for them to do it without losing the list constructor (which is the approach currently being taken) would be a benefit.<br></div></div><br></div></div>