<div dir="ltr"><div>I have been thinking about how to go about having a multidimensional array <span class="gmail-im">constructor</span> in Python.  I know that Python doesn't have a built-in multidimensional array class and won't for the foreseeable future.  However, some projects have come up with their own ways of making it simpler to create such arrays compared to the current somewhat verbose approach, and it might even be possible (although I think highly unlikely) for Python to provide a hook for third-party libraries to tie into the sort of syntax here.  So I felt it might be worthwhile to get my thoughts on the topic in a central location for future use.  <br><br>If this sort of thing doesn't interest you I won't be offended if you stop reading now, and I apologize if it is considered off-topic for this ML.<br><br>The problem is finding an operator that isn't already being used, wouldn't conflict with existing rules, and wouldn't break existing code, but that would still be clearer and more concise than the current syntax.<br><br>The notation I came up with uses "[|" and "|]".  I picked this for 4 reasons.  First, it isn't currently valid Python syntax.  Second, it is clearly connected with the list <span class="gmail-im">constructor</span> "[ ]".  Third, it is reminiscent of the "⟦ ⟧" symbols used for matrices in mathematics.  Fourth, "{| |}" and "(| |)" could be used for similar data structures (such as "{| |}" for labeled arrays like in pandas).<br><br>Here is an example of how it would be used for a 1D array:<br><br><span style="font-family:monospace,monospace">a = [| 0, 1, 2 |]</span><br><br>Compared to the current approach:<br><br><span style="font-family:monospace,monospace">a = np.array([0, 1, 2])</span><br><br>It isn't much simpler (although it is considerably shorter).  
However, this new syntax becomes much clearer (in my opinion) when dealing with a higher number of dimensions (more on that at the end).<br><br>For a 2D array, you would use two vertical bars as a dimension separator "||" (multiple vertical bars are also not valid Python syntax):<br><br><span style="font-family:monospace,monospace">a = [| 0, 1, 2 || 3, 4, 5 |]</span><br><br>Or, on multiple lines (whitespace is ignored):<br><span style="font-family:monospace,monospace"><br>a = [| 0, 1, 2 ||<br>       3, 4, 5 |]</span><br><span style="font-family:monospace,monospace"><br></span></div><div><span style="font-family:monospace,monospace">b = [| 0, 1, 2 |<br></span></div><div><span style="font-family:monospace,monospace">     | 3, 4, 5 |]</span><br><br>You can also create a 2D row array by combining the two:<br><br><span style="font-family:monospace,monospace">a = [|| 0, 1, 2 ||]</span><br><br>For higher dimensions, you can just put more vertical bars together:<br><br></div><div><span style="font-family:monospace,monospace">a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]</span><br></div><div><br></div><div><span style="font-family:monospace,monospace">b = [||| 0, 1, 2<br></span></div><div><span style="font-family:monospace,monospace">      || 3, 4, 5<br>     ||| 6, 7, 8<br>      || 9, 10, 11<br>     |||]</span><span style="font-family:monospace,monospace"></span><br><div><br></div><div><span style="font-family:monospace,monospace">c = [||| 0, 1, 2 |<br></span></div><div><span style="font-family:monospace,monospace">       | 3, 4, 5 |<br>     |<br>       | 6, 7, 8 |<br>       | 9, 10, 11 |||]</span></div><br><br>A 3D row vector would just be:<br><br><span style="font-family:monospace,monospace">a = [||| 0, 1, 2 |||]<br><br></span>A 3D column vector would be:<br><span style="font-family:monospace,monospace"><br><br><span style="font-family:monospace,monospace">a = [||| 0 || 1 || 2 |||]</span><br><br></span></div><div><span style="font-family:monospace,monospace">b = 
</span><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">[||| 0 <br>      || 1 <br>      || 2 <br>     |||]<br><br></span></span>A 3D depth vector would be:<span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace"></span></span><span style="font-family:monospace,monospace"><br><br><span style="font-family:monospace,monospace">a = [||| 0 ||| 1 ||| 2 |||]</span><br><br></span><span style="font-family:monospace,monospace">b = </span><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">[||| 0 <br>     ||| 1 <br>     ||| 2 <br>     |||]</span></span><br></div><div><br><br>The rule for the number of dimensions is just the highest-specified dimension.  So these are equivalent:<br><span style="font-family:monospace,monospace"><br>a = [| 0, 1, 2 ||<br>       3, 4, 5 |]<br><br>b = [|| 0, 1, 2 ||<br>        3, 4, 5 ||]<br></span><br>This also means you would only strictly need to set the dimensions at one end.  That means these are equivalent, although the second and third case should be discouraged:<br><br><span style="font-family:monospace,monospace">a = [|| 0, 1, 2 ||]<br><br>b = [| 0, 1, 2 ||]<br><br>c = [|| 0, 1, 2 |]</span><br><br>As I said earlier, whitespace would not be significant.  These would all be equivalent, but the fourth and fifth approaches would be discouraged as unclear.  I would also discourage the third approach, since I think the whitespace at the beginning and end is important to avoid confusing, for example "[|2" with "[12".  <br><br><span style="font-family:monospace,monospace">a = [| 0, 1 || 2, 3 |]<br><br>b = [| 0, 1 |<br>     | 2, 3 |]<br><br>c = [|0, 1||2, 3|]<br><br>d = [| 0, 1 |       | 2, 3 |]<br><br>e = [  |0,1|       |2,3|   ]</span><br><br>At least in my opinion, this sort of approach really shines when making higher-dimensional arrays.  
These would all be equivalent (the | at the beginning and end are just to make it easier to align indentation; they aren't required):<br><br><span style="font-family:monospace,monospace">a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134<br>       || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7<br>      ||| -103, 50, -89, -12,  28, -12, 119, -131, -73, 21<br>       || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56<br>     |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145<br>       || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15<br>      ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60<br>       || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133<br>     ||||]<br><br>b = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 |<br>        | -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 |<br>       |<br>        | -103, 50, -89, -12,  28, -12, 119, -131, -73, 21 |<br>        | -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 |<br>      ||<br>        | -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 |<br>        | 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 |<br>       |<br>        | 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 |<br>        | -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||]</span><br><br><br>Compared to the current approach:<br><br><span style="font-family:monospace,monospace">a = np.array([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134],<br>                [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]],<br>               [[-103, 50, -89, -12,  28, -12, 119, -131, -73, 21],<br>                [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]],<br>              [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145],<br>                [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]],<br>               [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60],<br>                [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]])<br><br></span></div><div>I think both of the new 
examples are considerably clearer than the current approach.<span style="font-family:monospace,monospace"><br></span></div><div><span style="font-family:monospace,monospace"><br></span></div>Does anyone have any questions or thoughts?<span style="font-family:monospace,monospace"><br></span></div>
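<div><br>For anyone who wants to experiment with the notation without any change to the language, the rules described in this post can be approximated by a small string parser. The following is only a rough sketch under my reading of those rules: the names <span style="font-family:monospace,monospace">parse_bar_array</span> and <span style="font-family:monospace,monospace">_group</span> are hypothetical, the input is a string rather than real syntax, and the result is a plain nested list (which could then be handed to something like numpy.array). It treats a run of n vertical bars, with any whitespace between them ignored, as a separator along dimension n, and takes the number of dimensions from the longest run anywhere in the literal.</div>

```python
import ast
import re


def _group(rows, seps, level):
    """Nest rows into lists, splitting wherever a separator equals `level`."""
    if level == 1:
        # Innermost level: exactly one row is left, so return it unchanged.
        return rows[0]
    pieces, start = [], 0
    for i, sep in enumerate(seps):
        if sep == level:
            pieces.append(_group(rows[start:i + 1], seps[start:i], level - 1))
            start = i + 1
    pieces.append(_group(rows[start:], seps[start:], level - 1))
    return pieces


def parse_bar_array(text):
    """Parse a string in the "[| ... |]" notation into nested lists (sketch)."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected the literal to be wrapped in [ ... ]")
    # Split on runs of bars; whitespace between bars belongs to the same run.
    parts = re.split(r"(\|(?:\s*\|)*)", text[1:-1])
    levels = [sep.count("|") for sep in parts[1::2]]
    if len(levels) in (0, 1) or parts[0].strip() or parts[-1].strip():
        raise ValueError("expected opening and closing vertical bars")
    inner = levels[1:-1]  # separators that sit between two rows
    if any(level == 1 for level in inner):
        raise ValueError("a single | cannot separate two rows")
    # Each chunk between bar runs is a comma-separated row of literals.
    rows = [list(ast.literal_eval("[" + chunk + "]")) for chunk in parts[2:-1:2]]
    # The number of dimensions is the highest dimension specified anywhere.
    return _group(rows, inner, max(levels))


if __name__ == "__main__":
    print(parse_bar_array("[| 0, 1, 2 || 3, 4, 5 |]"))
    print(parse_bar_array("[||| 0 || 1 || 2 |||]"))
```

<div>Because the dimension count is taken from the longest bar run, the "only one end needs the extra bars" rule falls out without special handling: "[| 0, 1, 2 ||]" and "[|| 0, 1, 2 ||]" both give a single-row 2D result in this sketch.</div>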