[Python-ideas] Python multi-dimensional array constructor

Wed Oct 19 15:08:21 EDT 2016

I have been thinking about how to go about having a multidimensional array
constructor in Python.  I know that Python doesn't have a built-in
multidimensional array class and won't for the foreseeable future.
However, some projects have come up with their own ways of making it
simpler to create such arrays compared to the current somewhat verbose
approach, and it might even be possible (although I think highly unlikely)
for Python to provide a hook for third-party libraries to tie into the sort
of syntax here.  So I felt it might be worthwhile to get my thoughts on the
topic in a central location for future use.

If this sort of thing doesn't interest you I won't be offended if you stop
reading now, and I apologize if it is considered off-topic for this ML.

The problem is finding an operator that isn't already being used, wouldn't
conflict with existing rules, wouldn't break existing code, but that would
still be clearer and more concise than the current syntax.

The notation I came up with uses "[|" and "|]".  I picked this for four
reasons.  First, it isn't currently valid Python syntax.  Second, it is
clearly connected with the list constructor "[ ]".  Third, it is
reminiscent of the "⟦ ⟧" symbols used for matrices in mathematics.  Fourth,
"{| |}" and "(| |)" could be used for similar data structures (such as "{|
|}" for labeled arrays like in pandas).

Here is an example of how it would be used for a 1D array:

a = [| 0, 1, 2 |]

Compared to the current approach:

a = np.array([0, 1, 2])

It isn't much simpler (although it is considerably shorter).  However, this
new syntax becomes much clearer (in my opinion) when dealing with a higher
number of dimensions (more on that at the end).

For a 2D array, you would use two vertical bars as a dimension separator
"||" (multiple vertical bars are also not valid Python syntax):

a = [| 0, 1, 2 || 3, 4, 5 |]

Or, on multiple lines (whitespace is ignored):

a = [| 0, 1, 2 ||
       3, 4, 5 |]

b = [| 0, 1, 2 |
     | 3, 4, 5 |]

You can also create a 2D row array (a single row) by using the double bar
in the opening and closing delimiters:

a = [|| 0, 1, 2 ||]

For higher dimensions, you can just put more vertical bars together:

a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]

b = [||| 0, 1, 2
      || 3, 4, 5
     ||| 6, 7, 8
      || 9, 10, 11
     |||]

c = [||| 0, 1, 2 |
       | 3, 4, 5 |
     |
       | 6, 7, 8 |
       | 9, 10, 11 |||]

A 3D row vector would just be:

a = [||| 0, 1, 2 |||]

A 3D column vector would be:

a = [||| 0 || 1 || 2 |||]

b = [||| 0
      || 1
      || 2
     |||]

A 3D depth vector would be:

a = [||| 0 ||| 1 ||| 2 |||]

b = [||| 0
     ||| 1
     ||| 2
     |||]
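
As a rough illustration of the three 3D vectors above, here are the nested
lists they would correspond to, assuming the same nesting order as the
current nested-list approach (commas separate the innermost dimension).  The
`shape` helper is a hypothetical stand-in written only for this sketch; it
is not part of the proposal or of any library.

```python
def shape(nested):
    """Return the shape of a regularly nested list, outermost axis first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Assumed nested-list equivalents of the three 3D vectors.
row = [[[0, 1, 2]]]            # [||| 0, 1, 2 |||]
column = [[[0], [1], [2]]]     # [||| 0 || 1 || 2 |||]
depth = [[[0]], [[1]], [[2]]]  # [||| 0 ||| 1 ||| 2 |||]

print(shape(row))     # (1, 1, 3)
print(shape(column))  # (1, 3, 1)
print(shape(depth))   # (3, 1, 1)
```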

The rule for the number of dimensions is just the highest-specified
dimension.  So these are equivalent:

a = [| 0, 1, 2 ||
       3, 4, 5 |]

b = [|| 0, 1, 2 ||
        3, 4, 5 ||]
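
To make the rules so far concrete, here is a minimal sketch of how a string
written in this notation could be turned into nested lists.  It assumes the
rules as described: commas separate elements, a run of n bars separates
along dimension n, the longest run of bars sets the number of dimensions,
and whitespace is dropped.  The names `parse_bar_array` and `_split` are
hypothetical, and the sketch only handles integer elements.

```python
import re


def _split(chunk, level):
    """Split `chunk` on runs of exactly `level` bars, recursing downward."""
    if level == 1:
        # Innermost dimension: elements are separated by commas.
        return [int(tok) for tok in chunk.split(",")]
    # The lookarounds keep e.g. "||" from matching inside a longer "|||" run.
    parts = re.split(r"(?<!\|)\|{%d}(?!\|)" % level, chunk)
    return [_split(part, level - 1) for part in parts]


def parse_bar_array(text):
    """Parse the proposed "[| ... |]" notation into nested lists of ints."""
    body = "".join(text.split())  # whitespace is not significant
    if not (body.startswith("[|") and body.endswith("|]")):
        raise ValueError("expected text of the form [| ... |]")
    body = body[1:-1]  # drop the square brackets, keep the bars
    # The number of dimensions is the longest run of bars anywhere.
    ndim = max(len(run) for run in re.findall(r"\|+", body))
    return _split(body.strip("|"), ndim)


a = parse_bar_array("[| 0, 1, 2 ||\n   3, 4, 5 |]")
b = parse_bar_array("[|| 0, 1, 2 ||\n    3, 4, 5 ||]")
print(a)       # [[0, 1, 2], [3, 4, 5]]
print(a == b)  # True
```

Because the dimension count comes from the longest run of bars anywhere in
the text, this sketch also accepts the forms where the bars are only given
in full at one end.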

This also means you would only strictly need to set the dimensions at one
end.  That means these are equivalent, although the second and third cases
should be discouraged:

a = [|| 0, 1, 2 ||]

b = [| 0, 1, 2 ||]

c = [|| 0, 1, 2 |]

As I said earlier, whitespace would not be significant.  These would all be
equivalent, but the fourth and fifth approaches would be discouraged as
unclear.  I would also discourage the third approach, since I think the
whitespace at the beginning and end is important to avoid confusing, for
example "[|2" with "[12".

a = [| 0, 1 || 2, 3 |]

b = [| 0, 1 |
     | 2, 3 |]

c = [|0, 1||2, 3|]

d = [| 0, 1 |       | 2, 3 |]

e = [  |0,1|       |2,3|   ]
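
One way to see why these five spellings are equivalent: if whitespace is
simply dropped before the bars are counted, they all collapse to the same
string.  This is only a sketch of that idea; `normalize` is a hypothetical
helper, not something the proposal defines.

```python
def normalize(text):
    """Drop all whitespace, since the proposal treats it as insignificant."""
    return "".join(text.split())


spellings = [
    "[| 0, 1 || 2, 3 |]",
    "[| 0, 1 |\n     | 2, 3 |]",
    "[|0, 1||2, 3|]",
    "[| 0, 1 |       | 2, 3 |]",
    "[  |0,1|       |2,3|   ]",
]

canonical = {normalize(s) for s in spellings}
print(canonical)  # {'[|0,1||2,3|]'}
```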

At least in my opinion, this sort of approach really shines when making
higher-dimensional arrays.  These would all be equivalent (the | at the
beginning and end are just to make it easier to align indentation; they
aren't required):

a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134
       || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7
      ||| -103, 50, -89, -12,  28, -12, 119, -131, -73, 21
       || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56
     |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145
       || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15
      ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60
       || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133
     ||||]

b = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 |
        | -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 |
       |
        | -103, 50, -89, -12,  28, -12, 119, -131, -73, 21 |
        | -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 |
      ||
        | -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 |
        | 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 |
       |
        | 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 |
        | -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||]

Compared to the current approach:

a = np.array([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134],
                [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]],
               [[-103, 50, -89, -12,  28, -12, 119, -131, -73, 21],
                [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]],
              [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145],
                [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]],
               [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60],
                [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]])

I think both of the new examples are considerably clearer than the current
approach.

Does anyone have any questions or thoughts?