Python multi-dimensional array constructor

I have been thinking about how to go about having a multidimensional array constructor in Python. I know that Python doesn't have a built-in multidimensional array class and won't for the foreseeable future. However, some projects have come up with their own ways of making it simpler to create such arrays compared to the current somewhat verbose approach, and it might even be possible (although I think highly unlikely) for Python to provide a hook for third-party libraries to tie into the sort of syntax here. So I felt it might be worthwhile to get my thoughts on the topic in a central location for future use. If this sort of thing doesn't interest you I won't be offended if you stop reading now, and I apologize if it is considered off-topic for this ML.

The problem is finding an operator that isn't already being used, wouldn't conflict with existing rules, wouldn't break existing code, but that would still be clearer and more concise than the current syntax.

The notation I came up with uses "[|" and "|]". I picked this for four reasons. First, it isn't currently valid Python syntax. Second, it is clearly connected with the list constructor "[ ]". Third, it is reminiscent of the "⟦ ⟧" symbols used for matrices in mathematics. Fourth, "{| |}" and "(| |)" could be used for similar data structures (such as "{| |}" for labeled arrays like in pandas).

Here is an example of how it would be used for a 1D array:

    a = [| 0, 1, 2 |]

Compared to the current approach:

    a = np.ndarray([0, 1, 2])

It isn't much simpler (although it is considerably shorter). However, this new syntax becomes much clearer (in my opinion) when dealing with a higher number of dimensions (more on that at the end).
For a 2D array, you would use two vertical bars as a dimension separator "||" (multiple vertical bars are also not valid Python syntax):

    a = [| 0, 1, 2 || 3, 4, 5 |]

Or, on multiple lines (whitespace is ignored):

    a = [| 0, 1, 2 ||
           3, 4, 5 |]

    b = [| 0, 1, 2 |
         | 3, 4, 5 |]

You can also create a 2D row array by combining the two:

    a = [|| 0, 1, 2 ||]

For higher dimensions, you can just put more lines together:

    a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]

    b = [||| 0, 1, 2 ||
             3, 4, 5 |||
             6, 7, 8 ||
             9, 10, 11 |||]

    c = [||| 0, 1, 2 |
           | 3, 4, 5 |
           |
           | 6, 7, 8 |
           | 9, 10, 11 |||]

A 3D row vector would just be:

    a = [||| 0, 1, 2 |||]

A 3D column vector would be:

    a = [||| 0 || 1 || 2 |||]

    b = [||| 0 ||
             1 ||
             2 |||]

A 3D depth vector would be:

    a = [||| 0 ||| 1 ||| 2 |||]

    b = [||| 0 |||
             1 |||
             2 |||]

The rule for the number of dimensions is just the highest-specified dimension. So these are equivalent:

    a = [| 0, 1, 2 || 3, 4, 5 |]

    b = [|| 0, 1, 2 || 3, 4, 5 ||]

This also means you would only strictly need to set the dimensions at one end. That means these are equivalent, although the second and third case should be discouraged:

    a = [|| 0, 1, 2 ||]

    b = [| 0, 1, 2 ||]

    c = [|| 0, 1, 2 |]

As I said earlier, whitespace would not be significant. These would all be equivalent, but the fourth and fifth approaches would be discouraged as unclear. I would also discourage the third approach, since I think the whitespace at the beginning and end is important to avoid confusing, for example "[|2" with "[12".

    a = [| 0, 1 || 2, 3 |]

    b = [| 0, 1 |
         | 2, 3 |]

    c = [|0, 1||2, 3|]

    d = [| 0, 1 | | 2, 3 |]

    e = [ |0,1| |2,3| ]

At least in my opinion, this sort of approach really shines when making higher-dimensional arrays.
These would all be equivalent (the | at the beginning and end are just to make it easier to align indentation, they aren't required): a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||] b = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134 | | -6, 96, -66, 137, -59, -147, -118, -104, -123, -7 | | | -103, 50, -89, -12, 28, -12, 119, -131, -73, 21 | | -58, 105, 25, -138, -106, -118, -29, -49, -63, -56 | || | -43, -34, 101, -115, 41, 121, 3, -117, 101, -145 | | 100, -128, 76, 128, -113, -90, 52, -91, -72, -15 | | | 22, -65, -118, 134, -58, 55, -73, -118, -53, -60 | | -85, -136, 83, -66, -35, -117, -71, 115, -56, 133 ||||] Compared to the current approach: a = np.ndarray([[[[48, 11, 141, 13, -60, -37, 58, -52, -29, 134], [-6, 96, -66, 137, -59, -147, -118, -104, -123, -7]], [[-103, 50, -89, -12, 28, -12, 119, -131, -73, 21], [-58, 105, 25, -138, -106, -118, -29, -49, -63, -56]]], [[[-43, -34, 101, -115, 41, 121, 3, -117, 101, -145], [100, -128, 76, 128, -113, -90, 52, -91, -72, -15]], [[22, -65, -118, 134, -58, 55, -73, -118, -53, -60], [-85, -136, 83, -66, -35, -117, -71, 115, -56, 133]]]]) I think both of the new examples are considerably clearer than the current approach. Does anyone have any questions or thoughts?
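The rules described in this message (commas separate elements, a run of n bars separates the nth dimension, the longest run of bars sets the number of dimensions, whitespace is ignored) can be tried out without any new syntax by parsing a string. What follows is only a minimal sketch under those assumptions: `parse_bar_array` and `_build` are hypothetical helper names, the input is the text between "[" and "]", the result is plain nested lists rather than an array, and every element is assumed to be a Python literal.

```python
import ast
import re


def parse_bar_array(text):
    """Parse text such as '| 0, 1 || 2, 3 |' into nested lists (hypothetical helper)."""
    # Whitespace is not significant, so '| |' across lines means '||'.
    body = "".join(text.split())
    # Split into runs of bars and runs of everything else.
    tokens = re.findall(r"\|+|[^|]+", body)
    if not tokens:
        raise ValueError("empty array text")
    # The number of dimensions is the longest run of bars anywhere.
    ndim = max([len(t) for t in tokens if t.startswith("|")], default=1)
    # Leading and trailing runs only declare dimensions; drop them.
    if tokens[0].startswith("|"):
        tokens = tokens[1:]
    if tokens and tokens[-1].startswith("|"):
        tokens = tokens[:-1]
    return _build(tokens, ndim)


def _build(tokens, level):
    """Recursively split the token list on the delimiter for this level."""
    if level == 1:
        # At the innermost level only one comma-separated run may remain.
        if len(tokens) != 1 or tokens[0].startswith("|"):
            raise ValueError("unexpected delimiter or empty row")
        return [ast.literal_eval(item) for item in tokens[0].split(",")]
    parts, current = [], []
    for token in tokens:
        if token == "|" * level:
            parts.append(current)
            current = []
        else:
            current.append(token)
    parts.append(current)
    return [_build(part, level - 1) for part in parts]


two_d = parse_bar_array("| 0, 1, 2 || 3, 4, 5 |")
row_2d = parse_bar_array("|| 0, 1, 2 ||")
column_3d = parse_bar_array("||| 0 || 1 || 2 |||")
```

The nested lists could then be handed to whatever array constructor is in use. A stray single bar between elements is rejected here, which is a choice made for this sketch and not something the proposal specifies.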

Personally I like the way that numpy does it now better (even for multidimensional arrays). Being able to index into the different sub dimension using just [] iteratively matches naturally with the data structure itself in my mind. This may also just be my fear of change though...
What would the syntax do if you don't have numpy installed? Is the syntax tied to numpy or could other libraries make use of it? Cheers, Thomas

On Wed, Oct 19, 2016 at 3:24 PM, Thomas Nyberg <tomuxiong@gmx.com> wrote:
I agree, that is one of the reasons this is still using "[ ]". You can think of the "|" as more of a dimension delimiter. Also keep in mind that tuples and dicts still use [ ] for indexing even though their constructor doesn't use [ ].
That would depend on the implementation, which is another issue I hesitate to even bring up because it is a huge can of worms. The most plausible implementation in my mind would be for projects like IPython, Spyder, Sage, or some array-oriented language that compiles to Python to have their own hooks that would replace this sort of syntax with "np.ndarray" behind-the-scenes. A less likely scenario that occurred to me would be for the Python interpreter to provide some sort of hook to allow a class to be registered as the handler for this syntax. So someone could register numpy, dask, dynd, or whatever they wanted as the handler. If nothing was registered, using the syntax would raise an exception (perhaps NameError or some new exception). With equivalent "(| |)" and "{| |}" syntax you could conceivably register three packages. I figured perhaps "[| |]" would be used for your primary array class (which currently would pretty much always be a ndarray), and there could be a more dict-like "{| |}" syntax that could be used for pandas or xarray, leaving "(| |)" for a more special-purpose library of your choosing. But that would be a convention; it would be entirely up to the user. Behind-the-scenes, this syntax would be converted to nested tuples or lists (or maybe dicts for "{| |}") and passed to the constructor or a special classmethod for the registered class to handle however it sees fit. There are all sorts of questions and corner cases for this hook approach though. Could people change the registered handler after it is set? At what points during script execution would setting the handler be allowed, at any point or only near the beginning? Would "{| |}" use the same syntax or some sort of dict-like syntax? If dict-like, would there be separate dict-like and set-like syntaxes, resulting in four handlers? 
Or would list-like and dict-like syntaxes be allowed in all cases, and handlers would need to deal with getting lists/tuples or dicts (even if handling was simply raising a TypeError)? Does the hook provide lists or tuples? Does the data get fed directly to the constructor or to a special class method? If the former, how do classes identify themselves as being able to act as handlers, or should users be allowed to register any class? There are so many questions I don't have good answers to I don't feel comfortable proposing this sort of hook approach as something that should actually be implemented.

You could add or prototype this with quasiquotes (http://quasiquotes.readthedocs.io/en/latest/). You just need to be able to parse the body of your expression as a string into an array. Here is a quick example with a parser that only accepts 2d arrays:

```
# coding: quasiquotes

import numpy as np
from quasiquotes import QuasiQuoter


@object.__new__
class array(QuasiQuoter):
    def quote_expr(self, expr, frame, col_offset):
        return np.array([
            eval('[%s]' % d, frame.f_globals, frame.f_locals)
            for d in expr.split('||')
        ])


def f():
    a = 1
    b = 2
    c = 3
    return [$array| a, b, c || 4, 5, 6 |]


if __name__ == '__main__':
    print(f())
```

Personally I am not sold on replacing `[` and `]` with `|` because I like that you can visually see where dimensions are closed.

On Wed, Oct 19, 2016 at 3:24 PM, Thomas Nyberg <tomuxiong@gmx.com> wrote:

On Wed, Oct 19, 2016 at 3:55 PM, Joseph Jevnik <joejev@gmail.com> wrote:
Interesting project, thanks! If there is any actual interest in this that might be a good way to prototype it.
Personally I am not sold on replacing `[` and `]` with `|` because I like that you can visually see where dimensions are closed.
Yes, that issue occurred to me. But assuming a rectangular matrix, I had trouble coming up with a good example that is clearer than what you could do with this syntax. For simple arrays it isn't needed, and complicated arrays are large so picking out the "[" and "]" becomes visually harder at least for me. Do you have a specific example that you think would be clearer than what is possible with this syntax? Of course that is more of an issue with jagged arrays, but numpy doesn't support those and I am not aware of any plans to add them (dynd is another story). Also keep in mind that this would supplement the existing approach, it doesn't replace it. np.ndarray() would stay around just like list() stays around for cases where it makes sense.

FWIW, you probably _don't_ want to use `ndarray` directly. Normally, you want to use the `np.array` factory function...
Aside from that, my main problem with this proposal is that it seems to only be relevant when used in third party code. There _is_ some precedent for this (for example rich comparisons and the matrix multiplication operator) -- However, these are all _operators_ so third party code can hook into it using the provided hook methods. This proposal is different in that it _isn't_ proposing an operator, so there isn't any object on which to define a magic hook method. I think that it was mentioned that it might be possible for a user to _register_ a callable that would then be used when this syntax was invoked -- But having a global setting like that leads to contention. What if I want to use this syntax with `np.ndarray` but some other third party code (that I want to use _with_ numpy) tries to hook into the syntax as well? All of a sudden, my script stops working as soon as I import a new third party module. I _do_ think that this might be a valid proposal for some of the more domain specific python variants (e.g. IPython) which have a pre-processing layer on top of the rest of the language. It might be worth trying to float this idea in one of their ideas mailing lists/issue trackers. On Wed, Oct 19, 2016 at 1:10 PM, Todd <toddrjen@gmail.com> wrote:
-- Matt Gilson // SOFTWARE ENGINEER

On Wed, Oct 19, 2016 at 4:47 PM, Matt Gilson <matt@getpattern.com> wrote:
Yes, this should definitely not be a default import of a package for exactly that reason, and it should be local to the module in which it was invoked. The most likely way I saw it working was that the user would have to explicitly invoke the hook, rather than it happening by another module on import. It would happen in the module namespace, so it would be impossible for imports to invoke it, and your use of it wouldn't affect the use of it in other modules you import. This seemed to me the approach that is safest, most reliable, and least likely to cause confusion, unexpected behavior, and unexpected breakage down the road. If it happened at import, then having two modules invoke the hook would probably need to be an exception or first-come-first-serve. But I think requiring the user to manually invoke it would be better. But as I said there are a lot of other problems with this approach so I don't consider it particularly likely.
I do see this being the most likely scenario ultimately. I am pretty sure Sage already does its own ndarray handling, and I recall talk about doing it Spyder although I don't know if anything came of it. I will probably bring this up there at some point, but as I said this is the central location for Python ideas, so I thought having it here was important.

Matt Gilson wrote:
I think for that to fly it would have to be a per-module thing. Then each module using the syntax would be able to choose the meaning of it. A simple way to do this would be for the compiler to translate it into something like __array__([[[ ... ]]]) and then you would just define __array__ appropriately, e.g. from numpy import array as __array__ Personally I'm not very enthusiastic about the whole thing, though. I don't find the new syntax to be much of an improvement, if any. Certainly nowhere near enough to be worth adding syntax. -- Greg
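As a rough illustration of the per-module idea, the sketch below runs the same translated statement in two separate module namespaces, each with its own `__array__`. The translation string and the `run_in_module` helper are assumptions made up for this example; no compiler does this today, and the handlers are pure-Python stand-ins for something like `from numpy import array as __array__`.

```python
import types

# Assumed output of the compiler for:  a = [| 0, 1 || 2, 3 |]
TRANSLATED = "a = __array__([[0, 1], [2, 3]])"


def run_in_module(name, handler):
    """Run the translated statement in a fresh module that defines __array__."""
    module = types.ModuleType(name)
    # Each module picks its own meaning for the syntax.
    module.__array__ = handler
    exec(TRANSLATED, module.__dict__)
    return module.a


# Two modules, two handlers, no shared global registry.
as_lists = run_in_module("mod_lists", list)
as_tuples = run_in_module("mod_tuples", lambda rows: tuple(map(tuple, rows)))
```

Because the name is looked up in the module's own globals, importing another module that defines `__array__` differently does not change the result here.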

I find the proposed syntax worse than the existing square brackets. The way NumPy does a repr of an array is a good model of clarity, and it's correct current Python (except for larger arrays where visual ellipses are used). On Oct 20, 2016 12:01 AM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:

On 19 October 2016 at 21:08, Todd <toddrjen@gmail.com> wrote:
My 5 cents here. When I am dealing with such arrays, the only *good* solution which comes to my mind is to find or develop a nice GUI application which will allow me to use all powers of mouse/keyboard for navigation through data and switching between dimensions, and editing them in an effective way. Anything in a text mode editor for this task will be probably pointless, both for editing and reading such arrays. And indeed it is frustrating and error prone at times. Mikhail

a few thoughts: On Wed, Oct 19, 2016 at 12:08 PM, Todd <toddrjen@gmail.com> wrote:
no but it does have buffers and memoryviews and the extended buffer protocol supports "strided" data -- i.e. multi-dimensional arrays. So it would be nice to have SOME simple ndarray object in the standard library that would wrap such buffers -- it would be nice for working with image data, interacting with numpy arrays, etc. The "trick" is that once you have the container, you want some functionality -- so you add indexing and slicing -- natch. Then maybe some simple math? then.... eventually, you are trying to put all of numpy into the stdlib, and we already know we don't want to do that. Though I still think a simple container that only supports indexing and slicing would be lovely. That all being said: a = [| 0, 1, 2 || 3, 4, 5 |]
I really don't see the advantage of that over:

    a = [[0, 1, 2],[3, 4, 5]]

really I don't -- and I'm a heavy numpy user, so I write a lot of those! If there is a problem with the current options (and I'm not convinced there is) it's that it isn't a literal for a multidimensional array, but rather a literal for a bunch of nested lists -- the lists themselves are created, and so are all the "boxed" values in the array -- only to be pulled out and unboxed to be put in the array. However, this is only for literals -- if your data are large, then they are not going to be in literals, but rather read from a file or something, so this is really not much of a limitation.

However, if you really don't like it, then you can pass a string to a constructor function instead:

    a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ")

yeah, you need to type the extra quotes, but that's not much. NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment.

b = [| 0, 1, 2 |
| 3, 4, 5 |]
b = [[ 0, 1, 2 ], [ 3, 4, 5 ]] You can also create a 2D row array by combining the two:
a = [|| 0, 1, 2 ||]
a = [[ 0, 1, 2 ]] or is it: [[[ 0, 1, 2 ]]] (I can't tell, so maybe your syntax is not so clear???)
I have no idea what that means!
nor these....
it does seem that you are saving some typing when you have high-dim arrays, but I really don't see the readability here.
I think both of the new examples are considerably clearer than the current approach.
not to me :-( but anyway, the way to move this kind of thing forward is to use it as a new format in an existing lib (like numpy), by passing it as a big string. IF folks like it and start using it, then there is room for a conversation. But I doubt (and I wouldn't support) that anyone would put a literal into python for an object that doesn't exist in python... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
However, if you really don't like it, then you can pass a string to a constructor function instead:
that like the MATLAB style -- though I can't find it at the moment. You are probably thinking of the numpy.matrix constructor:
See <https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html>.

On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
But as you said, that is not a multidimensional array. We aren't comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]", we are comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, 2],[3, 4, 5]])". That is a bigger difference.
Even if your original data is large, I often need smaller areas when processing, for example for broadcasting or as arguments to processing functions.
Then you need an even longer function call. Again, that defeats the purpose of having a literal, which is to make the syntax more concise.
NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment.
It is: r_[[0, 1, 2], [3, 4, 5]] But this uses indexing behind the scenes, meaning your data is created as an index then needs to be converted to a list later. This adds considerable overhead. I just tested it and it was somewhere around 20 times slower than "np.array()" in the test.
No, this is the equivalent of: b = np.array([[ 0, 1, 2 ], [ 3, 4, 5 ]]) The whole point of this is to avoid the "np.array" call.
I am not clear where the ambiguity lies? Count the number of "|" symbols.
||| is the delimiter for the third dimension, || is the delimiter for the second dimension. It is like how newline is used as a delimiter for the second dimension in CSV files. So it is equivalent to: b = np.array([[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]])
If you are used to counting braces, perhaps. But imagine someone who is just starting out. How do you describe how to determine what dimension is being split? "It is one more than the total number of sequential left braces and left parentheses" vs "it is the number of vertical lines". On top of that, having to deal with both left and right braces rather than a single delimiter adds a lot of visual noise. There is a reason we use commas rather than, say, ">,<" as a delimiter in lists: it is easier to deal with a single kind of symbol rather than three (or potentially five in the current case).
The big problem with that is that having to wrap it as a string and pass it to a function in the numpy namespace loses much of the advantage from having a literal to begin with.
But I doubt (and I wouldn't support) that anyone would put a literal into python for an object that doesn't exist in python...
Yes, I understand that. But some projects are already doing that on their own. I think having a way for them to do it without losing the list constructor (which is the approach currently being taken) would be a benefit.

Todd wrote:
||| is the delimiter for the third dimension, || is the delimiter for the second dimension.
This seems a bit inconsistent. It appears the rule is "n vertical bars is the delimiter for the nth dimension". By that rule, the delimiter for the first dimension should be a single vertical bar, but instead it's a comma. Also, it's not very clear why when you have a 2D array with two rows you write [| 1,2,3 || 4,5,6 |] i.e. with *one* vertical bar at each end, but when there is only one row you write [|| 1,2,3 ||] i.e. with *two* vertical bars at each end. -- Greg

On Wed, Oct 19, 2016 at 5:32 PM, Todd <toddrjen@gmail.com> wrote:
Well then, you have mixed two proposals here:

1) a literal syntax for nd arrays -- that is not going to fly if there is NO ndarray object builtin to python. I kinda think there should be, though even then there need not be a literal for it (see Decimal). So I'd say -- get an nd array object into the standard library first, then we can talk about the literal

2) what the syntax should be for such a literal. OK, in this case, I suggested that the way to hash that out is to start out with passing a string to a function that constructs the array -- then you could try things out without any additions to the language or the libraries -- it could be a stand-alone module that extends numpy:

    from ndarray_literal import nda
    my_array = nda('||| 3, 4, 5 || 6, 7, 8 |||')

(is that legal in your notation -- I honestly am not sure) and yes, that still requires typing "nda('", which you are trying to avoid. But honestly, I really have written a lot of numpy code, and writing: np.array( ..... ) does not bother me at all. IF I did support a literal, it would be so that the object could be constructed immediately rather than by creating other python objects first (lists, usually), and then making an array from that.

If you do want to push the syntax idea further, I'd suggest going to the numpy list and seeing what folks there think. But I can't help myself. It's clear from the other posts on the list here that others find your proposed syntax as confusing as I do. but maybe it could be made more clear. Taking a page from MATLAB: 1 2 3; 4 5 6 is a 2x3 2-d array. Now, in MATLAB, there only used to be matrices, so this was pretty nice, but a bit hard to extend to multiple dimensions. But the principle is handy: one delimiter for the first dimension, another one for the second, etc. We probably don't want to go with trying colons, and ! and who knows what else, so I like your idea.
a 2-d array:

    1 | 2 | 3 || 4 | 5 | 6

(Or better)

    1 | 2 | 3 ||
    4 | 5 | 6

a 3d array:

    0 | 1 | 2 | 3 ||
    4 | 5 | 6 | 7 ||
    8 | 9 | 10 | 11 |||
    12 | 13 | 14 | 15 ||
    16 | 17 | 18 | 19 ||
    20 | 21 | 22 | 23 ||

Notes:

1) guess how I wrote that? I did: np.arange(24).reshape((2,3,4)) and edited the results -- making the point that maybe the current state of affairs is not so bad...

2) These are delimiters, rather than brackets -- so they don't go at the beginning and are optional at the end (like commas in python)

3) It doesn't use commas at all, as having a consistent system is clearer

4) Whitespace is insignificant as with the rest of Python -- though you want to be able to use a line ending as insignificant whitespace, so this may have to be wrapped in parentheses, or something, to actually use it -- though a non-issue if it's a string

Hmm -- about point (3), maybe use only commas:

    0, 1, 2, 3,,
    4, 5, 6, 7,,
    8, 9, 10, 11,,,
    12, 13, 14, 15,,
    16, 17, 18, 19,,
    20, 21, 22, 23

That would be more consistent with the rest of python, and multiple commas in a row are currently a syntax error.

Even if your original data is large, I often need smaller areas when
processing, for example for broadcasting or as arguments to processing functions.
sure I do hard-coded arrays all the time -- but not big ones, and I don't think I've ever needed more than 2D and certainly not more than 3D. and not large enough that performance matters. It is:
r_[[0, 1, 2], [3, 4, 5]]
no, that's a shorthand for "row stack" -- and really not much better than the array() call, except a few less characters I meant the np.matrix() function that Alexander pointed out -- which is only really there to make folks coming from MATLAB happier...(and it makes a Matrix object, which you likely don't want). The point was that it's easy to make such a beast for your new syntax to try it out b = np.array([[ 0, 1, 2 ],
[ 3, 4, 5 ]])
The whole point of this is to avoid the "np.array" call.
again, trying to separate out the idea of a literal, from the syntax of the literal. but thinking now, Python already uses (), [], {}, and < and > -- so I don't think there are any more brackets. but why not just use commas with square brackets: 2Darr = [1, 2, 3,, 4, 5, 6] maybe too subtle? Yes, I understand that. But some projects are already doing that on their
huh? is anyone actually overriding the list constructor?? multiple dims apart (my [ and ,, example shows that you can do that with the current syntax) this is kind of like adding Decimal -- there is another type, but does it need a literal? I have maybe 90% of the code I write with an: import numpy as np at the top -- so yes, I kinda would like a literal, but it's really a pretty small deal -- at least once I got used to it after using MATLAB for years. I'd ask folks that have been using numpy for a long time -- would this really help? One more problem -- with the addition of the @ operator, there have not been any use cases in the stdlib, but it is an operator, and Python already has a mechanism for operator overloading. As far as I know, every python literal maps to a SINGLE type -- so creating a literal for a non-existent type makes no sense at all. -CHB

On Wed, Oct 19, 2016 at 03:08:21PM -0400, Todd wrote: [taking your later comment out of the order it was written]
If this sort of thing doesn't interest you I won't be offended if you stop reading now, and I apologize if it is considered off-topic for this ML.
No problem Todd, we shouldn't be offended by ideas, and this is definitely on-topic.
Generally speaking, Python doesn't invent syntax just on the off-chance that it will come in handy, nor does it typically invent operators for third-party libraries to use if they have no use in the built-ins. I'm only aware of two exceptions to this, and both were added for numpy: extended slicing seq[start:end:step] and matrix multiplication A @ B. Extended slicing now is used by the built-ins, but originally it was added specifically for numpy. However, in both cases, the suggestion came from the numpy developers themselves, and they had a specific, concrete need for the feature. Both features were solutions to real problems found by numpy users. I wasn't around when extended slicing was added, but matrix multiplication is an excellent example of a well-researched, well-written PEP: http://python.org/dev/peps/pep-0465/ Whereas your suggestion seems more like a solution in search of a problem. You've come up with syntax for building arrays, but you don't seem to know which, if any, array will use this; nor do you seem to have identified an actual problem with the existing solution used by numpy (apart from calling them "somewhat verbose").
Just a brief note on terminology: you're not describing an operator, you're describing a "display" syntax: delimiters used to build a type such as tuple, list or dict. I still think of them as "list literals" etc, [1, 2, 3, 4] for example, even though technically they are not necessarily literals (i.e. known at compile-time) and officially they are called "list displays" etc.
Sometimes used for matrices. It's more common to use a multiple-line version of [ ] which is, of course, hard to type in a regular editor :-) See examples of matrices here: http://mathworld.wolfram.com/Matrix.html Moving on to the multi-dimensional examples you give:
To me, that looks decidedly strange. The | symbol has the disadvantage that you cannot tell which is opening a row and which is closing a row. The above looks like:

- first row: opened with a single bar, closed with two bars;
- second row: no opening delimiter at all, closed with a single bar.

I think that you have to compete with existing syntax for nested lists. The lowest common denominator for any array is to use nested lists and a function call. Nested lists can be easily converted into *any* array type you like, rather than picking out one, and only one, array type for special treatment. If Python had a built-in array type, then maybe this would be justified, but it doesn't, and isn't likely to get one: lists fill the role that arrays do in most other languages. There is an array type in the standard library, array.array, but it's not built-in and not important enough to be built-in or to get special syntax of its own.

And I'm not sure that numpy users miss the ability to write multi-dimensional arrays using syntax instead of a function call. Normally they would want the ability to specify a type and an order (rows first, like C, or columns first, like Fortran), and I think that for multi-dimensional arrays it is more usual and simpler to write out the values in a linear array and tell the array constructor to re-arrange them.

Trying to write out a visual representation of anything with more than two dimensions is cumbersome when you are limited to the flat plane of a text file. Consider:

    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

If your editor can highlight matching brackets, it's quite easy to see where each row and plane begins and ends. Whereas your suggested syntax looks to me like a whole bunch of confusing lines. I cannot even work out what are the dimensions of this example:
although if I sit and stare at it for a while I might guess... 4*3? If I already know it is meant to be 3D, then I might be able to work out that the extra bar means something, and guess 2*3*2, but I really wouldn't want to bet my sanity on understanding what those lines mean. (Especially since, later on, the exact number and placement of lines is optional.) What's the rule for when to use triple bars ||| and when to use double bars || or a single bar | ? It's a mystery to me. At least with matching left and right delimiters [ ] I can match them up to see where they begin and end.
Okay, now I'm completely lost. Doesn't the first example with a single vertical bar | mean that it is a 1D array? What's the "highest-specified dimension"? Are you suggesting that we have to count vertical bars to work out the dimension?
This strikes me as a HUGE bug magnet. More like a bug black hole actually, sucking in bugs from all through the universe and inserting them into your arrays... *wink* Effectively, what you are saying is that *as an intentional feature*, a stray | accidentally inserted into your array will not cause a syntax error, but will instead increase the number of dimensions of the array. So instead of having a 17*10*30 array as you expected, you have a 1*17*10*30 or 17*10*30*1 array, which may or may not fail deep in your code with some more or less unexpected and hard to diagnose error. This (anti-)feature also makes syntax highlighting of matching bars impossible, instead of merely fiendishly difficult. Since it isn't an error for the bars not to match, you can't even count the bars to work out which ones are supposed to match. You have to somehow intuit or guess what the dimensions of the array are supposed to be, then reason backwards to see whether the right number of bars are in the right places to be compatible with those dimensions, and if not, your guess of the dimensions might be wrong... or not.
At least in my opinion, this sort of approach really shines when making higher-dimensional arrays.
You should compare your approach to that of mathematicians and other programming languages. Mathematicians don't really use multi-dimensional arrays. They have vectors, which are 1D, and matrices which are 2D, then they have tensors which confuse me, but they don't seem to use anything which corresponds to a simple higher-dimension analog of matrices. Tensors come close, but they don't seem to have anything like matrix-notation for tensors. (Given that tensors are often infinite dimensional, I'm hardly surprised.) Matlab has syntax for 2D arrays, which can be expanded into 3D: A = [1 2; 3 4]; A(:,:,2) = [5 6; 7 8] R has an array function:
array(1:8, c(2,2,2)) , , 1
[,1] [,2] [1,] 1 3 [2,] 2 4 , , 2 [,1] [,2] [1,] 5 7 [2,] 6 8 Differences in ordering (row first or column first) aside, they are equivalent to Python's: [[[1, 2], [3, 4]], [[5, 6], [7, 8]], ] My HP-48 calculator uses square brackets for matrixes, with the convenience that in the calculator interface I only need to close the first pair of brackets: 2D: I can enter the keystrokes: [[1 2] 3 4 to get the 2D matrix: [[ 1 2 ] [ 3 4 ]] but it has no support for 3D arrays. Here's how C# does it: https://msdn.microsoft.com/en-us/library/2yd9wwz4.aspx
I wouldn't even want to guess what dimensions that is supposed to be. 10 columns, because I can count them, but everything else is a mystery.
But that's easy! Look at the nested brackets. The opening sequence tells you that there are four dimensions: [[[[ I can count the ten columns (and if I align them, I can visually verify that each row has exactly ten columns). Looking at the nested lists, I see: [[[[ten columns], [ten columns]], so that's two rows by ten, then continuing: [2 x 10]], which closes another layer, so that's 2 items in the third dimension, then when have another dimension: [2 x 10 x 2]] and the array is closed, giving us in total: 2 x 10 x 2 x 2 In my opinion anyone trying to write out a single 4D array like this is opening themselves up to a hiding for nothing, even with clear nesting and matching open/close delimiters. Since we don't have 4D text files, it's better to write: L = [48, 11, 141, 13, -60, -37, 58, -52, -29, 134, -6, 96, -66, 137, -59, -147, -118, -104, -123, -7, -103, 50, -89, -12, 28, -12, 119, -131, -73, 21, -58, 105, 25, -138, -106, -118, -29, -49, -63, -56, -43, -34, 101, -115, 41, 121, 3, -117, 101, -145, 100, -128, 76, 128, -113, -90, 52, -91, -72, -15, 22, -65, -118, 134, -58, 55, -73, -118, -53, -60, -85, -136, 83, -66, -35, -117, -71, 115, -56, 133] assert len(L) == 2*10*2*2 arr = array(L, dim=(2,10,2,2)) or something similar, and let the array constructor resize as needed. -- Steve

On 10/19/2016 12:08 PM, Todd wrote:
Optional, semi-meaningless, not-really-an-operator markings? The current approach I could at least figure out if I had to -- yours is confusing. You have done a good job explaining what you mean, but what to you is clear is to me, and others, incomprehensible. -- ~Ethan~

Personally I like the way that numpy does it now better (even for multidimensional arrays). Being able to index into the different sub-dimensions by applying [] repeatedly matches the data structure itself naturally in my mind. This may also just be my fear of change, though...
What would the syntax do if you don't have numpy installed? Is the syntax tied to numpy or could other libraries make use of it? Cheers, Thomas

On Wed, Oct 19, 2016 at 3:24 PM, Thomas Nyberg <tomuxiong@gmx.com> wrote:
I agree, that is one of the reasons this is still using "[ ]". You can think of the "|" as more of a dimension delimiter. Also keep in mind that tuples and dicts still use [ ] for indexing even though their constructor doesn't use [ ].
That would depend on the implementation, which is another issue I hesitate to even bring up because it is a huge can of worms.

The most plausible implementation in my mind would be for projects like IPython, Spyder, Sage, or some array-oriented language that compiles to Python to have their own hooks that would replace this sort of syntax with "np.ndarray" behind the scenes.

A less likely scenario that occurred to me would be for the Python interpreter to provide some sort of hook to allow a class to be registered as the handler for this syntax. So someone could register numpy, dask, dynd, or whatever they wanted as the handler. If nothing was registered, using the syntax would raise an exception (perhaps NameError or some new exception).

With equivalent "(| |)" and "{| |}" syntax you could conceivably register three packages. I figured perhaps "[| |]" would be used for your primary array class (which currently would pretty much always be an ndarray), and there could be a more dict-like "{| |}" syntax that could be used for pandas or xarray, leaving "(| |)" for a more special-purpose library of your choosing. But that would be a convention; it would be entirely up to the user.

Behind the scenes, this syntax would be converted to nested tuples or lists (or maybe dicts for "{| |}") and passed to the constructor or a special classmethod for the registered class to handle however it sees fit.

There are all sorts of questions and corner cases for this hook approach though. Could people change the registered handler after it is set? At what points during script execution would setting the handler be allowed, at any point or only near the beginning? Would "{| |}" use the same syntax or some sort of dict-like syntax? If dict-like, would there be separate dict-like and set-like syntaxes, resulting in four handlers? Or would list-like and dict-like syntaxes be allowed in all cases, and handlers would need to deal with getting lists/tuples or dicts (even if handling was simply raising a TypeError)? Does the hook provide lists or tuples? Does the data get fed directly to the constructor or to a special class method? If the former, how do classes identify themselves as being able to act as handlers, or should users be allowed to register any class?

There are so many questions I don't have good answers to that I don't feel comfortable proposing this sort of hook approach as something that should actually be implemented.

You could add or prototype this with quasiquotes (http://quasiquotes.readthedocs.io/en/latest/). You just need to be able to parse the body of your expression as a string into an array. Here is a quick example with a parser that only accepts 2d arrays:

```
# coding: quasiquotes

import numpy as np
from quasiquotes import QuasiQuoter


@object.__new__
class array(QuasiQuoter):
    def quote_expr(self, expr, frame, col_offset):
        # Each '||'-separated chunk is one row; evaluate it as a list
        # in the caller's scope so local names such as a, b, c resolve.
        return np.array([
            eval('[%s]' % d, frame.f_globals, frame.f_locals)
            for d in expr.split('||')
        ])


def f():
    a = 1
    b = 2
    c = 3
    return [$array| a, b, c || 4, 5, 6 |]


if __name__ == '__main__':
    print(f())
```

Personally I am not sold on replacing `[` and `]` with `|` because I like that you can visually see where dimensions are closed.

On Wed, Oct 19, 2016 at 3:24 PM, Thomas Nyberg <tomuxiong@gmx.com> wrote:

On Wed, Oct 19, 2016 at 3:55 PM, Joseph Jevnik <joejev@gmail.com> wrote:
Interesting project, thanks! If there is any actual interest in this that might be a good way to prototype it.
Personally I am not sold on replacing `[` and `]` with `|` because I like that you can visually see where dimensions are closed.
Yes, that issue occurred to me. But assuming a rectangular matrix, I had trouble coming up with a good example that is clearer than what you could do with this syntax. For simple arrays it isn't needed, and complicated arrays are large so picking out the "[" and "]" becomes visually harder at least for me. Do you have a specific example that you think would be clearer than what is possible with this syntax? Of course that is more of an issue with jagged arrays, but numpy doesn't support those and I am not aware of any plans to add them (dynd is another story). Also keep in mind that this would supplement the existing approach, it doesn't replace it. np.ndarray() would stay around just like list() stays around for cases where it makes sense.

FWIW, you probably _don't_ want to use `ndarray` directly. Normally, you want to use the `np.array` factory function...
Aside from that, my main problem with this proposal is that it seems to only be relevant when used in third party code. There _is_ some precedent for this (for example rich comparisons and the matrix multiplication operator) -- however, these are all _operators_, so third party code can hook into them using the provided hook methods. This proposal is different in that it _isn't_ proposing an operator, so there isn't any object on which to define a magic hook method.

I think that it was mentioned that it might be possible for a user to _register_ a callable that would then be used when this syntax was invoked -- but having a global setting like that leads to contention. What if I want to use this syntax with `np.ndarray` but some other third party code (that I want to use _with_ numpy) tries to hook into the syntax as well? All of a sudden, my script stops working as soon as I import a new third party module.

I _do_ think that this might be a valid proposal for some of the more domain specific python variants (e.g. IPython) which have a pre-processing layer on top of the rest of the language. It might be worth trying to float this idea in one of their ideas mailing lists/issue trackers.

On Wed, Oct 19, 2016 at 1:10 PM, Todd <toddrjen@gmail.com> wrote:
-- Matt Gilson

On Wed, Oct 19, 2016 at 4:47 PM, Matt Gilson <matt@getpattern.com> wrote:
Yes, this should definitely not be a default import of a package for exactly that reason, and it should be local to the module in which it was invoked. The most likely way I saw it working was that the user would have to explicitly invoke the hook, rather than it happening by another module on import. It would happen in the module namespace, so it would be impossible for imports to invoke it, and your use of it wouldn't affect the use of it in other modules you import. This seemed to me the approach that is safest, most reliable, and least likely to cause confusion, unexpected behavior, and unexpected breakage down the road. If it happened at import, then having two modules invoke the hook would probably need to be an exception or first-come-first-serve. But I think requiring the user to manually invoke it would be better. But as I said there are a lot of other problems with this approach so I don't consider it particularly likely.
I do see this being the most likely scenario ultimately. I am pretty sure Sage already does its own ndarray handling, and I recall talk about doing it in Spyder, although I don't know if anything came of it. I will probably bring this up there at some point, but as I said this is the central location for Python ideas, so I thought having it here was important.

Matt Gilson wrote:
I think for that to fly it would have to be a per-module thing. Then each module using the syntax would be able to choose the meaning of it.

A simple way to do this would be for the compiler to translate it into something like

    __array__([[[ ... ]]])

and then you would just define __array__ appropriately, e.g.

    from numpy import array as __array__

Personally I'm not very enthusiastic about the whole thing, though. I don't find the new syntax to be much of an improvement, if any. Certainly nowhere near enough to be worth adding syntax.

-- Greg
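The per-module idea above can be tried out without any compiler support. The following is only a rough sketch under stated assumptions: the string in `TRANSLATED` stands in for whatever a compiler might emit for `[| 0, 1, 2 || 3, 4, 5 |]`, and `run_module`, `as_tuples` and `as_flat` are hypothetical helpers invented for this illustration, not part of any proposal in the thread. Plain nested lists are used instead of numpy so the example runs anywhere.

```python
# Hypothetical output of a compiler that rewrites
#     a = [| 0, 1, 2 || 3, 4, 5 |]
# into a call to a module-level name called __array__.
TRANSLATED = "a = __array__([[0, 1, 2], [3, 4, 5]])"


def run_module(source, handler):
    """Execute `source` in a fresh namespace whose __array__ is `handler`."""
    namespace = {"__array__": handler}
    exec(source, namespace)
    return namespace["a"]


def as_tuples(nested):
    """Example handler: freeze nested lists into nested tuples."""
    if isinstance(nested, list):
        return tuple(as_tuples(item) for item in nested)
    return nested


def as_flat(nested):
    """Another example handler: flatten everything into one list."""
    if isinstance(nested, list):
        return [leaf for item in nested for leaf in as_flat(item)]
    return [nested]


# Two "modules" run the same translated source but bind __array__ differently.
first = run_module(TRANSLATED, as_tuples)
second = run_module(TRANSLATED, as_flat)
print(first)   # ((0, 1, 2), (3, 4, 5))
print(second)  # [0, 1, 2, 3, 4, 5]
```

Because the handler is looked up in each namespace separately, one module's choice cannot affect another's, which is the property the per-module approach is meant to provide.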

I find the proposed syntax worse than the existing square brackets. The way NumPy prints the repr of an array is a good model of clarity, and it is valid current Python (except for larger arrays, where visual ellipses are used).

On Oct 20, 2016 12:01 AM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:

On 19 October 2016 at 21:08, Todd <toddrjen@gmail.com> wrote:
My 5 cents here. When I am dealing with such arrays, the only *good* solution which comes to my mind is to find or develop a nice GUI application which will allow me to use all powers of mouse/keyboard for navigation through data and switching between dimensions, and editing them in an effective way. Anything in a text mode editor for this task will be probably pointless, both for editing and reading such arrays. And indeed it is frustrating and error prone at times. Mikhail

a few thoughts: On Wed, Oct 19, 2016 at 12:08 PM, Todd <toddrjen@gmail.com> wrote:
no, but it does have buffers and memoryviews, and the extended buffer protocol supports "strided" data -- i.e. multi-dimensional arrays. So it would be nice to have SOME simple ndarray object in the standard library that would wrap such buffers -- it would be nice for working with image data, interacting with numpy arrays, etc.

The "trick" is that once you have the container, you want some functionality -- so you add indexing and slicing -- natch. Then maybe some simple math? then.... eventually, you are trying to put all of numpy into the stdlib, and we already know we don't want to do that.

Though I still think a simple container that only supports indexing and slicing would be lovely.

That all being said:

a = [| 0, 1, 2 || 3, 4, 5 |]

I really don't see the advantage of that over:

a = [[0, 1, 2],[3, 4, 5]]

really I don't -- and I'm a heavy numpy user, so I write a lot of those!

If there is a problem with the current options (and I'm not convinced there is) it's that it isn't a literal for a multidimensional array, but rather a literal for a bunch of nested lists -- the lists themselves are created, and so are all the "boxed" values in the array -- only to be pulled out and unboxed to be put in the array.

However, this is only for literals -- if your data are large, then they are not going to be in literals, but rather read from a file or something, so this is really not much of a limitation.

However, if you really don't like it, then you can pass a string to a constructor function instead:

a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ")

yeah, you need to type the extra quotes, but that's not much.

NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment.

b = [| 0, 1, 2 |
     | 3, 4, 5 |]
b = [[ 0, 1, 2 ],
     [ 3, 4, 5 ]]

You can also create a 2D row array by combining the two:
a = [|| 0, 1, 2 ||]

a = [[ 0, 1, 2 ]]

or is it:

[[[ 0, 1, 2 ]]]

(I can't tell, so maybe your syntax is not so clear???)
I have no idea what that means!
nor these....
it does seem that you are saving some typing when you have high-dim arrays, but I really don't see the readability here.
I think both of the new examples are considerably clearer than the current approach.
not to me :-(

but anyway, the way to move this kind of thing forward is to use it as a new format in an existing lib (like numpy), by passing it as a big string. IF folks like it and start using it, then there is room for a conversation.

But I doubt (and I wouldn't support) that anyone would put a literal into python for an object that doesn't exist in python...

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
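For anyone who wants to experiment with the string-based route suggested above, here is a minimal sketch of what such a function might look like. The name `arr_from_string` comes from the message above, but everything about the implementation is an assumption: it follows one reading of the proposed rules (commas separate the first dimension, a run of n bars separates the nth dimension, bars at either end act as brackets, and the number of dimensions is the longest run of bars), it requires each run of bars to be written without internal whitespace, and it returns nested lists rather than a numpy array so that it has no dependencies.

```python
import ast
import re


def arr_from_string(text):
    """Parse a bar-delimited string into nested lists (illustrative only)."""
    body = text.strip()
    # Bars at either end behave like brackets; count them, then drop them.
    leading = len(body) - len(body.lstrip("|"))
    trailing = len(body) - len(body.rstrip("|"))
    inner = body.strip("|")
    longest_inner = max(
        (len(run) for run in re.findall(r"\|+", inner)), default=0
    )
    # "Highest-specified dimension" rule: the longest run of bars wins.
    ndim = max(leading, trailing, longest_inner, 1)

    def split_level(chunk, level):
        if level == 1:
            # First dimension: comma-separated literal values.
            return [ast.literal_eval(item.strip()) for item in chunk.split(",")]
        # Higher dimensions: split on a run of exactly `level` bars.
        # Longer runs were already consumed by the enclosing level.
        pieces = re.split(r"\|{%d}" % level, chunk)
        return [split_level(piece, level - 1) for piece in pieces]

    return split_level(inner, ndim)


print(arr_from_string(" | 0, 1, 2 || 3, 4, 5 | "))
print(arr_from_string("|| 0, 1, 2 ||"))
```

The result can be handed to any array constructor, for example `np.array(arr_from_string(...))`, which keeps the experiment independent of which array library is eventually used.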

On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
However, if you really don't like it, then you can pass a string to a constructor function instead:

NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment.

You are probably thinking of the numpy.matrix constructor. See <https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html>.

On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
But as you said, that is not a multidimensional array. We aren't comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]", we are comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, 2],[3, 4, 5]])". That is a bigger difference.
Even if your original data is large, I often need smaller areas when processing, for example for broadcasting or as arguments to processing functions.
Then you need an even longer function call. Again, that defeats the purpose of having a literal, which is to make the syntax more concise.
NOTE: I'm pretty sure numpy has something like this already, for folks that like the MATLAB style -- though I can't find it at the moment.
It is:

r_[[0, 1, 2], [3, 4, 5]]

But this uses indexing behind the scenes, meaning your data is created as an index and then needs to be converted to a list later. This adds considerable overhead. I just tested it and it was somewhere around 20 times slower than "np.array()" in the test.
No, this is the equivalent of: b = np.array([[ 0, 1, 2 ], [ 3, 4, 5 ]]) The whole point of this is to avoid the "np.array" call.
I am not clear where the ambiguity lies? Count the number of "|" symbols.
||| is the delimiter for the third dimension, || is the delimiter for the second dimension. It is like how newline is used as a delimiter for the second dimension in CSV files. So it is equivalent to:

b = np.array([[[0, 1, 2],
               [3, 4, 5]],
              [[6, 7, 8],
               [9, 10, 11]]])

If you are used to counting braces, perhaps. But imagine someone who is just starting out. How do you describe how to determine what dimension is being split? "It is one more than the total number of sequential left braces and left parentheses" vs "it is the number of vertical lines". Add to that, having to deal with both left and right braces rather than a single delimiter adds a lot of visual noise. There is a reason we use commas rather than, say, ">,<" as a delimiter in lists: it is easier to deal with a single kind of symbol rather than three (or potentially five in the current case).
The big problem with that is that having to wrap it as a string and pass it to a function in the numpy namespace loses much of the advantage from having a literal to begin with.
But I doubt (and I wouldn't support) that anyone would put a literal into python for an object that doesn't exist in python...
Yes, I understand that. But some projects are already doing that on their own. I think having a way for them to do it without losing the list constructor (which is the approach currently being taken) would be a benefit.

Todd wrote:
||| is the delimiter for the third dimension, || is the delimiter for the second dimension.
This seems a bit inconsistent. It appears the rule is "n vertical bars is the delimiter for the nth dimension". By that rule, the delimiter for the first dimension should be a single vertical bar, but instead it's a comma.

Also, it's not very clear why when you have a 2D array with two rows you write

[| 1,2,3 || 4,5,6 |]

i.e. with *one* vertical bar at each end, but when there is only one row you write

[|| 1,2,3 ||]

i.e. with *two* vertical bars at each end.

-- Greg

On Wed, Oct 19, 2016 at 5:32 PM, Todd <toddrjen@gmail.com> wrote:
Well then, you have mixed two proposals here:

1) a literal syntax for nd arrays -- that is not going to fly if there is NO ndarray object built in to python. I kinda think there should be, though even then there need not be a literal for it (see Decimal). So I'd say -- get an nd array object into the standard library first, then we can talk about the literal.

2) what the syntax should be for such a literal. OK, in this case, I suggested that the way to hash that out is to start out with passing a string to a function that constructs the array -- then you could try things out without any additions to the language or the libraries -- it could be a stand-alone module that extends numpy:

from ndarray_literal import nda

my_array = nda('||| 3, 4, 5 || 6, 7, 8 |||')

(is that legal in your notation -- I honestly am not sure)

and yes, that still requires typing "nda('", which you are trying to avoid. But honestly, I really have written a lot of numpy code, and writing:

np.array( ..... )

does not bother me at all. IF I did support a literal, it would be so that the object could be constructed immediately rather than by creating other python objects first (lists, usually), and then making an array from that.

If you do want to push the syntax idea further, I'd suggest going to the numpy list and seeing what folks there think.

But I can't help myself. It's clear from the other posts on the list here that others find your proposed syntax as confusing as I do, but maybe it could be made more clear.

Taking a page from MATLAB:

1 2 3; 4 5 6

is a 2x3 2-d array. Now in MATLAB, there only used to be matrices, so this was pretty nice, but a bit hard to extend to multiple dimensions. But the principle is handy: one delimiter for the first dimension, another one for the second, etc. We probably don't want to go with trying colons, and ! and who knows what else, so I like your idea.
a 2-d array:

1 | 2 | 3 || 4 | 5 | 6

(Or better)

1 | 2 | 3 ||
4 | 5 | 6

a 3d array:

0 | 1 | 2 | 3 ||
4 | 5 | 6 | 7 ||
8 | 9 | 10 | 11 |||
12 | 13 | 14 | 15 ||
16 | 17 | 18 | 19 ||
20 | 21 | 22 | 23 ||

Notes:

1) guess how I wrote that? I did:

np.arange(24).reshape((2,3,4))

and edited the results -- making the point that maybe the current state of affairs is not so bad...

2) These are delimiters, rather than brackets -- so they don't go at the beginning and are optional at the end (like commas in python)

3) It doesn't use commas at all, as having a consistent system is clearer

4) Whitespace is insignificant as with the rest of Python -- though you want to be able to use a line ending as insignificant whitespace, so this may have to be wrapped in parentheses, or something, to actually use it -- though a non-issue if it's a string

Hmm -- about point (3), maybe use only commas:

0, 1, 2, 3,,
4, 5, 6, 7,,
8, 9, 10, 11,,,
12, 13, 14, 15,,
16, 17, 18, 19,,
20, 21, 22, 23

That would be more consistent with the rest of python, and multiple commas in a row are currently a syntax error.

Even if your original data is large, I often need smaller areas when processing, for example for broadcasting or as arguments to processing functions.
sure, I do hard-coded arrays all the time -- but not big ones, and I don't think I've ever needed more than 2D and certainly not more than 3D, and not large enough that performance matters.

It is:
r_[[0, 1, 2], [3, 4, 5]]
no, that's a shorthand for "row stack" -- and really not much better than the array() call, except a few fewer characters.

I meant the np.matrix() function that Alexander pointed out -- which is only really there to make folks coming from MATLAB happier... (and it makes a Matrix object, which you likely don't want). The point was that it's easy to make such a beast for your new syntax to try it out.

b = np.array([[ 0, 1, 2 ],
              [ 3, 4, 5 ]])
The whole point of this is to avoid the "np.array" call.
again, trying to separate out the idea of a literal from the syntax of the literal.

but thinking now, Python already uses (), [], {}, and < and > -- so I don't think there are any more brackets. But why not just use commas with square brackets:

2Darr = [1, 2, 3,, 4, 5, 6]

maybe too subtle?

Yes, I understand that. But some projects are already doing that on their own.

huh? is anyone actually overriding the list constructor?? Multiple dims apart (my [ and ,, example shows that you can do that with the current syntax), this is kind of like adding Decimal -- there is another type, but does it need a literal?

Maybe 90% of the code I write has an:

import numpy as np

at the top -- so yes, I kinda would like a literal, but it's really a pretty small deal -- at least once I got used to it after using MATLAB for years. I'd ask folks that have been using numpy for a long time -- would this really help?

One more problem -- with the addition of the @ operator, there have not been any use cases in the stdlib, but it is an operator, and Python already has a mechanism for operator overloading. As far as I know, every python literal maps to a SINGLE type -- so creating a literal for a non-existent type makes no sense at all.

-CHB

On Wed, Oct 19, 2016 at 03:08:21PM -0400, Todd wrote: [taking your later comment out of the order it was written]
If this sort of thing doesn't interest you I won't be offended if you stop reading now, and I apologize if it is considered off-topic for this ML.
No problem Todd, we shouldn't be offended by ideas, and this is definitely on-topic.
Generally speaking, Python doesn't invent syntax just on the off-chance that it will come in handy, nor does it typically invent operators for third-party libraries to use if they have no use in the built-ins. I'm only aware of two exceptions to this, and both were added for numpy: extended slicing seq[start:end:step] and matrix multiplication A @ B. Extended slicing now is used by the built-ins, but originally it was added specifically for numpy. However, in both cases, the suggestion came from the numpy developers themselves, and they had a specific, concrete need for the feature. Both features were solutions to real problems found by numpy users. I wasn't around when extended slicing was added, but matrix multiplication is an excellent example of a well-researched, well-written PEP: http://python.org/dev/peps/pep-0465/ Whereas your suggestion seems more like a solution in search of a problem. You've come up with syntax for building arrays, but you don't seem to know which, if any, array will use this; nor do you seem to have identified an actual problem with the existing solution used by numpy (apart from calling them "somewhat verbose").
Just a brief note on terminology: you're not describing an operator, you're describing a "display" syntax: delimiters used to build a type such as tuple, list or dict. I still think of them as "list literals" etc, [1, 2, 3, 4] for example, even though technically they are not necessarily literals (i.e. known at compile-time) and officially they are called "list displays" etc.
Sometimes used for matrices. It's more common to use a multiple-line version of [ ] which is, of course, hard to type in a regular editor :-) See examples of matrices here: http://mathworld.wolfram.com/Matrix.html

Moving on to the multi-dimensional examples you give:
To me, that looks decidedly strange. The | symbol has the disadvantage that you cannot tell which is opening a row and which is closing a row. The above looks like:

- first row: opened with a single bar, closed with two bars;
- second row: no opening delimiter at all, closed with a single bar.

I think that you have to compete with existing syntax for nested lists. The lowest common denominator for any array is to use nested lists and a function call. Nested lists can be easily converted into *any* array type you like, rather than picking out one, and only one, array type for special treatment.

If Python had a built-in array type, then maybe this would be justified, but it doesn't, and isn't likely to get one: lists fill the role that arrays do in most other languages. There is an array type in the standard library, array.array, but it's not built-in and not important enough to be built-in or to get special syntax of its own.

And I'm not sure that numpy users miss the ability to write multi-dimensional arrays using syntax instead of a function call. Normally they would want the ability to specify a type and an order (rows first, like C, or columns first, like Fortran), and I think that for multi-dimensional arrays it is more usual and simpler to write out the values in a linear array and tell the array constructor to re-arrange them.

Trying to write out a visual representation of anything with more than two dimensions is cumbersome when you are limited to the flat plane of a text file. Consider:

[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

If your editor can highlight matching brackets, it's quite easy to see where each row and plane begins and ends. Whereas your suggested syntax looks to me like a whole bunch of confusing lines. I cannot even work out what are the dimensions of this example:
although if I sit and stare at it for a while I might guess... 4*3? If I already know it is meant to be 3D, then I might be able to work out that the extra bar means something, and guess 2*3*2, but I really wouldn't want to bet my sanity on understanding what those lines mean. (Especially since, later on, the exact number and placement of lines is optional.) What's the rule for when to use triple bars ||| and when to use double bars || or a single bar | ? It's a mystery to me. At least with matching left and right delimiters [ ] I can match them up to see where they begin and end.
Okay, now I'm completely lost. Doesn't the first example with a single vertical bar | mean that it is a 1D array? What's the "highest-specified dimension"? Are you suggesting that we have to count vertical bars to work out the dimension?
This strikes me as a HUGE bug magnet. More like a bug black hole actually, sucking in bugs from all through the universe and inserting them into your arrays... *wink* Effectively, what you are saying is that *as an intentional feature*, a stray | accidentally inserted into your array will not cause a syntax error, but will instead increase the number of dimensions of the array. So instead of having a 17*10*30 array as you expected, you have a 1*17*10*30 or 17*10*30*1 array, which may or may not fail deep in your code with some more or less unexpected and hard to diagnose error. This (anti-)feature also makes syntax highlighting of matching bars impossible, instead of merely fiendishly difficult. Since it isn't an error for the bars not to match, you can't even count the bars to work out which ones are supposed to match. You have to somehow intuit or guess what the dimensions of the array are supposed to be, then reason backwards to see whether the right number of bars are in the right places to be compatible with those dimensions, and if not, your guess of the dimensions might be wrong... or not.
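The concern above can be made concrete with a few lines of code. This is only an illustration under one reading of the "highest-specified dimension" rule, namely that the number of dimensions equals the longest run of vertical bars anywhere in the literal; `implied_ndim` is a hypothetical helper written for this example and is not part of the proposal.

```python
import re


def implied_ndim(literal):
    """Dimensions implied by the longest run of vertical bars in `literal`."""
    runs = re.findall(r"\|+", literal)
    return max((len(run) for run in runs), default=0)


intended = "[|| 0, 1, 2 || 3, 4, 5 ||]"
typo = "[|| 0, 1, 2 || 3, 4, 5 |||]"  # one stray bar before the closing bracket

print(implied_ndim(intended))  # 2
print(implied_ndim(typo))      # 3
```

Both strings would be accepted under that rule, and the only visible difference between a 2D and a 3D array is a single character, which is the kind of silent change in shape being warned about.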
At least in my opinion, this sort of approach really shines when making higher-dimensional arrays.
You should compare your approach to that of mathematicians and other programming languages.

Mathematicians don't really use multi-dimensional arrays. They have vectors, which are 1D, and matrices which are 2D, then they have tensors which confuse me, but they don't seem to use anything which corresponds to a simple higher-dimension analog of matrices. Tensors come close, but they don't seem to have anything like matrix-notation for tensors. (Given that tensors are often infinite dimensional, I'm hardly surprised.)

Matlab has syntax for 2D arrays, which can be expanded into 3D:

A = [1 2; 3 4];
A(:,:,2) = [5 6; 7 8]

R has an array function:

array(1:8, c(2,2,2))
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

Differences in ordering (row first or column first) aside, they are equivalent to Python's:

[[[1, 2], [3, 4]],
 [[5, 6], [7, 8]],
]

My HP-48 calculator uses square brackets for matrices, with the convenience that in the calculator interface I only need to close the first pair of brackets. For 2D, I can enter the keystrokes:

[[1 2] 3 4

to get the 2D matrix:

[[ 1 2 ]
 [ 3 4 ]]

but it has no support for 3D arrays.

Here's how C# does it: https://msdn.microsoft.com/en-us/library/2yd9wwz4.aspx
I wouldn't even want to guess what dimensions that is supposed to be. 10 columns, because I can count them, but everything else is a mystery.
But that's easy! Look at the nested brackets. The opening sequence tells you that there are four dimensions:

[[[[

I can count the ten columns (and if I align them, I can visually verify that each row has exactly ten columns). Looking at the nested lists, I see:

[[[[ten columns], [ten columns]],

so that's two rows by ten, then continuing:

[2 x 10]],

which closes another layer, so that's 2 items in the third dimension, then we have another dimension:

[2 x 10 x 2]]

and the array is closed, giving us in total:

2 x 10 x 2 x 2

In my opinion anyone trying to write out a single 4D array like this is opening themselves up to a hiding for nothing, even with clear nesting and matching open/close delimiters. Since we don't have 4D text files, it's better to write:

L = [48, 11, 141, 13, -60, -37, 58, -52, -29, 134,
     -6, 96, -66, 137, -59, -147, -118, -104, -123, -7,
     -103, 50, -89, -12, 28, -12, 119, -131, -73, 21,
     -58, 105, 25, -138, -106, -118, -29, -49, -63, -56,
     -43, -34, 101, -115, 41, 121, 3, -117, 101, -145,
     100, -128, 76, 128, -113, -90, 52, -91, -72, -15,
     22, -65, -118, 134, -58, 55, -73, -118, -53, -60,
     -85, -136, 83, -66, -35, -117, -71, 115, -56, 133]
assert len(L) == 2*10*2*2
arr = array(L, dim=(2,10,2,2))

or something similar, and let the array constructor resize as needed.

-- Steve
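The "write it flat, then give the dimensions" approach above can be sketched in pure Python. The `array(L, dim=...)` signature is taken from the message, but it is not a real library function; the implementation below is an assumption made for illustration, uses row-major (C-style) ordering, and returns nested lists. With numpy, `np.array(L).reshape(dim)` does the same job.

```python
from math import prod


def array(values, dim):
    """Arrange a flat sequence into nested lists of shape `dim` (row-major)."""
    values = list(values)
    dim = tuple(dim)
    if len(values) != prod(dim):
        raise ValueError(
            "cannot arrange %d values into shape %r" % (len(values), dim)
        )

    def build(flat, shape):
        if len(shape) == 1:
            return list(flat)
        # Each item along the first axis owns prod(shape[1:]) consecutive values.
        step = prod(shape[1:])
        return [
            build(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])
        ]

    return build(values, dim)


arr = array(range(8), dim=(2, 2, 2))
print(arr)  # [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```

The length check plays the same role as the `assert len(L) == 2*10*2*2` line above: a miscounted value list fails immediately instead of producing an array of the wrong shape.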

On 10/19/2016 12:08 PM, Todd wrote:
Optional, semi-meaningless, not-really-an-operator markings? The current approach I could at least figure out if I had to -- yours is confusing. You have done a good job explaining what you mean, but what to you is clear is to me, and others, incomprehensible. -- ~Ethan~
participants (12)
- Alexander Belopolsky
- Chris Barker
- David Mertz
- Ethan Furman
- Greg Ewing
- Joseph Jevnik
- Matt Gilson
- Mikhail V
- Steven D'Aprano
- Sven R. Kunze
- Thomas Nyberg
- Todd