Re: [Numpy-discussion] ENH: Add the function 'expand_view'

First off, sorry for the long turnaround in responding to these questions. Below I have tried to respond to everyone's questions and comments. I have restructured the order of the messages so that my responses are a little more structured. If anybody has more thoughts or questions, please let me know.
Takes an array and tacks arbitrary dimensions onto either side, always returning the result as a view. Here are the relevant features:

* Creates a view of the array with the extra dimensions tacked on before and after.
* Takes the before and after arguments independently of each other and of the current shape.
* Allows read and write access to the underlying array.
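Since `expand_view` is only a proposal, here is a minimal sketch of how such a function could be built on `numpy.lib.stride_tricks.as_strided`. The name and the `reps_before`/`reps_after` parameters follow the description above; the implementation itself is my own illustration, not the PR's code:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def expand_view(a, reps_before=(), reps_after=()):
    """Sketch: view `a` with extra dimensions of the given lengths
    tacked on before and after, using zero strides (no copy)."""
    a = np.asarray(a)
    shape = tuple(reps_before) + a.shape + tuple(reps_after)
    strides = (0,) * len(reps_before) + a.strides + (0,) * len(reps_after)
    return as_strided(a, shape=shape, strides=strides)

a = np.arange(6).reshape(2, 3)
v = expand_view(a, reps_before=(4,), reps_after=(5,))
print(v.shape)                  # (4, 2, 3, 5)
print(np.shares_memory(v, a))   # True -- always a view, never a copy
```

Because the new dimensions have stride 0, every slice along them refers to the same underlying data, which is what makes the zero-copy guarantee possible.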
Can you expand this with some discussion of why you want this function, and why you chose these specific features? (E.g. as mentioned in the PR comments already, the reason broadcast_to returns a read-only array is that it was decided that this was less confusing for users, not because of any technical issue.)
Sometimes I find that broadcasting is insufficient, or gets confused by more complex cases. Even adding the extra dimensions using `np.newaxis`/`None` doesn't always cut it. So, being able to add dimensions with the right shape and without copying is very nice. However, I wanted to do this in a way that would always be a view onto the original data, to avoid any copying penalty. It also came in handy when I didn't know how many dimensions I would be dealing with. The alternative is to do something like `a[(...,) + (b.ndim-1)*(None,)]`, and possibly something similar with `b`, which ends up being pretty gross and unreadable, as opposed to `expand_view(a, reps_after=b.shape[1:])`, which is quite clear. The latter also conveys to the reader what those dimensions are doing.

Finally, in cases where `None` or `np.newaxis` works fine, it is possible to combine arrays in some operation in a way you did not intend. If broadcasting happens to work in that case, you won't get an error, but you will get the wrong answer, which you may discover later than you would like. With `expand_view` an explicit shape can be set, and a mismatch will give you an error, letting you discover the problem as soon as you run the code rather than later, when you are trying to figure out why your answer is wrong.

The write option I never actually use, so making it read-only is fine; I just never blocked that behavior. Perhaps read-only would be preferable, as has been suggested. Ensuring a view is nice because it remains performant for large arrays. Using a view seems reasonable here given that we are never trying to restructure the data; we are only trying to make it appear as if its shape were different.
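To illustrate the two failure modes described above, here is a small example (the arrays `a`, `b`, and `c` are made up for illustration):

```python
import numpy as np

a = np.arange(4.0)        # shape (4,)
b = np.ones((4, 3, 2))    # shape (4, 3, 2)
c = np.ones((2, 4))       # shape (2, 4)

# Variable-dimension newaxis trick: append b.ndim - 1 new axes to `a`.
a_view = a[(Ellipsis,) + (None,) * (b.ndim - 1)]
print(a_view.shape)        # (4, 1, 1)
print((a_view * b).shape)  # (4, 3, 2)

# The silent-failure mode: `a` broadcasts against `c` along the last
# axis without complaint, whether or not that was intended.
print((a * c).shape)       # (2, 4)

# An explicit target shape (as broadcast_to takes, and as expand_view
# proposes) turns a mismatch into an immediate error instead:
try:
    np.broadcast_to(a, (5, 3, 2))
except ValueError as exc:
    print("mismatch caught:", type(exc).__name__)
```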
How is this different from using np.newaxis and broadcasting? Or am I misunderstanding this?
I hope the description above also helps clarify this a bit. To be more succinct and explicit:

* This can always be used, even when broadcasting may get confused.
* Works cleanly and easily with variable numbers of dimensions, unlike `np.newaxis`/`None`.
* Using explicit shapes more tightly constrains the behavior of the resulting view (helping catch bugs).
* Provides the reader more information about how the other dimensions should work.
Why is this a stride_trick?
I thought this looks similar to expand_dims and could maybe be implemented with some extra options there.
So, `expand_dims` adds a single dimension with a length of 1 where specified, but not multiple. I am going to infer that by "extra options" you mean changing it in some way so that it can add multiple axes. Though I suppose you could do this, and it may even be worth pursuing, it ends up being a similar idea to using `np.newaxis`/`None` a few times, so I have basically the same points to make here as in the last section's response. Also, I feel like using `expand_dims` with more axes specified might be a little harder to follow than just using `np.newaxis`/`None`, but it could have its merits.

From a performance standpoint, `expand_dims` uses `reshape` to add these extra dimensions. So, it has the possibility of returning not a view but a copy, which could take some time to build if the array is large. By using our method, we guarantee a view will always be returned; so, this allocation will never be encountered.
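As context for the comparison, `expand_dims` inserts exactly one length-1 axis per call (later NumPy releases added support for passing a tuple of axes, but the single-axis form is what is being discussed here):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# expand_dims inserts one length-1 axis at the given position.
print(np.expand_dims(a, 0).shape)    # (1, 2, 3)
print(np.expand_dims(a, -1).shape)   # (2, 3, 1)

# Multiple new axes require chaining calls or indexing with None:
print(a[None, :, :, None].shape)     # (1, 2, 3, 1)
```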

On Thu, 2016-01-07 at 22:48 -0500, John Kirkham wrote:
<snip>
From a performance standpoint, `expand_dims` uses `reshape` to add these extra dimensions. So, it has the possibility of returning not a view but a copy, which could take some time to build if the array is large. By using our method, we guarantee a view will always be returned; so, this allocation will never be encountered.
Actually, reshape can be used: if stride tricks can do the trick, then reshape is guaranteed to not do a copy. Still, I can't say I feel it is a worthy addition (but others may disagree), especially since I realized we have `expand_dims` already. I am just not sure it would actually be used reasonably often. But how about adding a `dims=1` argument to `expand_dims`, so that your function becomes at least a bit easier:

    newarr = np.expand_dims(arr, -1, after.ndim)
    # If manual broadcast is necessary:
    newarr = np.broadcast_to(arr, before.shape + arr.shape + after.shape)

Otherwise I guess we could morph `expand_dims` partially into your function, though it would return a read-only view if broadcasting is active, so I am a bit unsure I like it.

In other words, my personal currently preferred solution to get some of this would be to add a simple `dims` argument to `expand_dims` and add `broadcast_to` to its "See Also" section (it could also go in the `squeeze` "See Also" section, I think). To further help out other people, we could maybe even mention the combination in the examples (i.e. the `broadcast_to` example?).

- Sebastian
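Note the `dims` argument sketched above is only a suggestion and does not exist in NumPy. As a rough sketch, the same effect can be had today with `reshape` plus `broadcast_to` (the arrays `arr`, `before`, and `after` are made up for illustration):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
before = np.empty((4,))
after = np.empty((5, 7))

# Emulate the suggested `dims` argument with reshape: appending
# length-1 axes can always be done without a copy.
newarr = arr.reshape(arr.shape + (1,) * after.ndim)
print(newarr.shape)                   # (2, 3, 1, 1)
print(np.shares_memory(newarr, arr))  # True -- still a view

# Manual broadcast to the full target shape (a read-only view).
full = np.broadcast_to(newarr, arr.shape + after.shape)
print(full.shape)                     # (2, 3, 5, 7)

# Prepending dimensions needs no reshape at all, since broadcasting
# aligns trailing axes:
front = np.broadcast_to(arr, before.shape + arr.shape)
print(front.shape)                    # (4, 2, 3)
```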
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (2)
- John Kirkham
- Sebastian Berg