HI all, There has been some recent discussion going on on the limitations that numpy imposes to taking views of an array with a different dtype. As of right now, you can basically only take a view of an array if it has no Python objects and neither the old nor the new dtype are structured. Furthermore, the array has to be either C or Fortran contiguous. This seem to be way too strict, but the potential for disaster getting a loosening of the restrictions wrong is big, so it should be handled with care. Allan Haldane and myself have been looking into this separately and discussing some of the details over at github, and we both think that the only true limitation that has to be imposed is that the offsets of Python objects within the new and old dtypes remain compatible. I have expanded Allan's work from here: https://github.com/ahaldane/numpy/commit/e9ca367 to make it as flexible as I have been able. An implementation of the algorithm in Python, with a few tests, can be found here: https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py I would appreciate getting some eyes on it for correctness, and to make sure that it won't break with some weird dtype. I am also trying to figure out what the ground rules for stride and shape conversions when taking a view with a different dtype should be. I submitted a PR (gh-5508) a couple for days ago working on that, but I am not so sure that the logic is completely sound. Again, to get more eyes on it, I am going to reproduce my thoughts here on the hope of getting some feedback. The objective would be to always allow a view of a different dtype (given that the dtypes be compatible as described above) to be taken if: - The itemsize of the dtype doesn't change. - The itemsize changes, but the array being viewed is the result of slicing and transposing axes of a contiguous array, and it is still contiguous, defined as stride == dtype.itemsize, along its smallest-strided dimension, and the itemsize of the newtype exactly divides the size of that dimension. - Ideally taking a view should be a reversible process, i.e. if oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you back a view of arr with the same original shape, strides and dtype. This last point can get tricky if the minimal stride dimension has size 1, as there could be several of those, e.g.:
a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) a.flags.contiguous False a.shape (3, 1, 2) a.strides # the stride of the size 1 dimension could be anything, ignore it! (32, 8, 8)
b = a.view(complex) # this fails right now, but should work
b.flags.contiguous False b.shape (3, 1, 1) b.strides # the stride of the size 1 dimensions could be anything, ignore them! (32, 16, 16)
c = b.view(float) # which of the two size 1 dimensions should we expand? "In the face of ambiguity refuse the temptation to guess" dictates that last view should raise an error, unless we agree and document some default. Any thoughts? Then there is the endless complication one could get into with arrays created with as_strided. I'm not smart enough to figure when and when not those could work, but am willing to retake the discussion if someone wiser si interested. With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require: 1. That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! 3. If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. 4. For non-contiguous arrays: 1. Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided product, sorry, can't do it! 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize 1. If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. 2. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize. Apologies for the long, dense content, but any thought or comments are very welcome. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
Hi Jamie, I'm not sure whether to reply here or on github! I have a comment on the condition "There must be the same total number of objects before and after". I originally looked into this to solve https://github.com/numpy/numpy/issues/3256, which involves taking a view of a subset of the fields of a structured array. Such a view causes the final number objects to be less than the original number. For example, say you have an array with three fields a,b,c, and you only want a and c. Numpy currently does the equivalent of this (in _index_fields in numpy/core/_internals.py): >>> a = zeros([(1,2,3), (4,5,6)], ... dtype([('a', 'i8'), ('b', 'i8'), ('c', 'i8')])) >>> a.view(dtype({'names': ['x', 'y'], ... 'formats': ['i8', 'i8'], ... 'offsets': [0, 16]})) array([(1, 3), (4, 6)], dtype={'names':['x','y'], 'formats':['<i8','<i8'], 'offsets':[0,16], 'itemsize':24}) It works although the repr is a bit wordy. However, it currently does not work if we have 'O' instead of 'i8' since we can't take views of object arrays. If this was an object array, the resulting view only has two objects, not the original three. This does mean that the view can't be "undone" since the view object no longer "knows" there is an object at offset 8. But that does not seem bad to me compared to the usefulness of such a view. I think to fix this, you only need to remove the test for this case in dtype_view_is_safe, and then return all(o in old_offsets for o in new_offsets). (and btw, I like the much cleaner logic you have in that section!) Allan On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:
HI all,
There has been some recent discussion going on on the limitations that numpy imposes to taking views of an array with a different dtype.
As of right now, you can basically only take a view of an array if it has no Python objects and neither the old nor the new dtype are structured. Furthermore, the array has to be either C or Fortran contiguous.
This seem to be way too strict, but the potential for disaster getting a loosening of the restrictions wrong is big, so it should be handled with care.
Allan Haldane and myself have been looking into this separately and discussing some of the details over at github, and we both think that the only true limitation that has to be imposed is that the offsets of Python objects within the new and old dtypes remain compatible. I have expanded Allan's work from here:
https://github.com/ahaldane/numpy/commit/e9ca367
to make it as flexible as I have been able. An implementation of the algorithm in Python, with a few tests, can be found here:
https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py
I would appreciate getting some eyes on it for correctness, and to make sure that it won't break with some weird dtype.
I am also trying to figure out what the ground rules for stride and shape conversions when taking a view with a different dtype should be. I submitted a PR (gh-5508) a couple for days ago working on that, but I am not so sure that the logic is completely sound. Again, to get more eyes on it, I am going to reproduce my thoughts here on the hope of getting some feedback.
The objective would be to always allow a view of a different dtype (given that the dtypes be compatible as described above) to be taken if:
* The itemsize of the dtype doesn't change. * The itemsize changes, but the array being viewed is the result of slicing and transposing axes of a contiguous array, and it is still contiguous, defined as stride == dtype.itemsize, along its smallest-strided dimension, and the itemsize of the newtype exactly divides the size of that dimension. * Ideally taking a view should be a reversible process, i.e. if oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you back a view of arr with the same original shape, strides and dtype.
This last point can get tricky if the minimal stride dimension has size 1, as there could be several of those, e.g.:
>>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) >>> a.flags.contiguous False >>> a.shape (3, 1, 2) >>> a.strides # the stride of the size 1 dimension could be anything, ignore it! (32, 8, 8)
b = a.view(complex) # this fails right now, but should work >>> b.flags.contiguous False >>> b.shape (3, 1, 1) >>> b.strides # the stride of the size 1 dimensions could be anything, ignore them! (32, 16, 16)
c = b.view(float) # which of the two size 1 dimensions should we expand?
"In the face of ambiguity refuse the temptation to guess" dictates that last view should raise an error, unless we agree and document some default. Any thoughts?
Then there is the endless complication one could get into with arrays created with as_strided. I'm not smart enough to figure when and when not those could work, but am willing to retake the discussion if someone wiser si interested.
With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require:
1. That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! 3. If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. 4. For non-contiguous arrays: 1. Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided product, sorry, can't do it! 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize 1. If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. 2. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize.
Apologies for the long, dense content, but any thought or comments are very welcome.
Jaime
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote: [...]
With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require:
That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. For non-contiguous arrays:
Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided product, sorry, can't do it! Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize
If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize.
I'm really wary of this idea that we go grovelling around looking for some suitable dimension somewhere to absorb the new items. Basically nothing in numpy produces semantically different arrays (e.g., ones with different shapes) depending on the *strides* of the input array. Could we make it more like: check to see if the last dimension works. If not, raise an error (and let the user transpose some other dimension there if that's what they wanted)? Or require the user to specify which dimension will absorb the shape change? (If we were doing this from scratch, then it would be tempting to just say that we always add a new dimension at the end with newtype.itemsize / oldtype.itemsize entries, or absorb such a dimension if shrinking. As a bonus, this would always work, regardless of contiguity! Except that when shrinking the last dimension would have to be contiguous, of course.) I guess the main consideration for this is that we may be stuck with stuff b/c of backwards compatibility. Can you maybe say a little bit about what is allowed now, and what constraints that puts on things? E.g. are we already grovelling around in strides and picking random dimensions in some cases? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote: [...]
With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require:
That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. For non-contiguous arrays:
Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided product, sorry, can't do it! Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize
If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize.
I'm really wary of this idea that we go grovelling around looking for some suitable dimension somewhere to absorb the new items. Basically nothing in numpy produces semantically different arrays (e.g., ones with different shapes) depending on the *strides* of the input array.
In a convoluted way, changing the dtype already does different thing depending on the strides, as right now the expansion/contraction happens along the last/first axis depending if the array is C/Fortran contiguous, and those flags are calculated from the strides:
a = np.ones((2, 2), dtype=complex) a.view(float).shape (2, 4) a.T.view(float).shape (4, 2)
A user unaware that transposition has changed the memory layout will be surprised to find his complex values being unpacked along the first, not the last dimension. But that's the way it already is. With my proposal above, the intention is that the same happens not only for the standard "reverse axes order" transposition, but with any other one, even if you have sliced the array.
Could we make it more like: check to see if the last dimension works. If not, raise an error (and let the user transpose some other dimension there if that's what they wanted)? Or require the user to specify which dimension will absorb the shape change? (If we were doing this from scratch, then it would be tempting to just say that we always add a new dimension at the end with newtype.itemsize / oldtype.itemsize entries, or absorb such a dimension if shrinking. As a bonus, this would always work, regardless of contiguity! Except that when shrinking the last dimension would have to be contiguous, of course.)
When we roll @ in and people start working with stacks of matrices, we will probably find ourselves having to create an alias, similar to .T, for .swapaxes(-1, -2). Searching for the smallest stride allows to take views of such arrays, which does not work right now because the array is no longer contiguous globally.
I guess the main consideration for this is that we may be stuck with stuff b/c of backwards compatibility. Can you maybe say a little bit about what is allowed now, and what constraints that puts on things? E.g. are we already grovelling around in strides and picking random dimensions in some cases?
Just to restate it: right now we only allow new views if the array is globally contiguous, so either along the first or last dimension. Jaime
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río wrote:
On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith <njs@pobox.com> wrote: On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote: [...]
<snip>
Could we make it more like: check to see if the last dimension works. If not, raise an error (and let the user transpose some other dimension there if that's what they wanted)? Or require the user to specify which dimension will absorb the shape change? (If we were doing this from scratch, then it would be tempting to just say that we always add a new dimension at the end with newtype.itemsize / oldtype.itemsize entries, or absorb such a dimension if shrinking. As a bonus, this would always work, regardless of contiguity! Except that when shrinking the last dimension would have to be contiguous, of course.)
When we roll @ in and people start working with stacks of matrices, we will probably find ourselves having to create an alias, similar to .T, for .swapaxes(-1, -2). Searching for the smallest stride allows to take views of such arrays, which does not work right now because the array is no longer contiguous globally.
That is true, but I agree with Nathaniel at least as far as that I would prefer a user to be able to safely use `view` even he has not even an inkling about what his memory layout is. One option would be an `axis=-1` default (maybe FutureWarn this from `axis=None` which would look at order, see below -- or maybe have axis='A', 'C' and 'F' and default to 'A' for starters). This even now could start creating bugs when enabling relaxed strides :(, because your good old fortran order complex array being viewed as a float one could expand along the wrong axis, and even without such arrays swap order pretty fast when operating on them, which can create impossibly to find bugs, because even a poweruser is likely to forget about such things. Of course you could argue that view is a poweruser feature and a user using it should keep these things in mind.... Though if you argue that, you can almost just use `np.ndarray` directly ;) -- ok, not really considering how cumbersome it is, but still. - Sebastian
I guess the main consideration for this is that we may be stuck with stuff b/c of backwards compatibility. Can you maybe say a little bit about what is allowed now, and what constraints that puts on things? E.g. are we already grovelling around in strides and picking random dimensions in some cases?
Just to restate it: right now we only allow new views if the array is globally contiguous, so either along the first or last dimension.
Jaime
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río wrote:
On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith <njs@pobox.com> wrote: On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote: [...]
<snip>
Could we make it more like: check to see if the last dimension works. If not, raise an error (and let the user transpose some other dimension there if that's what they wanted)? Or require the user to specify which dimension will absorb the shape change? (If we were doing this from scratch, then it would be tempting to just say that we always add a new dimension at the end with newtype.itemsize / oldtype.itemsize entries, or absorb such a dimension if shrinking. As a bonus, this would always work, regardless of contiguity! Except that when shrinking the last dimension would have to be contiguous, of course.)
When we roll @ in and people start working with stacks of matrices, we will probably find ourselves having to create an alias, similar to .T, for .swapaxes(-1, -2). Searching for the smallest stride allows to take views of such arrays, which does not work right now because the array is no longer contiguous globally.
That is true, but I agree with Nathaniel at least as far as that I would prefer a user to be able to safely use `view` even he has not even an inkling about what his memory layout is. One option would be an `axis=-1` default (maybe FutureWarn this from `axis=None` which would look at order, see below -- or maybe have axis='A', 'C' and 'F' and default to 'A' for starters).
This even now could start creating bugs when enabling relaxed strides :(, because your good old fortran order complex array being viewed as a float one could expand along the wrong axis, and even without such arrays swap order pretty fast when operating on them, which can create impossibly to find bugs, because even a poweruser is likely to forget about such things.
Of course you could argue that view is a poweruser feature and a user using it should keep these things in mind.... Though if you argue that, you can almost just use `np.ndarray` directly ;) -- ok, not really considering how cumbersome it is, but still.
I have been giving this some thought, and am willing to concede that my first proposal may have been too ambitious. So even though the knob goes to 11, we can always do things incrementally. I am also wary of adding new keywords when it seems obvious that we do not have the functionality completely figured out, so here's my new proposal: - The objective is that a view of an array that is the result of slicing a contiguous array should be possible, if it remains "contiguous" (meaning stride == itemsize) along its original contiguous (first or last) dimension. This eliminates axis transposition from the previous proposal, although reversing the axes completely would also work. - To verify this, unless the C contiguous or Fortran contiguous flags are set, we would still need to look at the strides. An array would be C contiguous if, starting from the last stride it is equal to the itemsize, and working backwards every next stride is larger or equal than the product of the previous stride by the previous dimension. dimensions of size 1 would be ignored for these, except for the last one, which would be taken to have stride = itemsize. The Fortran case is of course the same in reverse. - I think the above combined with the current preference of C contiguousness over Fortran, would actually allow the views to always be reversible, which is also a nice thing to have. This eliminates most of the weirdness, but extends current functionality to cover cases like Jens reported a few days back. Does this sound better? Jaime
- Sebastian
I guess the main consideration for this is that we may be stuck with stuff b/c of backwards compatibility. Can you maybe say a little bit about what is allowed now, and what constraints that puts on things? E.g. are we already grovelling around in strides and picking random dimensions in some cases?
Just to restate it: right now we only allow new views if the array is globally contiguous, so either along the first or last dimension.
Jaime
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
On Mo, 2015-02-02 at 06:25 -0800, Jaime Fernández del Río wrote:
On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote: On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río wrote: > On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith <njs@pobox.com> > wrote: > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río > <jaime.frio@gmail.com> wrote: > [...]
<snip>
> > Could we make it more like: check to see if the last dimension > works. > If not, raise an error (and let the user transpose some other > dimension there if that's what they wanted)? Or require the > user to > specify which dimension will absorb the shape change? (If we > were > doing this from scratch, then it would be tempting to just say > that we > always add a new dimension at the end with newtype.itemsize / > oldtype.itemsize entries, or absorb such a dimension if > shrinking. As > a bonus, this would always work, regardless of contiguity! > Except that > when shrinking the last dimension would have to be contiguous, > of > course.) > > > When we roll @ in and people start working with stacks of matrices, we > will probably find ourselves having to create an alias, similar to .T, > for .swapaxes(-1, -2). Searching for the smallest stride allows to > take views of such arrays, which does not work right now because the > array is no longer contiguous globally. >
That is true, but I agree with Nathaniel at least as far as that I would prefer a user to be able to safely use `view` even he has not even an inkling about what his memory layout is. One option would be an `axis=-1` default (maybe FutureWarn this from `axis=None` which would look at order, see below -- or maybe have axis='A', 'C' and 'F' and default to 'A' for starters).
This even now could start creating bugs when enabling relaxed strides :(, because your good old fortran order complex array being viewed as a float one could expand along the wrong axis, and even without such arrays swap order pretty fast when operating on them, which can create impossibly to find bugs, because even a poweruser is likely to forget about such things.
Of course you could argue that view is a poweruser feature and a user using it should keep these things in mind.... Though if you argue that, you can almost just use `np.ndarray` directly ;) -- ok, not really considering how cumbersome it is, but still.
I have been giving this some thought, and am willing to concede that my first proposal may have been too ambitious. So even though the knob goes to 11, we can always do things incrementally. I am also wary of adding new keywords when it seems obvious that we do not have the functionality completely figured out, so here's my new proposal:
* The objective is that a view of an array that is the result of slicing a contiguous array should be possible, if it remains "contiguous" (meaning stride == itemsize) along its original contiguous (first or last) dimension. This eliminates axis transposition from the previous proposal, although reversing the axes completely would also work. * To verify this, unless the C contiguous or Fortran contiguous flags are set, we would still need to look at the strides. An array would be C contiguous if, starting from the last stride it is equal to the itemsize, and working backwards every next stride is larger or equal than the product of the previous stride by the previous dimension. dimensions of size 1 would be ignored for these, except for the last one, which would be taken to have stride = itemsize. The Fortran case is of course the same in reverse. * I think the above combined with the current preference of C contiguousness over Fortran, would actually allow the views to always be reversible, which is also a nice thing to have. This eliminates most of the weirdness, but extends current functionality to cover cases like Jens reported a few days back.
Does this sound better?
It seems fine as such, but I still worry about relaxed strides, though this is not really directly related to your efforts here. The problem I see is something like this (any numpy version): arr = np.array([[1, 2]], dtype=np.float64, order='C').T # note that arr is fortran contiguous view = arr.view(np.complex128) not_arr = view.view(np.float64) np.array_equal(arr, not_arr) # False! And with relaxed strides, the situation should become worse, because "Fortran order unless C order" logic is harder to predict, and here does an actual difference even for non (1, 1) arrays. Which creates the possibility of breaking currently working code. - Sebastian
Jaime
- Sebastian
> > I guess the main consideration for this is that we may be > stuck with > stuff b/c of backwards compatibility. Can you maybe say a > little bit > about what is allowed now, and what constraints that puts on > things? > E.g. are we already grovelling around in strides and picking > random > dimensions in some cases? > > > Just to restate it: right now we only allow new views if the array is > globally contiguous, so either along the first or last dimension. > > > Jaime > > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of > Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Tue, Feb 3, 2015 at 1:28 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mo, 2015-02-02 at 06:25 -0800, Jaime Fernández del Río wrote:
On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote: On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río wrote: > On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith <njs@pobox.com> > wrote: > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río > <jaime.frio@gmail.com> wrote: > [...]
<snip>
> > Could we make it more like: check to see if the last dimension > works. > If not, raise an error (and let the user transpose some other > dimension there if that's what they wanted)? Or require the > user to > specify which dimension will absorb the shape change? (If we > were > doing this from scratch, then it would be tempting to just say > that we > always add a new dimension at the end with newtype.itemsize / > oldtype.itemsize entries, or absorb such a dimension if > shrinking. As > a bonus, this would always work, regardless of contiguity! > Except that > when shrinking the last dimension would have to be contiguous, > of > course.) > > > When we roll @ in and people start working with stacks of matrices, we > will probably find ourselves having to create an alias, similar to .T, > for .swapaxes(-1, -2). Searching for the smallest stride allows to > take views of such arrays, which does not work right now because the > array is no longer contiguous globally. >
That is true, but I agree with Nathaniel at least as far as that I would prefer a user to be able to safely use `view` even he has not even an inkling about what his memory layout is. One option would be an `axis=-1` default (maybe FutureWarn this from `axis=None` which would look at order, see below -- or maybe have axis='A', 'C' and 'F' and default to 'A' for starters).
This even now could start creating bugs when enabling relaxed strides :(, because your good old fortran order complex array being viewed as a float one could expand along the wrong axis, and even without such arrays swap order pretty fast when operating on them, which can create impossibly to find bugs, because even a poweruser is likely to forget about such things.
Of course you could argue that view is a poweruser feature and a user using it should keep these things in mind.... Though if you argue that, you can almost just use `np.ndarray` directly ;) -- ok, not really considering how cumbersome it is, but still.
I have been giving this some thought, and am willing to concede that my first proposal may have been too ambitious. So even though the knob goes to 11, we can always do things incrementally. I am also wary of adding new keywords when it seems obvious that we do not have the functionality completely figured out, so here's my new proposal:
* The objective is that a view of an array that is the result of slicing a contiguous array should be possible, if it remains "contiguous" (meaning stride == itemsize) along its original contiguous (first or last) dimension. This eliminates axis transposition from the previous proposal, although reversing the axes completely would also work. * To verify this, unless the C contiguous or Fortran contiguous flags are set, we would still need to look at the strides. An array would be C contiguous if, starting from the last stride it is equal to the itemsize, and working backwards every next stride is larger or equal than the product of the previous stride by the previous dimension. dimensions of size 1 would be ignored for these, except for the last one, which would be taken to have stride = itemsize. The Fortran case is of course the same in reverse. * I think the above combined with the current preference of C contiguousness over Fortran, would actually allow the views to always be reversible, which is also a nice thing to have. This eliminates most of the weirdness, but extends current functionality to cover cases like Jens reported a few days back.
Does this sound better?
It seems fine as such, but I still worry about relaxed strides, though this is not really directly related to your efforts here. The problem I see is something like this (any numpy version):
arr = np.array([[1, 2]], dtype=np.float64, order='C').T # note that arr is fortran contiguous view = arr.view(np.complex128) not_arr = view.view(np.float64) np.array_equal(arr, not_arr) # False!
Yes, dimensions of size one can be a pain...
And with relaxed strides, the situation should become worse, because "Fortran order unless C order" logic is harder to predict, and here does an actual difference even for non (1, 1) arrays. Which creates the possibility of breaking currently working code.
Do you have a concrete example of what a non (1, 1) array that fails with relaxed strides would look like? If we used, as right now, the array flags as a first choice point, and only if none is set try to determine it from the strides/dimensions information, I fail to imagine any situation where the end result would be worse than now. I don't think that a little bit of predictable surprising in an advanced functionality is too bad. We could start raising "on the face of ambiguity, we refuse to guess" errors, even for the current behavior you show above, but that is more likely to trip people by not giving them any simple workaround, that it seems to me would be "add a .T if all dimensions are 1" in some particular situations. Or are you thinking of something more serious than a shape mismatch when you write about "breaking current code"? If there are any real loopholes in expanding this functionality, then lets not do it, but we know we have at least one user unsatisfied with the current performance, so I really think it is worth trying. Plus, I'll admit to that, messing around with some of these stuff deep inside the guts of the beast is lots of fun! ;) Jaime
- Sebastian
Jaime
- Sebastian
> > I guess the main consideration for this is that we may be > stuck with > stuff b/c of backwards compatibility. Can you maybe say a > little bit > about what is allowed now, and what constraints that puts on > things? > E.g. are we already grovelling around in strides and picking > random > dimensions in some cases? > > > Just to restate it: right now we only allow new views if the array is > globally contiguous, so either along the first or last dimension. > > > Jaime > > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of > Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
<snip>
Do you have a concrete example of what a non (1, 1) array that fails with relaxed strides would look like?
If we used, as right now, the array flags as a first choice point, and only if none is set try to determine it from the strides/dimensions information, I fail to imagine any situation where the end result would be worse than now. I don't think that a little bit of predictable surprising in an advanced functionality is too bad. We could start raising "on the face of ambiguity, we refuse to guess" errors, even for the current behavior you show above, but that is more likely to trip people by not giving them any simple workaround, that it seems to me would be "add a .T if all dimensions are 1" in some particular situations. Or are you thinking of something more serious than a shape mismatch when you write about "breaking current code"?
Yes, I am talking only about wrongly shaped results for some fortran order arrays. A (20, 1) fortran order complex array being viewed as float, will with relaxed strides become a (20, 2) array instead of a (40, 1) one.
If there are any real loopholes in expanding this functionality, then lets not do it, but we know we have at least one user unsatisfied with the current performance, so I really think it is worth trying. Plus, I'll admit to that, messing around with some of these stuff deep inside the guts of the beast is lots of fun! ;)
I do not think there are loopholes with expanding this functionality. I think there have regressions when we put relaxed strides to on, because suddenly the fortran order array might be expanded along a 1-sized axis, because it is also C order. So I wonder if we can fix these regressions and at the same time maybe provide a more intuitive approach then using the memory order blindly.... - Sebastian
Jaime
- Sebastian
> > Jaime > > > > - Sebastian > > > > > I guess the main consideration for this is that we > may be > > stuck with > > stuff b/c of backwards compatibility. Can you maybe > say a > > little bit > > about what is allowed now, and what constraints that > puts on > > things? > > E.g. are we already grovelling around in strides and > picking > > random > > dimensions in some cases? > > > > > > Just to restate it: right now we only allow new views if the > array is > > globally contiguous, so either along the first or last > dimension. > > > > > > Jaime > > > > > > -n > > > > -- > > Nathaniel J. Smith > > Postdoctoral researcher - Informatics - University > of > > Edinburgh > > http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale > en sus > > planes de dominación mundial. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
<snip>
Do you have a concrete example of what a non (1, 1) array that fails with relaxed strides would look like?
If we used, as right now, the array flags as a first choice point, and only if none is set try to determine it from the strides/dimensions information, I fail to imagine any situation where the end result would be worse than now. I don't think that a little bit of predictable surprising in an advanced functionality is too bad. We could start raising "on the face of ambiguity, we refuse to guess" errors, even for the current behavior you show above, but that is more likely to trip people by not giving them any simple workaround, that it seems to me would be "add a .T if all dimensions are 1" in some particular situations. Or are you thinking of something more serious than a shape mismatch when you write about "breaking current code"?
Yes, I am talking only about wrongly shaped results for some fortran order arrays. A (20, 1) fortran order complex array being viewed as float, will with relaxed strides become a (20, 2) array instead of a (40, 1) one.
That is a limitation of the current implementation too, and happens already whenever relaxed strides are in place. Which is the default for 1.10, right? Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after all? It should probably be more of a hint of what to do (fortran vs c) when in doubt. "C" would prioritize last axis, "F" the first, and we could even add a "raise" option to have it fail if the axis cannot be inferred from the strides and shape. Current behavior would is equivalent to what "C" would do. Jaime
If there are any real loopholes in expanding this functionality, then lets not do it, but we know we have at least one user unsatisfied with the current performance, so I really think it is worth trying. Plus, I'll admit to that, messing around with some of these stuff deep inside the guts of the beast is lots of fun! ;)
I do not think there are loopholes with expanding this functionality. I think there have regressions when we put relaxed strides to on, because suddenly the fortran order array might be expanded along a 1-sized axis, because it is also C order. So I wonder if we can fix these regressions and at the same time maybe provide a more intuitive approach then using the memory order blindly....
- Sebastian
Jaime
- Sebastian
> > Jaime > > > > - Sebastian > > > > > I guess the main consideration for this is that we > may be > > stuck with > > stuff b/c of backwards compatibility. Can you maybe > say a > > little bit > > about what is allowed now, and what constraints that > puts on > > things? > > E.g. are we already grovelling around in strides and > picking > > random > > dimensions in some cases? > > > > > > Just to restate it: right now we only allow new views if the > array is > > globally contiguous, so either along the first or last > dimension. > > > > > > Jaime > > > > > > -n > > > > -- > > Nathaniel J. Smith > > Postdoctoral researcher - Informatics - University > of > > Edinburgh > > http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale > en sus > > planes de dominación mundial. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
On Tue Feb 03 2015 at 1:47:34 PM Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg <sebastian@sipsolutions.net
wrote:
On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
<snip>
Do you have a concrete example of what a non (1, 1) array that fails with relaxed strides would look like?
If we used, as right now, the array flags as a first choice point, and only if none is set try to determine it from the strides/dimensions information, I fail to imagine any situation where the end result would be worse than now. I don't think that a little bit of predictable surprising in an advanced functionality is too bad. We could start raising "on the face of ambiguity, we refuse to guess" errors, even for the current behavior you show above, but that is more likely to trip people by not giving them any simple workaround, that it seems to me would be "add a .T if all dimensions are 1" in some particular situations. Or are you thinking of something more serious than a shape mismatch when you write about "breaking current code"?
Yes, I am talking only about wrongly shaped results for some fortran order arrays. A (20, 1) fortran order complex array being viewed as float, will with relaxed strides become a (20, 2) array instead of a (40, 1) one.
That is a limitation of the current implementation too, and happens already whenever relaxed strides are in place. Which is the default for 1.10, right?
Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after all? It should probably be more of a hint of what to do (fortran vs c) when in doubt. "C" would prioritize last axis, "F" the first, and we could even add a "raise" option to have it fail if the axis cannot be inferred from the strides and shape. Current behavior would is equivalent to what "C" would do.
Jaime
IMHO, the best option would be something like this: - When changing to a type with smaller itemsize, add a new axis after the others so the resulting array is C contiguous (unless a different axis is specified by a keyword argument). The advantage here is that if you index the new view using the old indices for an entry, you get an array showing its representation in the new type. - No shape change for views with the same itemsize - When changing to a type with a larger itemsize, collapse along the last axis unless a different axis is specified, throwing an error if the axis specified does not match the axis specified. The last point essentially is just adding an axis argument. I like that idea because it gives users the ability to do all that the array's memory layout allows in a clear and concise way. Throwing an error if the default axis doesn't work would be a good way to prevent strange bugs from happening when the default behavior is expected. The first point would be a break in backwards compatibility, so I'm not sure if it's feasible at this point. The advantage would be that all all arrays returned when using this functionality would be contiguous along the last axis. The shape of the new array would be independent of the memory layout of the original one. This would also be a much cleaner way to ensure that views of a different type are always reversible while still allowing for relaxed strides. Either way, thanks for looking into this. It's a great feature to have available. -Ian Henriksen
If there are any real loopholes in expanding this functionality, then lets not do it, but we know we have at least one user unsatisfied with the current performance, so I really think it is worth trying. Plus, I'll admit to that, messing around with some of these stuff deep inside the guts of the beast is lots of fun! ;)
I do not think there are loopholes with expanding this functionality. I think there have regressions when we put relaxed strides to on, because suddenly the fortran order array might be expanded along a 1-sized axis, because it is also C order. So I wonder if we can fix these regressions and at the same time maybe provide a more intuitive approach then using the memory order blindly....
- Sebastian
Jaime
- Sebastian
> > Jaime > > > > - Sebastian > > > > > I guess the main consideration for this is that we > may be > > stuck with > > stuff b/c of backwards compatibility. Can you maybe > say a > > little bit > > about what is allowed now, and what constraints that > puts on > > things? > > E.g. are we already grovelling around in strides and > picking > > random > > dimensions in some cases? > > > > > > Just to restate it: right now we only allow new views if the > array is > > globally contiguous, so either along the first or last > dimension. > > > > > > Jaime > > > > > > -n > > > > -- > > Nathaniel J. Smith > > Postdoctoral researcher - Informatics - University > of > > Edinburgh > > http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale > en sus > > planes de dominación mundial. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Tue, Feb 3, 2015 at 1:52 PM, Ian Henriksen < insertinterestingnamehere@gmail.com> wrote:
On Tue Feb 03 2015 at 1:47:34 PM Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
<snip>
Do you have a concrete example of what a non (1, 1) array that fails with relaxed strides would look like?
If we used, as right now, the array flags as a first choice point, and only if none is set try to determine it from the strides/dimensions information, I fail to imagine any situation where the end result would be worse than now. I don't think that a little bit of predictable surprising in an advanced functionality is too bad. We could start raising "on the face of ambiguity, we refuse to guess" errors, even for the current behavior you show above, but that is more likely to trip people by not giving them any simple workaround, that it seems to me would be "add a .T if all dimensions are 1" in some particular situations. Or are you thinking of something more serious than a shape mismatch when you write about "breaking current code"?
Yes, I am talking only about wrongly shaped results for some fortran order arrays. A (20, 1) fortran order complex array being viewed as float, will with relaxed strides become a (20, 2) array instead of a (40, 1) one.
That is a limitation of the current implementation too, and happens already whenever relaxed strides are in place. Which is the default for 1.10, right?
Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after all? It should probably be more of a hint of what to do (fortran vs c) when in doubt. "C" would prioritize last axis, "F" the first, and we could even add a "raise" option to have it fail if the axis cannot be inferred from the strides and shape. Current behavior would is equivalent to what "C" would do.
Jaime
IMHO, the best option would be something like this:
- When changing to a type with smaller itemsize, add a new axis after the others so the resulting array is C contiguous (unless a different axis is specified by a keyword argument). The advantage here is that if you index the new view using the old indices for an entry, you get an array showing its representation in the new type. - No shape change for views with the same itemsize - When changing to a type with a larger itemsize, collapse along the last axis unless a different axis is specified, throwing an error if the axis specified does not match the axis specified.
My only concern with adding a new axis, backwards compatibility aside, is that you would not know wether to keep or discard the resulting size 1 dimension when taking a view of the view. We could reuse the keepdims terminology from ufuncs, though. So the current behavior would remain unchanged if you set axis=None, keepdims=True, and we could transition to a more reasonable default with axis=-1, keepdims=False over a few releases with adequate warnings. In my mind, when expanding, the axis would not indicate where to place the new axis, but which axis to collapse over, that is the hard part to figure out!If you want something else, it is only a call to rollaxis away. Perhaps we could also give keepdims a meaning when expanding, as to whether to add a new axis at the end, or change the size of the chosen dimension. I don't know, there may be an actual interface hidden somewhere here, but still needs some cooking to fully figure it out. Jaime
The last point essentially is just adding an axis argument. I like that idea because it gives users the ability to do all that the array's memory layout allows in a clear and concise way. Throwing an error if the default axis doesn't work would be a good way to prevent strange bugs from happening when the default behavior is expected.
The first point would be a break in backwards compatibility, so I'm not sure if it's feasible at this point. The advantage would be that all all arrays returned when using this functionality would be contiguous along the last axis. The shape of the new array would be independent of the memory layout of the original one. This would also be a much cleaner way to ensure that views of a different type are always reversible while still allowing for relaxed strides.
Either way, thanks for looking into this. It's a great feature to have available.
-Ian Henriksen
If there are any real loopholes in expanding this functionality, then lets not do it, but we know we have at least one user unsatisfied with the current performance, so I really think it is worth trying. Plus, I'll admit to that, messing around with some of these stuff deep inside the guts of the beast is lots of fun! ;)
I do not think there are loopholes with expanding this functionality. I think there have regressions when we put relaxed strides to on, because suddenly the fortran order array might be expanded along a 1-sized axis, because it is also C order. So I wonder if we can fix these regressions and at the same time maybe provide a more intuitive approach then using the memory order blindly....
- Sebastian
Jaime
- Sebastian
> > Jaime > > > > - Sebastian > > > > > I guess the main consideration for this is that we > may be > > stuck with > > stuff b/c of backwards compatibility. Can you maybe > say a > > little bit > > about what is allowed now, and what constraints that > puts on > > things? > > E.g. are we already grovelling around in strides and > picking > > random > > dimensions in some cases? > > > > > > Just to restate it: right now we only allow new views if the > array is > > globally contiguous, so either along the first or last > dimension. > > > > > > Jaime > > > > > > -n > > > > -- > > Nathaniel J. Smith > > Postdoctoral researcher - Informatics - University > of > > Edinburgh > > http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale > en sus > > planes de dominación mundial. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus > planes de dominación mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
Hello again, I also have a minor code comment: In get_object_offsets you iterate over dtype.fields.values(). Be careful, because dtype.fields also includes the field titles. For example this fails: dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')]) dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')]) assert dtype_view_is_safe(dta, dtb) I've seen two strategies in the numpy code to work around this. One is to to skip entries that are titles, like this: for key,field in dtype.fields.iteritems(): if len(field) == 3 and field[2] == key: #detect titles continue #do something You can find all examples that do this by grepping NPY_TITLE_KEY in the numpy source. The other (more popular) strategy is to iterate over dtype.names. You can find all examples of this by grepping for names_size. I don't know the history of it, but it looks to me like "titles" in dtypes are an obsolete feature. Are they actually used anywhere? Allan On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:
HI all,
There has been some recent discussion going on on the limitations that numpy imposes to taking views of an array with a different dtype.
As of right now, you can basically only take a view of an array if it has no Python objects and neither the old nor the new dtype are structured. Furthermore, the array has to be either C or Fortran contiguous.
This seem to be way too strict, but the potential for disaster getting a loosening of the restrictions wrong is big, so it should be handled with care.
Allan Haldane and myself have been looking into this separately and discussing some of the details over at github, and we both think that the only true limitation that has to be imposed is that the offsets of Python objects within the new and old dtypes remain compatible. I have expanded Allan's work from here:
https://github.com/ahaldane/numpy/commit/e9ca367
to make it as flexible as I have been able. An implementation of the algorithm in Python, with a few tests, can be found here:
https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py
I would appreciate getting some eyes on it for correctness, and to make sure that it won't break with some weird dtype.
I am also trying to figure out what the ground rules for stride and shape conversions when taking a view with a different dtype should be. I submitted a PR (gh-5508) a couple for days ago working on that, but I am not so sure that the logic is completely sound. Again, to get more eyes on it, I am going to reproduce my thoughts here on the hope of getting some feedback.
The objective would be to always allow a view of a different dtype (given that the dtypes be compatible as described above) to be taken if:
* The itemsize of the dtype doesn't change. * The itemsize changes, but the array being viewed is the result of slicing and transposing axes of a contiguous array, and it is still contiguous, defined as stride == dtype.itemsize, along its smallest-strided dimension, and the itemsize of the newtype exactly divides the size of that dimension. * Ideally taking a view should be a reversible process, i.e. if oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you back a view of arr with the same original shape, strides and dtype.
This last point can get tricky if the minimal stride dimension has size 1, as there could be several of those, e.g.:
>>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) >>> a.flags.contiguous False >>> a.shape (3, 1, 2) >>> a.strides # the stride of the size 1 dimension could be anything, ignore it! (32, 8, 8)
b = a.view(complex) # this fails right now, but should work >>> b.flags.contiguous False >>> b.shape (3, 1, 1) >>> b.strides # the stride of the size 1 dimensions could be anything, ignore them! (32, 16, 16)
c = b.view(float) # which of the two size 1 dimensions should we expand?
"In the face of ambiguity refuse the temptation to guess" dictates that last view should raise an error, unless we agree and document some default. Any thoughts?
Then there is the endless complication one could get into with arrays created with as_strided. I'm not smart enough to figure when and when not those could work, but am willing to retake the discussion if someone wiser si interested.
With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require:
1. That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! 3. If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. 4. For non-contiguous arrays: 1. Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided product, sorry, can't do it! 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize 1. If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. 2. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize.
Apologies for the long, dense content, but any thought or comments are very welcome.
Jaime
-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
I find it useful to be able to view a simple 1D contiguous array of complex as float (alternative real and imag), and also the do the reverse.
participants (6)
-
Allan Haldane
-
Ian Henriksen
-
Jaime Fernández del Río
-
Nathaniel Smith
-
Neal Becker
-
Sebastian Berg