Hello again, I also have a minor code comment: In get_object_offsets you iterate over dtype.fields.values(). Be careful, because dtype.fields also includes the field titles. For example this fails: dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')]) dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')]) assert dtype_view_is_safe(dta, dtb) I've seen two strategies in the numpy code to work around this. One is to to skip entries that are titles, like this: for key,field in dtype.fields.iteritems(): if len(field) == 3 and field[2] == key: #detect titles continue #do something You can find all examples that do this by grepping NPY_TITLE_KEY in the numpy source. The other (more popular) strategy is to iterate over dtype.names. You can find all examples of this by grepping for names_size. I don't know the history of it, but it looks to me like "titles" in dtypes are an obsolete feature. Are they actually used anywhere? Allan On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:

HI all,

There has been some recent discussion going on on the limitations that numpy imposes to taking views of an array with a different dtype.

As of right now, you can basically only take a view of an array if it has no Python objects and neither the old nor the new dtype are structured. Furthermore, the array has to be either C or Fortran contiguous.

This seem to be way too strict, but the potential for disaster getting a loosening of the restrictions wrong is big, so it should be handled with care.

Allan Haldane and myself have been looking into this separately and discussing some of the details over at github, and we both think that the only true limitation that has to be imposed is that the offsets of Python objects within the new and old dtypes remain compatible. I have expanded Allan's work from here:

https://github.com/ahaldane/numpy/commit/e9ca367

to make it as flexible as I have been able. An implementation of the algorithm in Python, with a few tests, can be found here:

https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py

I would appreciate getting some eyes on it for correctness, and to make sure that it won't break with some weird dtype.

I am also trying to figure out what the ground rules for stride and shape conversions when taking a view with a different dtype should be. I submitted a PR (gh-5508) a couple for days ago working on that, but I am not so sure that the logic is completely sound. Again, to get more eyes on it, I am going to reproduce my thoughts here on the hope of getting some feedback.

The objective would be to always allow a view of a different dtype (given that the dtypes be compatible as described above) to be taken if:

* The itemsize of the dtype doesn't change. * The itemsize changes, but the array being viewed is the result of slicing and transposing axes of a contiguous array, and it is still contiguous, defined as stride == dtype.itemsize, along its smallest-strided dimension, and the itemsize of the newtype exactly divides the size of that dimension. * Ideally taking a view should be a reversible process, i.e. if oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you back a view of arr with the same original shape, strides and dtype.

This last point can get tricky if the minimal stride dimension has size 1, as there could be several of those, e.g.:

>>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) >>> a.flags.contiguous False >>> a.shape (3, 1, 2) >>> a.strides # the stride of the size 1 dimension could be anything, ignore it! (32, 8, 8)

b = a.view(complex) # this fails right now, but should work >>> b.flags.contiguous False >>> b.shape (3, 1, 1) >>> b.strides # the stride of the size 1 dimensions could be anything, ignore them! (32, 16, 16)

c = b.view(float) # which of the two size 1 dimensions should we expand?

"In the face of ambiguity refuse the temptation to guess" dictates that last view should raise an error, unless we agree and document some default. Any thoughts?

Then there is the endless complication one could get into with arrays created with as_strided. I'm not smart enough to figure when and when not those could work, but am willing to retake the discussion if someone wiser si interested.

With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require:

1. That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! 3. If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. 4. For non-contiguous arrays: 1. Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided product, sorry, can't do it! 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize 1. If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. 2. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize.

Apologies for the long, dense content, but any thought or comments are very welcome.

Jaime

-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion