[Numpy-discussion] Views of a different dtype

29 Jan 2015

      HI all,

There has been some recent discussion going on on the limitations that
numpy imposes to taking views of an array with a different dtype.

As of right now, you can basically only take a view of an array if it has
no Python objects and neither the old nor the new dtype are structured.
Furthermore, the array has to be either C or Fortran contiguous.

This seem to be way too strict, but the potential for disaster getting a
loosening of the restrictions wrong is big, so it should be handled with
care.

Allan Haldane and myself have been looking into this separately and
discussing some of the details over at github, and we both think that the
only true limitation that has to be imposed is that the offsets of Python
objects within the new and old dtypes remain compatible. I have expanded
Allan's work from here:

https://github.com/ahaldane/numpy/commit/e9ca367

to make it as flexible as I have been able. An implementation of the
algorithm in Python, with a few tests, can be found here:

https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py

I would appreciate getting some eyes on it for correctness, and to make
sure that it won't break with some weird dtype.

I am also trying to figure out what the ground rules for stride and shape
conversions when taking a view with a different dtype should be. I
submitted a PR (gh-5508) a couple for days ago working on that, but I am
not so sure that the logic is completely sound. Again, to get more eyes on
it, I am going to reproduce my thoughts here on the hope of getting some
feedback.

The objective would be to always allow a view of a different dtype (given
that the dtypes be compatible as described above) to be taken if:

   - The itemsize of the dtype doesn't change.
   - The itemsize changes, but the array being viewed is the result of
   slicing and transposing axes of a contiguous array, and it is still
   contiguous, defined as stride == dtype.itemsize, along its smallest-strided
   dimension, and the itemsize of the newtype exactly divides the size of that
   dimension.
   - Ideally taking a view should be a reversible process, i.e. if oldtype
   = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you
   back a view of arr with the same original shape, strides and dtype.

This last point can get tricky if the minimal stride dimension has size 1,
as there could be several of those, e.g.:
...
...
...
a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
a.flags.contiguous
False
a.shape
(3, 1, 2)
a.strides  # the stride of the size 1 dimension could be anything,
ignore it!
(32, 8, 8)
b = a.view(complex)  # this fails right now, but should work
...
...
...
b.flags.contiguous
False
b.shape
(3, 1, 1)
b.strides  # the stride of the size 1 dimensions could be anything,
ignore them!
(32, 16, 16)
c = b.view(float)  # which of the two size 1 dimensions should we expand?

"In the face of ambiguity refuse the temptation to guess" dictates that
last view should raise an error, unless we agree and document some default.
Any thoughts?

Then there is the endless complication one could get into with arrays
created with as_strided. I'm not smart enough to figure when and when not
those could work, but am willing to retake the discussion if someone wiser
si interested.

With all these in mind, my proposal for the new behavior is that taking a
view of an array with a different dtype would require:

   1. That the newtype and oldtype be compatible, as defined by the
   algorithm checking object offsets linked above.
   2. If newtype.itemsize == oldtype.itemsize no more checks are needed,
   make it happen!
   3. If the array is C/Fortran contiguous, check that the size in bytes of
   the last/first dimension is evenly divided by newtype.itemsize. If it does,
   go for it.
   4. For non-contiguous arrays:
      1. Ignoring dimensions of size 1, check that no stride is smaller
      than either oldtype.itemsize or newtype.itemsize. If any is found this is
      an as_strided product, sorry, can't do it!
      2. Ignoring dimensions of size 1, find a contiguous dimension, i.e.
      stride == oldtype.itemsize
         1. If found, check that it is the only one with that stride, that
         it is the minimal stride, and that the size in bytes of that
dimension is
         evenly divided by newitem,itemsize.
         2. If none is found, check if there is a size 1 dimension that is
         also unique (unless we agree on a default, as mentioned
above) and that
         newtype.itemsize evenly divides oldtype.itemsize.

Apologies for the long, dense content, but any thought or comments are very
welcome.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.

[Numpy-discussion] Views of a different dtype

Jaime Fernández del Río