"expected a single-segment buffer object"
Hi,

When trying to construct an ndarray, I sometimes run into the
more-or-less mystifying error "expected a single-segment buffer
object":

Out[54]: (0, 16, 8)

In [55]: A = np.zeros(2); A = A[np.newaxis, ...];
np.ndarray(strides=A.strides, shape=A.shape, buffer=A, dtype=A.dtype)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

TypeError: expected a single-segment buffer object

In [56]: A.strides
Out[56]: (0, 8)

That is, when I try to construct an ndarray based on an array with a
zero stride, I get this mystifying error. Zero-strided arrays appear
naturally when one uses newaxis, but they are valuable in their own
right (for example for broadcasting purposes). So it's a bit awkward
to have this error appear when one tries to feed them to
ndarray.__new__ as a buffer. I can, I think, work around it by
removing all axes with stride zero:

def bufferize(A):
    idx = []
    for v in A.strides:
        if v == 0:
            idx.append(0)
        else:
            idx.append(slice(None, None, None))
    return A[tuple(idx)]

Is there any reason for this restriction?
On Wed, Jul 9, 2008 at 18:55, Anne Archibald wrote:
> Is there any reason for this restriction?
Yes, the buffer interface, at least the subset that ndarray() consumes,
requires that all of the data be contiguous in memory.
array_as_buffer() checks for that using PyArray_ISONESEGMENT(), which
looks like this:

#define PyArray_ISONESEGMENT(m) (PyArray_NDIM(m) == 0 || \
                                 PyArray_CHKFLAGS(m, NPY_CONTIGUOUS) || \
                                 PyArray_CHKFLAGS(m, NPY_FORTRAN))

Trying to get a buffer object from anything that is neither C- nor
Fortran-contiguous will fail. E.g.

In [1]: from numpy import *

In [2]: A = arange(10)

In [3]: B = A[::2]

In [4]: ndarray(strides=B.strides, shape=B.shape, buffer=B, dtype=B.dtype)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/Users/rkern/today/<ipython console> in <module>()

TypeError: expected a single-segment buffer object

What is the use case, here? One rarely has to use the ndarray
constructor by itself. For example, the result you seem to want from
the call you make above can be done just fine with .view().

In [8]: C = B.view(ndarray)

In [9]: C
Out[9]: array([0, 2, 4, 6, 8])

In [10]: B
Out[10]: array([0, 2, 4, 6, 8])

In [11]: C is B
Out[11]: False

In [12]: B[0] = 10

In [13]: C
Out[13]: array([10,  2,  4,  6,  8])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
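[The one-segment condition above is easy to probe from Python; as a
minimal sketch, the macro roughly corresponds to these flag checks --
this is an illustration, not the actual C implementation:

import numpy as np

def is_one_segment(a):
    # Mirrors PyArray_ISONESEGMENT: 0-d arrays, or arrays flagged
    # C- or Fortran-contiguous, expose their data as a single segment.
    return a.ndim == 0 or a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']

A = np.arange(10)
print(is_one_segment(A))                  # True: freshly allocated, C-contiguous
print(is_one_segment(A[::2]))             # False: strided view of A
print(is_one_segment(A.reshape(2, 5).T))  # True: Fortran-contiguous view
]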
2008/7/9 Robert Kern:
> Yes, the buffer interface, at least the subset that ndarray()
> consumes, requires that all of the data be contiguous in memory.
Is this really necessary? What does making this restriction gain? It
certainly means that many arrays whose storage is a contiguous block of
memory still cannot be used (just permute the axes of a 3-d array, say;
it may even be possible for an array to be in C-contiguous order but
for the flag not to be set). But how is one to construct exotic slices
of an array that is strided in memory? (The real part of a complex
array, say.)

I suppose one could follow the linked list of .bases up to the original
ndarray, which should normally be C- or Fortran-contiguous, then work
out the offset, but even this may not always work: what if the original
array was constructed with non-C-contiguous strides from some
preexisting buffer?

If the concern is that this allows users to shoot themselves in the
foot, it's worth noting that even with the current setup you can easily
fabricate strides and shapes that go outside the allocated part of
memory.
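[To make the permuted-axes point above concrete, a quick sketch: the
data of a transposed 3-d array occupy one contiguous block of memory,
yet neither contiguity flag is set, so the buffer interface rejects it:

import numpy as np

# A permuted view of a contiguous 3-d array: same single block of
# memory, but flagged neither C- nor Fortran-contiguous.
A = np.zeros((2, 3, 4)).transpose(1, 0, 2)
print(A.flags['C_CONTIGUOUS'], A.flags['F_CONTIGUOUS'])  # False False
]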
> What is the use case, here? One rarely has to use the ndarray
> constructor by itself. For example, the result you seem to want from
> the call you make above can be done just fine with .view().
I was presenting a simple example. I was actually trying to use
zero-strided arrays to implement broadcasting. The code was rather
long, but essentially what it was meant to do was generate a view of an
array in which an axis of length one had been replaced by an axis of
length m with stride zero. (The point of all this was to create a class
like vectorize that was suitable for use on, for example,
np.linalg.inv().)

But I also ran into this problem while writing segmentaxis.py, the code
to produce a "matrix" of sliding windows. (See
http://www.scipy.org/Cookbook/SegmentAxis .) There I caught the
exception and copied the array (unnecessarily) if this came up.

Anne
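[The zero-stride view described above can be spelled as a minimal
sketch using numpy.lib.stride_tricks, the module that comes up later
in this thread:

import numpy as np
from numpy.lib.stride_tricks import as_strided

# Replace a length-1 axis with a length-m axis of stride zero: every
# position along the new axis aliases the same memory, so no copy is made.
A = np.arange(3.0)[:, np.newaxis]        # shape (3, 1)
m = 4
B = as_strided(A, shape=(3, m), strides=(A.strides[0], 0))
print(B.shape, B.strides)   # (3, 4) (8, 0)
print(np.all(B == A))       # True: all m columns alias A's data
]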
On Wed, Jul 9, 2008 at 21:29, Anne Archibald wrote:
> Is this really necessary? What does making this restriction gain? It
> certainly means that many arrays whose storage is a contiguous block
> of memory still cannot be used (just permute the axes of a 3-d array,
> say; it may even be possible for an array to be in C-contiguous order
> but for the flag not to be set). But how is one to construct exotic
> slices of an array that is strided in memory? (The real part of a
> complex array, say.)
Because that's just what a buffer= argument *is*. It is not a place for presenting the starting pointer to exotically-strided memory. Use __array_interface__s to describe the full range of representable memory. See below.
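[A minimal sketch of the __array_interface__ approach; the Wrapper
class here is invented for illustration, though numpy.lib.stride_tricks
uses essentially this mechanism internally:

import numpy as np

class Wrapper(object):
    """Expose arbitrary shape/strides over an existing array's memory."""
    def __init__(self, interface, base):
        self.__array_interface__ = interface
        self._base = base   # keep the underlying memory alive

A = np.zeros(2)
iface = dict(A.__array_interface__)
iface['shape'] = (5, 2)                 # prepend a length-5 axis...
iface['strides'] = (0, A.strides[0])    # ...with stride zero
B = np.asarray(Wrapper(iface, A))
print(B.shape, B.strides)               # (5, 2) (0, 8)
]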
> I suppose one could follow the linked list of .bases up to the
> original ndarray, which should normally be C- or Fortran-contiguous,
> then work out the offset, but even this may not always work: what if
> the original array was constructed with non-C-contiguous strides from
> some preexisting buffer?
>
> If the concern is that this allows users to shoot themselves in the
> foot, it's worth noting that even with the current setup you can
> easily fabricate strides and shapes that go outside the allocated
> part of memory.
>> What is the use case, here? One rarely has to use the ndarray
>> constructor by itself. For example, the result you seem to want from
>> the call you make above can be done just fine with .view().
> I was presenting a simple example. I was actually trying to use
> zero-strided arrays to implement broadcasting. The code was rather
> long, but essentially what it was meant to do was generate a view of
> an array in which an axis of length one had been replaced by an axis
> of length m with stride zero. (The point of all this was to create a
> class like vectorize that was suitable for use on, for example,
> np.linalg.inv().) But I also ran into this problem while writing
> segmentaxis.py, the code to produce a "matrix" of sliding windows.
> (See http://www.scipy.org/Cookbook/SegmentAxis .) There I caught the
> exception and copied the array (unnecessarily) if this came up.
I was about a week ahead of you. See numpy/lib/stride_tricks.py in the
trunk.

--
Robert Kern
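[For readers following along, that module provides broadcast_arrays()
(alongside the lower-level as_strided() sketched earlier); a quick
taste:

import numpy as np
from numpy.lib import stride_tricks

# broadcast_arrays() builds mutually broadcast views without copying;
# the broadcast axes get stride zero, exactly the trick discussed above.
x = np.arange(3.0).reshape(3, 1)
y = np.arange(4.0)
bx, by = stride_tricks.broadcast_arrays(x, y)
print(bx.shape, by.shape)   # (3, 4) (3, 4)
print(by.strides)           # (0, 8): the prepended axis has stride zero
]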
2008/7/9 Robert Kern:
> Because that's just what a buffer= argument *is*. It is not a place
> for presenting the starting pointer to exotically-strided memory. Use
> __array_interface__s to describe the full range of representable
> memory. See below.
Aha! Is this stuff documented somewhere?
> I was about a week ahead of you. See numpy/lib/stride_tricks.py in
> the trunk.
Nice! Unfortunately it can't quite do what I want... for the linear
algebra I need something that can broadcast all but certain axes. For
example, take an array of matrices and an array of vectors. The "array"
axes need broadcasting, but you can't broadcast on all axes without
(incorrectly) turning the vector into a matrix. I've written a (messy)
implementation, but the corner cases are giving me headaches. I'll let
you know when I have something that works.

Thanks,
Anne
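[One way to picture "broadcast all but certain axes" is the following
minimal sketch; the helper name is invented for illustration, and
np.broadcast_shapes/np.broadcast_to are modern NumPy conveniences that
postdate this thread:

import numpy as np

def broadcast_leading(a, b, a_core, b_core):
    """Broadcast a and b on all axes except their last a_core/b_core ones."""
    lead = np.broadcast_shapes(a.shape[:a.ndim - a_core],
                               b.shape[:b.ndim - b_core])
    a = np.broadcast_to(a, lead + a.shape[a.ndim - a_core:])
    b = np.broadcast_to(b, lead + b.shape[b.ndim - b_core:])
    return a, b

mats = np.zeros((10, 3, 3))   # a stack of ten 3x3 matrices (2 core axes)
vecs = np.zeros((3,))         # a single 3-vector (1 core axis)
M, v = broadcast_leading(mats, vecs, 2, 1)
print(M.shape, v.shape)       # (10, 3, 3) (10, 3): the vector stays a vector
]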
On Thu, Jul 10, 2008 at 9:33 AM, Anne Archibald wrote:
> Nice! Unfortunately it can't quite do what I want... for the linear
> algebra I need something that can broadcast all but certain axes. For
> example, take an array of matrices and an array of vectors. The
> "array" axes need broadcasting, but you can't broadcast on all axes
> without (incorrectly) turning the vector into a matrix. I've written
> a (messy) implementation, but the corner cases are giving me
> headaches. I'll let you know when I have something that works.
I think something like a matrix/vector dtype would be another way to go
for stacks of matrices and vectors. It would have to be a user-defined
type to fit into the current type hierarchy for ufuncs, but I think the
base machinery is there with the generic inner loops.

Chuck
2008/7/10 Charles R Harris:
> I think something like a matrix/vector dtype would be another way to
> go for stacks of matrices and vectors. It would have to be a
> user-defined type to fit into the current type hierarchy for ufuncs,
> but I think the base machinery is there with the generic inner loops.
There's definitely room for discussion about how such a linear algebra
system should ultimately work. In the short term, though, I think it's
valuable to create a prototype system, even if much of the looping has
to be in python, just to figure out how it should look to the user. For
example, I'm not sure whether a matrix/vector dtype would make things
like indexing operations easy - fancy indexing spanning both array and
matrix axes, for example. dtypes can also be a little impenetrable to
use, so even if this is how elementwise linear algebra is ultimately
implemented, we may want to provide a different user frontend.

My idea for a first sketch was simply to provide (for example)
np.elementwise_linalg.dot_mm, which interprets its arguments as arrays
of matrices and yields an array of matrices. A second sketch might be a
subclass MatrixArray, which could serve much as Matrix does now, to
tell various functions which axes to do linear algebra over and which
axes to iterate over.

There's also the question of how to make these efficient; one could of
course write a C wrapper for each linear algebra function that simply
iterates, but your suggestion of using the ufunc machinery with a
matrix dtype might be better. Or, if cython acquires sensible
polymorphism over numpy dtypes, a cython wrapper might be the way to
go. But I think establishing what it should look like to users is a
good first step, and I think that's best done with sample
implementations.

Anne
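[A rough Python-level prototype of the proposed dot_mm, as a sketch:
the function name comes from the message above, the implementation is
illustrative only, and NumPy's later matmul/@ operator ended up
providing this broadcasting behavior natively:

import numpy as np

def dot_mm(a, b):
    """Multiply arrays of matrices: broadcast leading axes, loop in Python."""
    a, b = np.asarray(a), np.asarray(b)
    lead = np.broadcast_shapes(a.shape[:-2], b.shape[:-2])
    a = np.broadcast_to(a, lead + a.shape[-2:])
    b = np.broadcast_to(b, lead + b.shape[-2:])
    out = np.empty(lead + (a.shape[-2], b.shape[-1]), np.result_type(a, b))
    for idx in np.ndindex(*lead):        # loop over the "array" axes
        out[idx] = np.dot(a[idx], b[idx])
    return out

A = np.ones((10, 2, 3))
B = np.ones((3, 4))                      # broadcast against the stack
print(dot_mm(A, B).shape)                # (10, 2, 4)
]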
On Thu, Jul 10, 2008 at 2:25 PM, Anne Archibald wrote:
> There's definitely room for discussion about how such a linear
> algebra system should ultimately work. In the short term, though, I
> think it's valuable to create a prototype system, even if much of the
> looping has to be in python, just to figure out how it should look to
> the user. For example, I'm not sure whether a matrix/vector dtype
> would make things like indexing operations easy - fancy indexing
> spanning both array and matrix axes, for example.

True enough. And I would expect the ufunc approach to require the sub
matrices to be contiguous. In any case, ufuncs aren't ready for this
yet, in fact they aren't ready for string types or other numeric types,
so there is a lot of work to be done just at that level.

> dtypes can also be a little impenetrable to use, so even if this is
> how elementwise linear algebra is ultimately implemented, we may want
> to provide a different user frontend.
>
> My idea for a first sketch was simply to provide (for example)
> np.elementwise_linalg.dot_mm, which interprets its arguments as
> arrays of matrices and yields an array of matrices. A second sketch
> might be a subclass MatrixArray, which could serve much as Matrix
> does now, to tell various functions which axes to do linear algebra
> over and which axes to iterate over.

Definitely needs doing. It's hard to see how things work out in
practice without some experimentation.

> There's also the question of how to make these efficient; one could
> of course write a C wrapper for each linear algebra function that
> simply iterates, but your suggestion of using the ufunc machinery
> with a matrix dtype might be better. Or, if cython acquires sensible
> polymorphism over numpy dtypes, a cython wrapper might be the way to
> go. But I think establishing what it should look like to users is a
> good first step, and I think that's best done with sample
> implementations.

Amen.

Chuck
On Thu, Jul 10, 2008 at 10:33, Anne Archibald wrote:
> Aha! Is this stuff documented somewhere?
_The Guide to Numpy_, section 3.1.4.

--
Robert Kern
2008/7/10 Robert Kern:
> I was about a week ahead of you. See numpy/lib/stride_tricks.py in
> the trunk.
Robert, this is fantastic! I think people are going to enjoy your talk
at SciPy'08. If you want, we could also cover this in the advanced
NumPy tutorial session.

Cheers
Stéfan
Participants (4): Anne Archibald, Charles R Harris, Robert Kern,
Stéfan van der Walt