PyBUF_SIMPLE/PyBUF_FORMAT: casts to unsigned bytes
Hello,

PEP-3118 presumably intended that a PyBUF_SIMPLE request should cast the original buffer's data type to 'B' (unsigned bytes). Here is a one-dimensional example that currently occurs in Lib/test/test_multiprocessing:

>>> import array, io
>>> a = array.array('i', [1,2,3,4,5])
>>> m = memoryview(a)
>>> m.format
'i'
>>> buf = io.BytesIO(bytearray(5*8))
>>> buf.readinto(m)
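For comparison, here is a minimal sketch for Python 3.3 and later, where memoryview has an explicit cast() method. It repeats the session above and adds the explicit 'B' cast of the same one-dimensional view. Byte counts are derived from itemsize rather than assumed, since the size of a C int is platform-dependent.

```python
import array
import io

a = array.array('i', [1, 2, 3, 4, 5])
m = memoryview(a)
print(m.format, m.itemsize, m.nbytes)        # i, sizeof(int), 5 * sizeof(int)

# Explicit form of the implicit cast: the same memory viewed as unsigned bytes.
raw = m.cast('B')
print(raw.format, raw.itemsize, raw.nbytes)  # B 1, and the same byte count as m

# readinto() treats m as a flat byte buffer and fills all of it:
# m.nbytes bytes (20 with a 4-byte int) out of the 40 available.
buf = io.BytesIO(bytearray(5 * 8))
n = buf.readinto(m)
print(n, list(a))                            # n == m.nbytes, a is now all zeros
```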
buf.readinto() calls PyObject_AsWriteBuffer(), which requests a simple buffer from the memoryview, thus casting the 'i' data type to the implied type 'B'. The consumer can see that a cast has occurred because the new buffer's format field is NULL. This seems fine for the one-dimensional case.

Numpy currently also allows such casts for multidimensional contiguous and non-contiguous arrays. See below for the examples; I don't want to distract from the main point of the post, which is this:

I'm seeking a clear specification for the Python documentation that determines under what circumstances casts to 'B' should succeed. I'll formulate the points as statements for clarity, but in fact they are also questions:

1) An exporter of a C-contiguous array with ndim <= 1 MUST honor a PyBUF_SIMPLE request, setting format, shape and strides to NULL and itemsize to 1.

   As a corner case, an array with ndim = 0, format = "L" (or other) would also morph into a buffer of unsigned bytes. test_ctypes currently makes use of this.

2) An exporter of a C-contiguous buffer with ndim > 1 MUST honor a PyBUF_SIMPLE request, setting format, shape, and strides to NULL and itemsize to 1.

3) An exporter of a buffer that is not C-contiguous MUST raise BufferError in response to a PyBUF_SIMPLE request.

Why am I looking for such rigid rules? The problem with memoryview is that it has to act as a re-exporter itself. For several reasons (performance of chained memoryviews, garbage collection, early release, etc.) it has been decided that the new memoryview object has a managed buffer that takes a snapshot of the original exporter's buffer (see: http://bugs.python.org/issue10181). Now, since getbuffer requests to the memoryview object cannot be redirected to the original object, strict rules are needed for memory_getbuf().

Could you agree with these rules? Point 2) isn't clear from the PEP itself. I assumed it because Numpy currently allows it, and it appears harmless.
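The three proposed rules can be written down as a small decision procedure. The sketch below is a hypothetical pure-Python model, not CPython or NumPy code: SimpleView, is_c_contiguous and simple_request are illustrative names, and NULL fields are modelled as None. It follows the proposal in setting itemsize to 1 for the resulting byte view.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SimpleView:
    """Hypothetical model of the Py_buffer fields a PyBUF_SIMPLE consumer sees."""
    len: int
    itemsize: int = 1              # rules 1 and 2: the data are now unsigned bytes
    format: Optional[str] = None   # NULL format implies 'B'
    shape: Optional[tuple] = None
    strides: Optional[tuple] = None


def is_c_contiguous(shape: Sequence[int], strides: Sequence[int], itemsize: int) -> bool:
    """True if the strides describe a gap-free row-major (C) layout."""
    if 0 in shape:                 # an empty buffer is trivially contiguous
        return True
    expected = itemsize
    for n, s in zip(reversed(shape), reversed(strides)):
        if n != 1 and s != expected:   # strides of length-1 dimensions do not matter
            return False
        expected *= n
    return True


def simple_request(shape: Sequence[int], strides: Sequence[int], itemsize: int) -> SimpleView:
    """Apply the proposed rules 1-3 to an exporter's shape, strides and itemsize."""
    if not is_c_contiguous(shape, strides, itemsize):
        # rule 3: a non-C-contiguous exporter refuses the request
        raise BufferError("PyBUF_SIMPLE requires a C-contiguous buffer")
    nbytes = itemsize
    for n in shape:                # ndim == 0 leaves nbytes == itemsize
        nbytes *= n
    # rules 1 and 2: any C-contiguous buffer becomes a flat run of bytes
    return SimpleView(len=nbytes)
```

Applied to the geometries in the examples of this post, a 3x4 array of 8-byte items (strides (32, 8)) yields len 96 with itemsize 1, while the reversed, step-2 slice (shape (3, 2), strides (-32, -16)) raises BufferError.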
Stefan Krah


Examples:
=========

Cast a multidimensional contiguous array:
-----------------------------------------

I think itemsize in the result should be 1.

[_testbuffer.ndarray is from http://hg.python.org/features/pep-3118#memoryview]
>>> from _testbuffer import *
>>> from numpy import *
>>> from _testbuffer import ndarray as pyarray
>>>
>>> exporter = ndarray(shape=[3,4], dtype="L")
# Issue a PyBUF_SIMPLE request to 'exporter' and act as a re-exporter:
>>> x = pyarray(exporter, getbuf=PyBUF_SIMPLE)
>>> x.len
96
>>> x.shape
()
>>> x.strides
()
>>> x.format
''
>>> x.itemsize   # I think this should be 1, not 8.
8
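With the memoryview.cast() method that later shipped in Python 3.3, the explicit counterpart of this request already reports itemsize 1. A minimal standard-library sketch; 'Q' (unsigned long long) stands in for the example's 'L', because 'Q' is 8 bytes on common platforms whereas the size of 'L' varies.

```python
import struct

Q = struct.calcsize('Q')                       # bytes per item; 8 on common platforms
mem = bytearray(12 * Q)
m = memoryview(mem).cast('Q', shape=[3, 4])    # C-contiguous 3x4 view of unsigned long long
print(m.shape, m.strides, m.itemsize, m.c_contiguous)   # (3, 4) (32, 8) 8 True when Q == 8

b = m.cast('B')                                # ND (C-contiguous) -> 1-D unsigned bytes
print(b.format, b.shape, b.itemsize, b.nbytes)         # B (96,) 1 96 when Q == 8
```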
Cast a multidimensional non-contiguous array:
---------------------------------------------

This is clearly not right, since y.buf points to a location that the consumer cannot handle without shape and strides.
>>> nd = ndarray(buffer=bytearray(96), shape=[3,4], dtype="L")
>>> exporter = nd[::-1, ::-2]
>>> exporter
array([[0, 0],
       [0, 0],
       [0, 0]], dtype=uint64)
>>> y = pyarray(exporter, getbuf=PyBUF_SIMPLE)
>>> y.len
48
>>> y.strides
()
>>> y.shape
()
>>> y.format
''
>>> y.itemsize
8
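The memoryview that shipped in Python 3.3 takes the strict line for non-contiguous views. A minimal standard-library sketch: memoryview only supports slicing in one dimension, so a reversed step-2 slice of a 1-D view stands in for nd[::-1, ::-2]. cast() refuses such a view, while tobytes() still works because the view keeps its real itemsize and strides.

```python
import struct

Q = struct.calcsize('Q')                 # bytes per item; 8 on common platforms
data = bytearray(range(12 * Q))          # distinct byte values make the copy order visible
m = memoryview(data).cast('Q')           # 1-D view of 12 unsigned long long items
view = m[::-2]                           # items 11, 9, 7, 5, 3, 1: reversed, step 2
print(view.c_contiguous, view.strides, view.nbytes)   # False (-16,) 48 when Q == 8

try:
    view.cast('B')                       # explicit cast to unsigned bytes
    refused = False
except TypeError as exc:                 # casts are restricted to C-contiguous views
    refused = True
    print("cast refused:", exc)

# tobytes() copies the six items in logical order, Q bytes each.
flat = view.tobytes()
print(len(flat))                         # 48 when Q == 8
```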
(sorry for the top-post, no way around it)

Under 2), would it make sense to also export the contents of a Fortran-contiguous buffer as a raw byte stream? I was just the other week writing code to serialize an array in Fortran order to a binary stream. OTOH I could easily serialize its transpose for the same effect. Just something to think about.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
Under 2), would it make sense to also export the contents of a Fortran-contiguous buffer as a raw byte stream? I was just the other week writing code to serialize an array in Fortran order to a binary stream.
Probably, since it works now and people might have gotten used to it. It breaks the current hierarchy of requests, though:

  PyBUF_INDIRECT -> suboffsets + strides (PIL-style)
  PyBUF_STRIDES  -> strides (NumPy-style)
  PyBUF_ND       -> C-contiguous
  PyBUF_SIMPLE   -> cast from C- or Fortran-contiguous to unsigned bytes

The last one would be a step up in complexity again. If Fortran-contiguous buffers weren't allowed, one could assume that all buffers below (and including) PyBUF_ND are C-contiguous.

It is still not clear to me what value itemsize should have if PyBUF_FORMAT is not given. As I see it, the rules are [1]:

  Request without PyBUF_FORMAT -> view.format must be NULL -> 'B'.
  Then itemsize = 'number of bytes implied by the format' = 1.

This would work for viewing contiguous buffers as byte streams, but what about non-contiguous buffers?

  PyBUF_STRIDED: (PyBUF_STRIDES | PyBUF_WRITABLE)

How should the buffer be used if itemsize is set to 1? For example, it seems impossible to implement tobytes() if the real itemsize is missing.

Travis, if you have time, it would be very nice to have your input on this one, too.

Stefan Krah

[1] format: "a NULL-terminated format-string (following the struct-style syntax including extensions) indicating what is in each element of memory. The number of elements is len / itemsize, where itemsize is the number of bytes implied by the format. This can be NULL which implies standard unsigned bytes ("B")."
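To make the itemsize question concrete, here is a hypothetical pure-Python model of tobytes() for a strided buffer (strided_tobytes is an illustrative name, not a CPython function). It walks the elements in C order and copies itemsize bytes from each. With the nd[::-1, ::-2] geometry from the first message (shape (3, 2), strides (-32, -16), 8-byte items, first element at byte offset 88), a reported itemsize of 1 would silently copy 6 bytes instead of 48.

```python
from itertools import product


def strided_tobytes(mem, offset, shape, strides, itemsize):
    """Gather the elements of a strided N-D buffer into one C-ordered bytes object.

    mem: the exporter's underlying memory; offset: byte position of element
    [0, ..., 0]; shape/strides/itemsize: as in a Py_buffer (strides in bytes).
    """
    out = bytearray()
    # product() varies the last index fastest, i.e. it visits elements in C order
    for index in product(*(range(n) for n in shape)):
        pos = offset + sum(i * s for i, s in zip(index, strides))
        out += mem[pos:pos + itemsize]
    return bytes(out)


# A 3x4 array of 8-byte items, viewed as nd[::-1, ::-2]
mem = bytes(range(96))
correct = strided_tobytes(mem, 88, (3, 2), (-32, -16), 8)
truncated = strided_tobytes(mem, 88, (3, 2), (-32, -16), 1)   # itemsize wrongly 1
print(len(correct), len(truncated))   # 48 6
```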
Stefan Krah <stefan-usenet@bytereef.org> wrote:
Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
Under 2), would it make sense to also export the contents of a Fortran-contiguous buffer as a raw byte stream? I was just the other week writing code to serialize an array in Fortran order to a binary stream.
Probably, since it works now and people might have gotten used to it.
There are two other reasons why a "simple" view should be C-contiguous:

1) memoryview.tobytes() converts a Fortran-contiguous buffer to the C layout. I would find it odd if simple_view.tobytes() were different from the raw memory.

2) There is an initiative (with very broad support) to add a cast method to memoryview. The following casts are allowed:

  - any ND (C-contiguous) -> 1D bytes
  - any 1D bytes -> ND (C-contiguous)

This is how it looks (in my private repo):

>>> from _testbuffer import *
>>> nd = ndarray(list(range(12)), shape=[3,4], format="Q")
>>> m1 = memoryview(nd)
>>> m2 = m1.cast('B')
>>> m3 = m2.cast('Q', shape=[3,4])
>>> m4 = m2.cast('Q', shape=[2,2,3])

To summarize, I think it will be more consistent if an implicit cast via PyBUF_SIMPLE also disallows Fortran arrays. Unless there are waves of protest here... ;)

Stefan Krah
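The cast method did land in Python 3.3, with the restrictions listed above (1-D to ND or ND to 1-D, C-contiguous views only, one side a byte format). The session above depends on _testbuffer from the private repository; this sketch reproduces it with only the standard library, using array.array as the exporter. Since array.array is one-dimensional, the 3x4 shape is obtained by casting through bytes.

```python
import array

a = array.array('Q', range(12))
m1 = memoryview(a)                    # 1-D, format 'Q'
m2 = m1.cast('B')                     # 1-D -> bytes
m3 = m2.cast('Q', shape=[3, 4])       # bytes -> 2-D, C-contiguous
m4 = m2.cast('Q', shape=[2, 2, 3])    # bytes -> 3-D, C-contiguous
m5 = m3.cast('B')                     # ND (C-contiguous) -> 1-D bytes

print(m3.tolist())   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(m4.tolist())   # [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
print(m5.format, m5.itemsize, m5.nbytes == m3.nbytes)   # B 1 True
```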
participants (2)
- Dag Sverre Seljebotn
- Stefan Krah