[Cython] [RFC PATCH] add read-only buffer support to typed memoryviews

Richard Hansen rhansen at bbn.com
Mon Feb 2 01:52:34 CET 2015


This patch is a very rough attempt at adding support for read-only
buffer objects to typed memoryviews.  It is incomplete, has bugs, and
is very dirty.  I'm positing it anyway to solicit help in figuring out
the right way to add read-only buffer support.  The 'XXX' comments
mark issues I'm not sure about.

The problem:

The following example function raises an exception when passed a bytes
object or any other read-only buffer object:

    cpdef object printbuf(unsigned char[:] buf):
        chars = [chr(x) for x in buf]
        print(repr(''.join(chars)))

This is what happens:

    $ python -c 'import test; test.printbuf("test\0ing")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "test.pyx", line 1, in test.printbuf (test.c:1417)
      File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (test.c:6795)
      File "stringsource", line 321, in View.MemoryView.memoryview.__cinit__ (test.c:3341)
    BufferError: Object is not writable.

The exception is raised because the code generated for typed
memoryviews always passes the PyBUF_WRITABLE flag to
PyObject_GetBuffer(), and "test\0ing" is a read-only bytes object.
The bytes class raises an exception when it sees the PyBUF_WRITABLE
flag.

In this example case, there's no need to pass PyBUF_WRITABLE, and if
it wasn't passed then the code would work as expected.

The approach this patch takes is:

  * Never pass PyBUF_WRITABLE to PyObject_GetBuffer() just in case the
    underlying object is read-only.
  * If it turns out that we do need write access to the buffer (e.g.,
    to handle 'buf[0] = x'), then check to see if the Py_buffer
    structure's readonly member is false.  If so, it's safe to write.
    Otherwise, raise a BufferError exception ("read-only object").

This approach is not ideal because it doesn't support copy-on-write
objects or similarly complicated buffer objects, but a better approach
would be a more invasive change.  See the comments I added in
check_buffer_writable().

Feedback would be appreciated.

Thanks,
Richard
---
 Cython/Compiler/ExprNodes.py  | 100 ++++++++++++++++++++++++++++++++++++++++++
 Cython/Compiler/MemoryView.py |  12 ++---
 Cython/Utility/MemoryView.pyx |  14 +++++-
 tests/memoryview/memslice.pyx |  12 ++---
 4 files changed, 124 insertions(+), 14 deletions(-)

diff --git a/Cython/Compiler/ExprNodes.py b/Cython/Compiler/ExprNodes.py
index ac28d03..13017a9 100644
--- a/Cython/Compiler/ExprNodes.py
+++ b/Cython/Compiler/ExprNodes.py
@@ -3642,8 +3642,106 @@ class IndexNode(ExprNode):
                 self.extra_index_params(code),
                 code.error_goto(self.pos)))
 
+    def check_buffer_writable(self, code):
+        """Raise BufferError if the Py_buffer struct's readonly flag is set"""
+
+        # XXX This is called from:
+        #   * generate_buffer_setitem_code()
+        #   * generate_memoryviewslice_setslice_code()
+        #   * generate_memoryviewslice_assign_scalar_code()
+        # Are these sufficient?  Is there another function called to
+        # generate code to directly modify a buffer via Cython syntax?
+
+        # XXX is it safe to call self.buffer_entry() multiple times?
+        buffer_entry = self.buffer_entry()
+
+        # XXX is this the right way to check to see if the object
+        # being modified is a memoryview slice object and not some
+        # other object type?
+        if not buffer_entry.type.is_buffer:
+
+            # XXX Currently the PyBUF_WRITABLE flag is never passed
+            # when calling PyObject_GetBuffer() on the underlying
+            # buffer object, so we never asked the buffer object to
+            # provide us with a writable chunk of memory.  Even if the
+            # Py_buffer struct's readonly flag is unset, might it be
+            # inappropriate to write to the buffer since we never
+            # passed PyBUF_WRITABLE?  The Python 3 documentation seems
+            # to imply that this is OK, but it's not entirely clear.
+
+            # XXX This code simply checks to see whether the Py_buffer
+            # view's readonly flag is unset.  This design does not
+            # support copy-on-write or other similarly complicated
+            # buffer objects.
+            #
+            # For example, imagine a class that does copy-on-write.
+            # Suppose there is an instance of that type, and a copy of
+            # it is made.  The copy hasn't been modified yet, so it
+            # shares a buffer with the original object.  If
+            # PyBUF_WRITABLE is not passed to PyObject_GetBuffer(),
+            # the returned Py_buffer simply points to the shared
+            # memory and the readonly flag is set to avoid corrupting
+            # the original object.  If PyBUF_WRITABLE is passed to
+            # PyObject_GetBuffer(), the object makes its own copy of
+            # the memory and the returned Py_buffer points to the
+            # private copy with the readonly flag unset.
+            #
+            # With the current design, writing to the copy in this
+            # scenario would raise a BufferError ("read-only buffer")
+            # even though it is possible to get a writable buffer by
+            # re-calling PyObject_GetBuffer() with PyBUF_WRITABLE.
+            #
+            # An improved design would look like this:
+            #
+            #   * When creating the typed memoryview:
+            #       1. Call PyObject_GetBuffer() to fetch a Py_buffer
+            #          view into the object, but do not pass
+            #          PyBUF_WRITABLE.
+            #       2. Save the Py_buffer view somewhere.
+            #       3. Save the flags that were passed to
+            #          PyObject_GetBuffer().
+            #
+            #   * Whenever write access is required:
+            #       1. If the saved flags do not include
+            #          PyBUF_WRITABLE:
+            #            a. Call PyOjbect_GetBuffer() again, but this
+            #               time include the PyBUF_WRITABLE flag.  If
+            #               this raises a BufferError exception, let
+            #               it propagate.
+            #            b. Call PyBuffer_Release() on the Py_buffer
+            #               view originally saved when the typed
+            #               memoryview was created.
+            #            c. Save the new Py_buffer view.
+            #            d. Save the new flags.
+            #       2. Assert that the Py_buffer view's readonly flag
+            #          is not set
+
+            # XXX is there a better way to get the cname of the
+            # __Pyx_memviewslice variable?
+            if self.is_temp:
+                # buffer_entry.cname is None in this case, so we have
+                # to get the name of the variable some other way
+                cname = self.temp_code
+            else:
+                cname = buffer_entry.cname
+            assert cname is not None
+
+            # XXX is this the right way to test if the Py_buffer's
+            # readonly flag is set?
+            code.putln('if (unlikely(%s.memview->view.readonly)) {' % (cname,))
+
+            # XXX is this the right way to raise an exception?
+            code.putln('PyErr_Format(PyExc_BufferError, "read-only buffer");')
+            code.putln(code.error_goto(self.pos))
+            code.putln('}')
+        else:
+            # XXX what do we do here?
+            raise NotImplementedError()
+
     def generate_buffer_setitem_code(self, rhs, code, op=""):
         # Used from generate_assignment_code and InPlaceAssignmentNode
+        self.check_buffer_writable(code)
+
         buffer_entry, ptrexpr = self.buffer_lookup_code(code)
 
         if self.buffer_type.dtype.is_pyobject:
@@ -3834,11 +3932,13 @@ class IndexNode(ExprNode):
 
     def generate_memoryviewslice_setslice_code(self, rhs, code):
         "memslice1[...] = memslice2 or memslice1[:] = memslice2"
+        self.check_buffer_writable(code)
         from . import MemoryView
         MemoryView.copy_broadcast_memview_src_to_dst(rhs, self, code)
 
     def generate_memoryviewslice_assign_scalar_code(self, rhs, code):
         "memslice1[...] = 0.0 or memslice1[:] = 0.0"
+        self.check_buffer_writable(code)
         from . import MemoryView
         MemoryView.assign_scalar(self, rhs, code)
 
diff --git a/Cython/Compiler/MemoryView.py b/Cython/Compiler/MemoryView.py
index 8e2f58b..9e75267 100644
--- a/Cython/Compiler/MemoryView.py
+++ b/Cython/Compiler/MemoryView.py
@@ -32,12 +32,12 @@ def concat_flags(*flags):
 
 format_flag = "PyBUF_FORMAT"
 
-memview_c_contiguous = "(PyBUF_C_CONTIGUOUS | PyBUF_FORMAT | PyBUF_WRITABLE)"
-memview_f_contiguous = "(PyBUF_F_CONTIGUOUS | PyBUF_FORMAT | PyBUF_WRITABLE)"
-memview_any_contiguous = "(PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT | PyBUF_WRITABLE)"
-memview_full_access = "PyBUF_FULL"
-#memview_strided_access = "PyBUF_STRIDED"
-memview_strided_access = "PyBUF_RECORDS"
+memview_c_contiguous = "(PyBUF_C_CONTIGUOUS | PyBUF_FORMAT)"
+memview_f_contiguous = "(PyBUF_F_CONTIGUOUS | PyBUF_FORMAT)"
+memview_any_contiguous = "(PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT)"
+memview_full_access = "PyBUF_FULL_RO"
+#memview_strided_access = "PyBUF_STRIDED_RO"
+memview_strided_access = "PyBUF_RECORDS_RO"
 
 MEMVIEW_DIRECT = '__Pyx_MEMVIEW_DIRECT'
 MEMVIEW_PTR    = '__Pyx_MEMVIEW_PTR'
diff --git a/Cython/Utility/MemoryView.pyx b/Cython/Utility/MemoryView.pyx
index f7ecd58..4344978 100644
--- a/Cython/Utility/MemoryView.pyx
+++ b/Cython/Utility/MemoryView.pyx
@@ -61,6 +61,7 @@ cdef extern from *:
         PyBUF_STRIDES
         PyBUF_INDIRECT
         PyBUF_RECORDS
+        PyBUF_RECORDS_RO
 
     ctypedef struct __Pyx_TypeInfo:
         pass
@@ -367,6 +368,9 @@ cdef class memoryview(object):
             return self.convert_item_to_object(itemp)
 
     def __setitem__(memoryview self, object index, object value):
+        if self.view.readonly:
+            raise BufferError('read-only buffer')
+
         have_slices, index = _unellipsify(index, self.view.ndim)
 
         if have_slices:
@@ -466,6 +470,9 @@ cdef class memoryview(object):
 
     @cname('getbuffer')
     def __getbuffer__(self, Py_buffer *info, int flags):
+        if (flags & PyBUF_WRITABLE) and self.view.readonly:
+            raise BufferError("read-only buffer")
+
         if flags & PyBUF_STRIDES:
             info.shape = self.view.shape
         else:
@@ -490,7 +497,7 @@ cdef class memoryview(object):
         info.ndim = self.view.ndim
         info.itemsize = self.view.itemsize
         info.len = self.view.len
-        info.readonly = 0
+        info.readonly = self.view.readonly
         info.obj = self
 
     __pyx_getbuffer = capsule(<void *> &__pyx_memoryview_getbuffer, "getbuffer(obj, view, flags)")
@@ -981,7 +988,10 @@ cdef memoryview_fromslice({{memviewslice_name}} memviewslice,
     (<__pyx_buffer *> &result.view).obj = Py_None
     Py_INCREF(Py_None)
 
-    result.flags = PyBUF_RECORDS
+    if (<memoryview>result.from_slice.memview).flags & PyBUF_WRITABLE:
+        result.flags = PyBUF_RECORDS
+    else:
+        result.flags = PyBUF_RECORDS_RO
 
     result.view.shape = <Py_ssize_t *> result.from_slice.shape
     result.view.strides = <Py_ssize_t *> result.from_slice.strides
diff --git a/tests/memoryview/memslice.pyx b/tests/memoryview/memslice.pyx
index 417bcbc..62f8782 100644
--- a/tests/memoryview/memslice.pyx
+++ b/tests/memoryview/memslice.pyx
@@ -480,7 +480,7 @@ def strided(int[:] buf):
     released A
     2
     >>> [str(x) for x in A.recieved_flags] # Py2/3
-    ['FORMAT', 'ND', 'STRIDES', 'WRITABLE']
+    ['FORMAT', 'ND', 'STRIDES']
 
     Check that the suboffsets were patched back prior to release.
     >>> A.release_ok
@@ -495,7 +495,7 @@ def c_contig(int[::1] buf):
     >>> c_contig(A)
     2
     >>> [str(x) for x in A.recieved_flags]
-    ['FORMAT', 'ND', 'STRIDES', 'C_CONTIGUOUS', 'WRITABLE']
+    ['FORMAT', 'ND', 'STRIDES', 'C_CONTIGUOUS']
     """
     return buf[2]
 
@@ -508,7 +508,7 @@ def c_contig_2d(int[:, ::1] buf):
     >>> c_contig_2d(A)
     7
     >>> [str(x) for x in A.recieved_flags]
-    ['FORMAT', 'ND', 'STRIDES', 'C_CONTIGUOUS', 'WRITABLE']
+    ['FORMAT', 'ND', 'STRIDES', 'C_CONTIGUOUS']
     """
     return buf[1, 3]
 
@@ -519,7 +519,7 @@ def f_contig(int[::1, :] buf):
     >>> f_contig(A)
     2
     >>> [str(x) for x in A.recieved_flags]
-    ['FORMAT', 'ND', 'STRIDES', 'F_CONTIGUOUS', 'WRITABLE']
+    ['FORMAT', 'ND', 'STRIDES', 'F_CONTIGUOUS']
     """
     return buf[0, 1]
 
@@ -532,7 +532,7 @@ def f_contig_2d(int[::1, :] buf):
     >>> f_contig_2d(A)
     7
     >>> [str(x) for x in A.recieved_flags]
-    ['FORMAT', 'ND', 'STRIDES', 'F_CONTIGUOUS', 'WRITABLE']
+    ['FORMAT', 'ND', 'STRIDES', 'F_CONTIGUOUS']
     """
     return buf[3, 1]
 
@@ -990,7 +990,7 @@ def bufdefaults1(int[:] buf):
     acquired A
     released A
     >>> [str(x) for x in A.recieved_flags]
-    ['FORMAT', 'ND', 'STRIDES', 'WRITABLE']
+    ['FORMAT', 'ND', 'STRIDES']
     """
     pass
 
-- 
2.2.2



More information about the cython-devel mailing list