Memoryviews should expose the underlying memory address

I have recently been experimenting with the memoryview() built-in and have come to believe that it really needs to expose the 'buf' attribute of the underlying Py_buffer structure as an integer (see PEP 3118). Let me explain.
The whole point of PEP 3118 (as I understand it) is to provide a means for exchanging or sharing array data across different libraries such as numpy, PIL, ctypes, Cython, etc. If you're working with Py_buffer objects at the C level, this works fine. However, if you're working purely in Python, you're only able to get partial information about memory views such as the shape and size. You can't get the actual pointer to the underlying memory (unless I've missed something obvious).
This is unfortunate because it means that you can't write Python code to link memoryviews to other kinds of compiled code that might want to operate on array-oriented data. For example, you can't pass the raw pointer into a function that you've exposed via ctypes. Similarly, you can't pass the pointer into functions you've dynamically compiled using libraries such as LLVM-py. There might be other kinds of applications, but just having that one bit of extra information available would be useful for various advanced programming techniques involving extensions and memory buffers.
Cheers, Dave
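
For readers following along, here is a minimal sketch of the metadata a memoryview makes available to pure Python code. The `array` exporter and the `info` dict are illustrative choices only; the point is that none of these attributes is the address of the underlying memory.

```python
from array import array

a = array("d", [1.0, 2.0, 3.0])
m = memoryview(a)

# Metadata that pure Python code can read from the view.
# None of these values is the pointer stored in Py_buffer.buf.
info = {
    "format": m.format,
    "itemsize": m.itemsize,
    "ndim": m.ndim,
    "shape": m.shape,
    "strides": m.strides,
    "nbytes": m.nbytes,
    "readonly": m.readonly,
}
```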

On Sep 20, 2012, at 11:48 AM, Benjamin Peterson wrote:
Presumably ctypes should be able to do this conversion for you.
-- Regards, Benjamin

How? I must be missing something very obvious.
Cheers, Dave

On 20/09/2012 5:53pm, David Beazley wrote:
> How? I must be missing something very obvious.
I would not call it obvious, but you can do
    >>> import ctypes
    >>> m = memoryview(bytearray(5))
    >>> ctypes.addressof(ctypes.c_char.from_buffer(m))
    149979304
However, this only works for writable memoryviews. For read-only memoryviews you could do
    >>> obj = ctypes.py_object(m)
    >>> address = ctypes.c_void_p()
    >>> length = ctypes.c_ssize_t()
    >>> ctypes.pythonapi.PyObject_AsReadBuffer(obj, ctypes.byref(address), ctypes.byref(length))
    0
    >>> address, length
    (c_void_p(149979304), c_long(5))

2012/9/20 David Beazley dave@dabeaz.com:
> How? I must be missing something very obvious.
If you have some ctypes function that requires a pointer and you pass a memoryview, ctypes should pass the pointer to the raw memory, right?
-- Regards, Benjamin

Well, if it's supposed to do that, it certainly doesn't work for me in 3.3. I get a type error about it wanting a ctypes pointer object. Even if this worked, it still doesn't address the need to get the pointer value, possibly for some other purpose such as handing it off to a bunch of code generated via LLVM.
Cheers, Dave
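
A sketch of one way to hand a memoryview's memory to a function that expects a pointer, assuming the buffer is writable. `ctypes.memset` stands in here for any compiled function that takes a pointer; it is not the function anyone in the thread was calling.

```python
import ctypes

buf = bytearray(8)
view = memoryview(buf)

# Wrap the view's memory in a ctypes array (no copy). Unlike a bare
# memoryview, this object is accepted where ctypes expects a pointer.
c_buf = (ctypes.c_char * view.nbytes).from_buffer(view)

# Pass the ctypes object itself...
ctypes.memset(c_buf, ord("A"), 8)
# ...or pass the raw integer address of the same memory.
ctypes.memset(ctypes.addressof(c_buf), ord("B"), 4)

result = bytes(buf)
```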

On Sep 20, 2012, at 11:35 AM, David Beazley dave@dabeaz.com wrote:
> Well, if it's supposed to do that, it certainly doesn't work for me in 3.3. I get a type error about it wanting a ctypes pointer object. Even if this worked, it still doesn't address the need to get the pointer value, possibly for some other purpose such as handing it off to a bunch of code generated via LLVM.
It seems like there's no reason to need to get the pointer value out as a Python integer. If you are trying to get a pointer from a memoryview into some C code, or into some LLVM-generated code, you still need to do the Python int object → C integer-of-some-kind → C pointer type conversion. Better to just go straight from Python memoryview object → C pointer in one supported API call. Isn't this what the y*, w*, and s* format codes are for?
Every time I have something that's a big number and I need to turn it into a pointer, I have to stare at the table in http://en.wikipedia.org/wiki/64_bit#64-bit_data_models for like 30 seconds. I'd rather have some Python API do the staring for me. David, I realize that table is probably permanently visible in the heads-up display that your cybernetic implants afford you, but some of us need to make our way through C code with humbler faculties ;-).
-g
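
As an illustration of the int → pointer step discussed here, ctypes can do the conversion from Python, so the pointer width never has to be looked up by hand. This is a sketch under the assumption that the buffer is writable and holds C doubles; the variable names are made up for the example.

```python
import ctypes
from array import array

a = array("d", [1.5, 2.5, 3.5])
view = memoryview(a)

# Obtain the address as a plain Python int.
address = ctypes.addressof(ctypes.c_char.from_buffer(view))

# Convert the int back into a typed C pointer; ctypes uses the
# platform's own pointer size for c_void_p and POINTER types.
ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_double))
values = [ptr[i] for i in range(len(a))]
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)
```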

A memory address is a number. I think an integer is fine--if you're working at this level, you're already on your own and expected to know what you're doing. I'd prefer to just get the raw address without yet another level of indirection.
Other parts of the library already do this. For instance, array.buffer_info().
Cheers Dave
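
For reference, a short sketch of the existing precedent: `array.buffer_info()` returns a tuple of the buffer's address as an integer and its length in elements (not bytes).

```python
from array import array

a = array("i", range(10))

# (integer memory address, number of elements)
address, length = a.buffer_info()

# The size in bytes has to be computed from the item size.
size_in_bytes = length * a.itemsize
```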

On Fri, Sep 21, 2012 at 9:37 AM, David Beazley dave@dabeaz.com wrote:
> A memory address is a number. I think an integer is fine--if you're working at this level, you're already on your own and expected to know what you're doing. I'd prefer to just get the raw address without yet another level of indirection.
> Other parts of the library already do this. For instance array.buffer_info().
I'm fine with exposing a memoryview.buffer_address attribute in 3.4. The idea had never come up before, as using *Python code* (rather than C) to provide the shim between a PEP 3118 exporter and a consumer that doesn't understand that API isn't a use case we had even considered. memoryview has instead been more focused on *interpreting* the contents of exported buffers as ordinary Python objects.
(We already know we still need to define an API to let classes defined in Python implement the buffer API, though)
Cheers, Nick.

On Fri, Sep 21, 2012 at 4:12 PM, Greg Ewing greg.ewing@canterbury.ac.nz wrote:
> Nick Coghlan wrote:
>> I'm fine with exposing a memoryview.buffer_address attribute in 3.4.
> What about objects whose buffer address can change when the buffer isn't locked?
Managing the lifecycle issues will be up to the application. If they let the memoryview object go away, then the buffer it references may also go away. This isn't any different from the situation with array.buffer_info() - the address from that is only valid as long as the array object itself is still around.
Cheers, Nick.
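
The lifecycle point can be seen with a bytearray, as a small sketch: while a memoryview holds an export, the exporter refuses to resize, and once the view is released the memory may be resized and any previously saved address should be treated as stale.

```python
buf = bytearray(b"abcd")
view = memoryview(buf)

try:
    buf.extend(b"efgh")           # a resize could move the memory
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()                    # drop the export
buf.extend(b"efgh")               # now allowed
```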

On Fri, Sep 21, 2012 at 1:16 AM, Glyph glyph@twistedmatrix.com wrote:
> It seems like there's no reason to need to get the pointer value out as a Python integer. If you are trying to get a pointer from a memoryview into some C code, or into some LLVM-generated code, you still need to do the Python int object → C integer-of-some-kind → C pointer type conversion. Better to just go straight from Python memoryview object → C pointer in one supported API call. Isn't this what the y*, w*, and s* format codes are for?
This is also kind of a problem with PyPy and CFFI, where we actively discourage people from using C. Passing address as an int sounds like a very reasonable solution.
Cheers, fijal

On Sep 21, 2012, at 4:45 AM, Maciej Fijalkowski wrote:
> This is also kind of a problem with PyPy and CFFI, where we actively discourage people from using C. Passing address as an int sounds like a very reasonable solution.
I just wanted to add that getting the address as an integer is useful because one might actually want to do math with it. Since memoryviews also expose the shape, itemsize, and other information, it's conceivable that one might combine these with the base address to compute locations within the array. Example: take a memoryview, slice up the buffer into partitions and hand them off to worker functions running in a thread pool.
Cheers, Dave
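
The partitioning idea might look something like the sketch below. It assumes a one-dimensional, contiguous, writable view of C doubles; `partition_addresses` and `chunk_sum` are hypothetical helpers written for this illustration, and the base address is obtained through ctypes because memoryview does not expose it directly.

```python
import ctypes
from array import array
from concurrent.futures import ThreadPoolExecutor


def partition_addresses(view, parts):
    """Split a 1-D contiguous writable view into (address, n_items) chunks."""
    base = ctypes.addressof(ctypes.c_char.from_buffer(view))
    n = view.shape[0]
    step = -(-n // parts)  # ceiling division
    return [(base + start * view.itemsize, min(step, n - start))
            for start in range(0, n, step)]


def chunk_sum(address, count):
    """Worker: read `count` doubles starting at a raw address."""
    chunk = (ctypes.c_double * count).from_address(address)
    return sum(chunk[i] for i in range(count))


data = array("d", range(10))      # must stay alive while workers run
chunks = partition_addresses(memoryview(data), 3)

with ThreadPoolExecutor() as pool:
    totals = list(pool.map(lambda c: chunk_sum(*c), chunks))
```

Each worker only receives an integer and a count, which is the kind of arithmetic on the base address that the message describes.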

David Beazley dave@dabeaz.com wrote:
> I have recently been experimenting with the memoryview() built-in and have come to believe that it really needs to expose the 'buf' attribute of the underlying Py_buffer structure as an integer (see PEP 3118). Let me explain.
That sounds quite harmless. People who use the pointer via ctypes etc. should know the implications. I've opened #15986 for this.
Stefan Krah

participants (8)
- Benjamin Peterson
- David Beazley
- Glyph
- Greg Ewing
- Maciej Fijalkowski
- Nick Coghlan
- Richard Oudkerk
- Stefan Krah