[Python-Dev] ctypes: is it intentional that id() is the only way to get the address of an object?

Sat Jan 19 06:06:36 EST 2019

On 1/18/19, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, Jan 17, 2019 at 07:50:51AM -0600, eryk sun wrote:
>>
>> It's kind of dangerous to pass an object to C without an increment of
>> its reference count.
>
> "Kind of dangerous?" How dangerous?

I take that back. Dangerous is too strong of a word. It can be managed
if we're careful to avoid expressions like c_function(id(f())). Using
py_object simply avoids that problem.

Bear with me while I make a few more comments about py_object, even
though it's straying off topic.

For a type "O" argument (i.e. py_object is in the function's
`argtypes`), we might be able to borrow the reference from the
argument tuple. As implemented, however, the argument actually keeps
its own reference. For example, we can observe this by calling the
from_param method:

    >>> b = bytearray(b'spam')
    >>> arg = ctypes.py_object.from_param(b)
    >>> print(arg)
    <cparam 'O' at 0x7f32a49699b0>
    >>> print(arg._obj)
    bytearray(b'spam')

This is due to the type "O" setfunc, which needs to keep a reference
to the object when setting the value of a py_object instance. The
reference is stored as the _objects attribute. (For non-simple pointer
and aggregate types, _objects is instead a dict keyed by the index as
a hexadecimal string.)

(The getfunc and setfunc of a simple ctypes object are called to get
and set the value, which also includes cases in which we don't have an
actual py_object instance, such as function call arguments; pointer
and array indexes; and struct and union fields. These functions are
defined in Modules/_ctypes/cfield.c.)

IMO, a downside of py_object is that it's a simple type, so the
getfunc gets called automatically when getting fields or indexes. This
is annoying for py_object since a NULL value raises ValueError.
Returning None in this case isn't possible, in contrast to other
simple pointer types. We can work around this by subclassing
py_object. For example:

    >>> a1 = (ctypes.py_object * 1)()
    >>> a1[0]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: PyObject is NULL

    py_object = type('py_object', (ctypes.py_object,), {})

    >>> a2 = (py_object * 1)()
    >>> a2[0]
    <py_object object at 0x7f10dc7d9158>

Then, like all ctypes pointers, a false boolean value means it's NULL:

    >>> bool(a2[0])
    False
    >>> a2[0] = b'spam'
    >>> bool(a2[0])
    True

py_object doesn't help if a library holds onto the pointer and tries
to use it later on. For example, with Python's C API there are
functions that 'steal' a reference (with the assumption that it's a
newly created object, in which case it's more like 'claiming'), such
as PyTuple_SetItem. In this case, we need to increment the reference
count via Py_IncRef.

py_object can be returned from a callback without leaking a reference,
assuming the library manages the new reference. In contrast, other
types that need memory support have to leak a reference (e.g.
c_wchar_p, i.e. type "Z", needs a capsule object for the wchar_t
buffer). In case of a leak, we get warned with RuntimeWarning('memory
leak in callback function.').

> If I am reading this correctly, I think you are saying that using id()
> in this way is never(?) correct.

Yes, it's incorrect, but I've been guilty of using id() like this,
too, because it's convenient. Perhaps we could provide a function
that's explicitly specified to return the address, if implemented.
Maybe call it sys.getaddress()?

In my first reply, I provided two alternatives that use ctypes to
return the address instead of id(). So there's that as well. The fine
print is that ctypes is optional in the standard library. Platforms
and implementations don't have to support it.