[Python-Dev] Add Py_SETREF and Py_XSETREF to the stable C API

Victor Stinner victor.stinner at gmail.com
Thu Nov 9 06:26:28 EST 2017


2017-11-09 3:08 GMT+01:00 Raymond Hettinger <raymond.hettinger at gmail.com>:
> I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code.

There are 3 patterns to update C attributes of an object:

(1)
Py_XDECREF(obj->attr); // can call Python code
obj->attr = new_value;

or

(2)
old_value = obj->attr;
obj->attr = new_value;
Py_XDECREF(old_value); // can call Python code

or

(3)
old_value = obj->attr;
obj->attr = new_value;
... // The assumption here is that nothing here
... // can call arbitrary Python code
// Finally, after setting all other attributes
Py_XDECREF(old_value); // can call Python code


Pattern (1) is likely to be vulnerable to reentrancy issues:
Py_XDECREF() can indirectly call arbitrary Python code when the old
value is deallocated (for example through a __del__ method or a weakref
callback), while the object being modified holds a *borrowed*
reference instead of a *strong* reference, or even refers to an object
which was just destroyed.

Pattern (2) is better: the object always keeps a strong reference,
*but* the modified attribute can be inconsistent with other
attributes. At least, you prevent hard crashes.
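
Pure Python cannot reproduce the dangling pointer of pattern (1), but
the ordering used by pattern (2) can be modelled. The sketch below is
only an illustration: the names Holder, Witness and replace_attr are
hypothetical, and it relies on CPython's reference counting to run
__del__ as soon as the last reference goes away. The finalizer stands in
for the "arbitrary Python code" that Py_XDECREF() may trigger.

```python
class Holder:
    """Hypothetical object with a single attribute to replace."""

    def __init__(self):
        self.attr = None


class Witness:
    """Hypothetical value whose finalizer inspects its holder."""

    def __init__(self, name, holder, seen):
        self.name = name
        self.holder = holder
        self.seen = seen

    def __del__(self):
        # Record what the holder refers to while this object is destroyed.
        current = getattr(self.holder.attr, "name", None)
        self.seen.append((self.name, current))


def replace_attr(holder, new_value):
    """Model of pattern (2): store the new value, then drop the old one."""
    old_value = holder.attr   # keep a strong reference to the old value
    holder.attr = new_value   # the holder now refers to the new value
    del old_value             # the finalizer may run here, after the store
```

When the finalizer of the old value runs, holder.attr already refers to
the new value, so the reentrant code never sees a reference to an object
that is being destroyed.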

Pattern (3) is likely the most correct way to write C code to
implement a Python object... but it's harder to write such code
correctly :-( You have to be careful not to leak a reference.


If I understood correctly, the purpose of the Py_SETREF() macro is not
to replace (3) with (2), but to fix all incorrect code written as (1).
If I recall correctly, Serhiy modified a *lot* of code written as (1)
when he implemented Py_SETREF().


> Pure python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return-statement is encountered.  That practice helps me avoid hard to spot re-entrancy issues.

Unless you use a lock, all Python methods behave like (2): a
different thread or a signal handler is likely to see the object as
inconsistent when it is accessed between two instructions modifying
the object's attributes.

Example:

def __init__(self, value):
    self.value = value
    self.double = value * 2

def increment(self):
    self.value += 1
    # object inconsistent here
    self.double += 2

The increment() method is not atomic: if the object is accessed at "#
object inconsistent here", the object is seen in an inconsistent
state.
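
The lock-based alternative mentioned above can be sketched as follows.
This is only an illustration, not code from the original example: the
class name Pair and the snapshot() helper are hypothetical, and the
guarantee only holds for readers that take the same lock.

```python
import threading


class Pair:
    """Hypothetical class keeping the invariant double == value * 2."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self.value = value
        self.double = value * 2

    def increment(self):
        # Both attributes are updated while holding the lock, so a thread
        # that also takes the lock never observes the intermediate state.
        with self._lock:
            self.value += 1
            self.double += 2

    def snapshot(self):
        # Readers must take the same lock to get a consistent pair.
        with self._lock:
            return self.value, self.double
```

Code that reads self.value and self.double directly, without the lock,
can still see the inconsistent state. A signal handler is a separate
problem: it runs in the main thread, so calling snapshot() from a
handler while increment() holds this non-reentrant lock would deadlock.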

Victor

