Add Py_SETREF and Py_XSETREF to the stable C API
Macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think that they are stable enough now and would be helpful in third-party code. Are there any objections against adding them to the stable C API? [3]

[1] https://bugs.python.org/issue20440
[2] https://bugs.python.org/issue26200
[3] https://bugs.python.org/issue31983
I like these macros!

Technically, would it be possible to use an inline function instead of a macro for Python 3.7?

Victor

2017-11-08 17:30 GMT+01:00 Serhiy Storchaka <storchaka@gmail.com>:
Macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think that they are stable enough now and would be helpful in third-party code. Are there any objections against adding them to the stable C API? [3]
[1] https://bugs.python.org/issue20440
[2] https://bugs.python.org/issue26200
[3] https://bugs.python.org/issue31983
08.11.17 18:37, Victor Stinner wrote:
I like these macros!
Technically, would it be possible to use an inline function instead of a macro for Python 3.7?
No, unless there is a way to pass arguments by reference in C99.

Py_SETREF(x, y) is the safe equivalent of:

    Py_DECREF(x);
    x = y;
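An ordinary function would have to take the address of the target instead, which changes the call syntax. A rough sketch of such a hypothetical helper (illustrative only, not an actual CPython API):

    #include <Python.h>

    /* Hypothetical: a function version would need a PyObject ** parameter. */
    static inline void
    set_ref(PyObject **target, PyObject *value)
    {
        PyObject *old = *target;   /* keep the old reference alive ...     */
        *target = value;           /* ... until the new value is in place  */
        Py_DECREF(old);            /* only now can arbitrary code run      */
    }

    /* Callers would have to write set_ref(&obj->attr, value) instead of
     * Py_SETREF(obj->attr, value), and would lose the implicit cast to
     * PyObject * that the macro performs on its first operand. */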
On Nov 8, 2017, at 8:30 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think that they are stable enough now and would be helpful in third-party code. Are there any objections against adding them to the stable C API? [3]
I have mixed feelings about this. You and Victor seem to really like these macros, but they have been problematic for me. I'm not sure whether it is a conceptual issue or a naming issue, but the presence of these macros impairs my ability to read code and determine whether the refcounts are correct. I usually end up replacing the code with the unrolled macro so that I can count the refs across all the code paths.

The other issue is that when there are multiple occurrences of these macros for multiple variables, it interferes with my best practice of deferring all decrefs until the data structures are in a fully consistent state. Any one of these can cause arbitrary code to run. I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code. Pure Python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return statement is encountered. That practice helps me avoid hard-to-spot re-entrancy issues.

Lastly, I think we should have a preference to not grow the stable C API. Bigger APIs are harder to learn and remember, not so much for you and Victor who use these frequently, but for everyone else who has to look up all the macros whose function isn't immediately self-evident.

Raymond
09.11.17 04:08, Raymond Hettinger wrote:
On Nov 8, 2017, at 8:30 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think that they are stable enough now and would be helpful in third-party code. Are there any objections against adding them to the stable C API? [3]
I have mixed feelings about this. You and Victor seem to really like these macros, but they have been problematic for me. I'm not sure whether it is a conceptual issue or a naming issue, but the presence of these macros impairs my ability to read code and determine whether the refcounts are correct. I usually end up replacing the code with the unrolled macro so that I can count the refs across all the code paths.
If the problem is with naming, what names do you prefer? This already was bikeshedded (I insisted on discussing names before introducing the macros), but maybe now you have better ideas?

The current code contains 212 usages of Py_SETREF and Py_XSETREF. Maybe 10% of them correctly used temporary variables before introducing these macros, and the macros just made the code shorter. But in the rest of the cases the refcount was decremented before setting the new value. This did not always cause a problem, but it is too hard to prove that such code is safe in every concrete case. Unrolling all these invocations would make the code larger and more cumbersome, and it is hard to do automatically.
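For comparison, a minimal sketch of a typical replacement, using a hypothetical attribute self->attr (the names are illustrative only):

    /* Old, unsafe pattern: while Py_DECREF runs, self->attr still points at
     * the object being destroyed, so any code triggered by the decref can
     * see a dead object through self->attr. */
    Py_DECREF(self->attr);
    self->attr = new_value;

    /* Unrolled safe form with a temporary variable: */
    PyObject *old = self->attr;
    self->attr = new_value;
    Py_DECREF(old);

    /* The same thing written with the macro: */
    Py_SETREF(self->attr, new_value);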
The other issue is that when there are multiple occurrences of these macros for multiple variables, it interferes with my best practice of deferring all decrefs until the data structures are in a fully consistent state. Any one of these can cause arbitrary code to run. I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code. Pure python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return-statement is encountered. That practice helps me avoid hard to spot re-entrancy issues.
I agree with you. If you need to set two or more attributes synchronously, Py_SETREF will not help you. This should be clearly explained in the documentation: several subsequent Py_SETREFs may be an error. When I created my patches for using Py_SETREF I encountered several such cases and used different code for them. Maybe some not completely correct code still remains, but in any case it is better now than before introducing Py_SETREF. But in many cases you need to set only one attribute, or the different attributes are not tightly related.
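A minimal sketch of the difference, with hypothetical attributes self->first and self->second (illustrative names only):

    /* Possibly wrong: the first Py_SETREF can run arbitrary code through the
     * decref while self->first and self->second are inconsistent with each
     * other. */
    Py_SETREF(self->first, new_first);
    Py_SETREF(self->second, new_second);

    /* Deferring the decrefs keeps the object consistent before any
     * destructor can run: */
    PyObject *old_first = self->first;
    PyObject *old_second = self->second;
    self->first = new_first;
    self->second = new_second;
    Py_DECREF(old_first);
    Py_DECREF(old_second);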
Lastly, I think we should have a preference to not grow the stable C API. Bigger APIs are harder to learn and remember, not so much for you and Victor who use these frequently, but for everyone else who has to look up all the macros whose function isn't immediately self-evident.
I agree with you again. But these macros are pretty helpful. They make it easy to write safer code, and they are more used than any other C API addition in 3.5, 3.6, and 3.7.

3.7:
    Py_X?SETREF -- 216
    Py_UNREACHABLE -- 65
    Py_RETURN_RICHCOMPARE -- 15
    PyImport_GetModule -- 28
    PyTraceMalloc_(T|Unt)rack -- 9
    PyOS_(BeforeFork|AfterFork_(Parent|Child)) -- 24
    Py_tss_NEEDS_INIT -- 6
    PyInterpreterState_GetID -- 3

3.6:
    PySlice_Unpack -- 22
    PySlice_AdjustIndices -- 22
    PyErr_SetImportErrorSubclass -- 3
    PyErr_ResourceWarning -- 6
    PyOS_FSPath -- 9
    Py_FinalizeEx -- 21
    Py_MEMBER_SIZE -- 4

3.5:
    PyCodec_NameReplaceErrors -- 3
    PyErr_FormatV -- 6
    PyCoro_New -- 3
    PyCoro_CheckExact -- 14
    PyModuleDef_Init -- 30
    PyModule_FromDefAndSpec2? -- 10
    PyModule_ExecDef -- 3
    PyModule_SetDocString -- 4
    PyModule_AddFunctions -- 3
    PyNumber_MatrixMultiply -- 4
    PyNumber_InPlaceMatrixMultiply -- 4
    Py_DecodeLocale -- 25
    Py_EncodeLocale -- 11

Some older macros:
    Py_STRINGIFY -- 15
    Py_ABS -- 54
    Py_MIN -- 66
    Py_MAX -- 56

The above numbers include declarations and definitions, so subtract 2 or 3 per name to get the number of actual usages. Only Py_UNUSED (added in 3.4) beats them, with 291 occurrences, because it is used in generated Argument Clinic code.
On Nov 9, 2017, at 2:44 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
If the problem is with naming, what names do you prefer? This already was bikeshedded (I insisted on discussing names before introducing the macros), but maybe now you have better ideas?
It didn't really seem like a bad idea until after you swept through the code with 200+ applications of the macro and I saw how unclear the results were. Even code that I wrote myself is now harder for me to grok (for example, the macro was applied 17 times to already correct code in itertools).

We used to employ a somewhat plain coding style that was easy to walk through, but the following examples seem opaque. I find it takes practice to look at any one of these and say that it is unequivocally correct (were the function error return arguments handled correctly, are the typecasts proper, at what point can a reentrant call occur, which is the source operand and which is the destination, is the macro using either of the operands twice, is the destination operand an allowable lvalue, do I need to decref the source operand afterwards, etc.):

    Py_SETREF(((PyHeapTypeObject*)type)->ht_name, value)
    Py_SETREF(newconst, PyFrozenSet_New(newconst));
    Py_XSETREF(c->u->u_private, s->v.ClassDef.name);
    Py_SETREF(*p, t);
    Py_XSETREF(self->lineno, PyTuple_GET_ITEM(info, 1));
    Py_SETREF(entry->path, PyUnicode_EncodeFSDefault(entry->path));
    Py_XSETREF(self->checker, PyObject_GetAttrString(ob, "_check_retval_"));
    Py_XSETREF(fut->fut_source_tb, _PyObject_CallNoArg(traceback_extract_stack));

Stylistically, all of these seem awkward, and I think there is more to it than just the name. I'm not sure it is wise to pass complex inputs into a two-argument macro that makes an assignment and has a conditional refcount side-effect. Even now, one of the above looks to me like it might not be correct.

Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples. As a result of making the macros official, will the Python world have a net increase in complexity or a decrease in complexity?

My personal experience with the macros hasn't been positive. Perhaps everyone else thinks it's fine. If so, I won't stand in your way.

Raymond
On Thu, 9 Nov 2017 04:22:20 -0800 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples.
Generally I would advocate that anyone wanting to write a third-party C extension, but not very familiar with the C API and its quirks, use Cython instead. I'm not sure if that's an argument for the SETREF APIs to remain private or to become public :-)

Regards,

Antoine.
On 9 November 2017 at 22:35, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 9 Nov 2017 04:22:20 -0800 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples.
Generally I would advocate that anyone wanting to write a third-party C extension, but not very familiar with the C API and its quirks, use Cython instead. I'm not sure if that's an argument for the SETREF APIs to remain private or to become public :-)
I'm with Antoine on this - we should be pushing folks writing extension modules towards code generators like Cython, cffi, SWIG, and SIP, support libraries like Boost::Python, or safer languages like Rust (which can then be wrapped with cffi), rather than encouraging more bespoke C/C++ extension modules with handcrafted refcount management.

There's a reason the only parts of https://packaging.python.org/guides/packaging-binary-extensions/ that have actually been filled in are the ones explaining how to use a tool to write the extension module for you :)

For me, that translates to being -1 on making these part of the public API - for code outside CPython, our message should consistently be "Do not roll your own refcount management, get a code generator or library to handle it for you".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Nov 10, 2017 at 12:09:12PM +1000, Nick Coghlan wrote:
I'm with Antoine on this - we should be pushing folks writing extension modules towards code generators like Cython, cffi, SWIG, and SIP, support libraries like Boost::Python, or safer languages like Rust (which can then be wrapped with cffi), rather than encouraging more bespoke C/C++ extension modules with handcrafted refcount management. There's a reason the only parts of https://packaging.python.org/guides/packaging-binary-extensions/ that have actually been filled in are the ones explaining how to use a tool to write the extension module for you :)
They will be slower and in my experience not easier to maintain -- quite the opposite.

Stefan Krah
Hum, to give more context to the discussion, the two discussed macros are documented this way:

    #ifndef Py_LIMITED_API
    /* Safely decref `op` and set `op` to `op2`.
     *
     * As in case of Py_CLEAR "the obvious" code can be deadly:
     *
     *     Py_DECREF(op);
     *     op = op2;
     *
     * The safe way is:
     *
     *     Py_SETREF(op, op2);
     *
     * That arranges to set `op` to `op2` _before_ decref'ing, so that any code
     * triggered as a side-effect of `op` getting torn down no longer believes
     * `op` points to a valid object.
     *
     * Py_XSETREF is a variant of Py_SETREF that uses Py_XDECREF instead of
     * Py_DECREF.
     */
    #define Py_SETREF(op, op2)                      \
        do {                                        \
            PyObject *_py_tmp = (PyObject *)(op);   \
            (op) = (op2);                           \
            Py_DECREF(_py_tmp);                     \
        } while (0)

    #define Py_XSETREF(op, op2)                     \
        do {                                        \
            PyObject *_py_tmp = (PyObject *)(op);   \
            (op) = (op2);                           \
            Py_XDECREF(_py_tmp);                    \
        } while (0)
    #endif /* ifndef Py_LIMITED_API */

Victor

2017-11-09 13:22 GMT+01:00 Raymond Hettinger <raymond.hettinger@gmail.com>:
On Nov 9, 2017, at 2:44 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
If the problem is with naming, what names do you prefer? This already was bikeshedded (I insisted on discussing names before introducing the macros), but maybe now you have better ideas?
It didn't really seem like a bad idea until after you swept through the code with 200+ applications of the macro and I saw how unclear the results were. Even code that I wrote myself is now harder for me to grok (for example, the macro was applied 17 times to already correct code in itertools).
We used to employ a somewhat plain coding style that was easy to walk through, but the following examples seem opaque. I find it takes practice to look at any one of these and say that it is unequivocally correct (were the function error return arguments handled correctly, are the typecasts proper, at what point can a reentrant call occur, which is the source operand and which is the destination, is the macro using either of the operands twice, is the destination operand an allowable lvalue, do I need to decref the source operand afterwards, etc):
    Py_SETREF(((PyHeapTypeObject*)type)->ht_name, value)
    Py_SETREF(newconst, PyFrozenSet_New(newconst));
    Py_XSETREF(c->u->u_private, s->v.ClassDef.name);
    Py_SETREF(*p, t);
    Py_XSETREF(self->lineno, PyTuple_GET_ITEM(info, 1));
    Py_SETREF(entry->path, PyUnicode_EncodeFSDefault(entry->path));
    Py_XSETREF(self->checker, PyObject_GetAttrString(ob, "_check_retval_"));
    Py_XSETREF(fut->fut_source_tb, _PyObject_CallNoArg(traceback_extract_stack));
Stylistically, all of these seem awkward and I think there is more to it than just the name. I'm not sure it is wise to pass complex inputs into a two-argument macro that makes an assignment and has a conditional refcount side-effect. Even now, one of the above looks to me like it might not be correct.
Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples. As a result of making the macros official, will the Python world have a net increase in complexity or a decrease in complexity?
My personal experience with the macros hasn't been positive. Perhaps everyone else thinks it's fine. If so, I won't stand in your way.
Raymond
09.11.17 14:22, Raymond Hettinger wrote:
Stylistically, all of these seem awkward and I think there is more to it than just the name. I'm not sure it is wise to pass complex inputs into a two-argument macro that makes an assignment and has a conditional refcount side-effect. Even now, one of the above looks to me like it might not be correct.
If you have found incorrect code, please open an issue and provide a patch. But recently you rewrote correct code (Py_SETREF was not involved) in a more complicated way [1] and rejected my patch that gets rid of the duplication of this complicated code [2]. Please don't "fix" code that is not broken.

[1] https://bugs.python.org/issue26491
[2] https://bugs.python.org/issue31585
Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples. As a result of making the macros official, will the Python world have a net increase in complexity or a decrease in complexity?
I am afraid that these macros will be used in any case, even if they are not part of the official C API, because they are handy. The main purpose of documenting them officially is to explain in what cases these macros are appropriate and make the code more reliable, and in what cases they are not enough and more complex code should be used. This would be a lesson about correctly replacing references. I didn't write this in the source comment because it was intended for experienced Python core developers, and all usages are under our control and pass peer review.
Recently, Oren Milman fixed multiple bugs where an __init__() method was called twice. IMHO Py_SETREF() was nicely used in __init__():

https://github.com/python/cpython/commit/e56ab746a965277ffcc4396d8a0902b6e07...
https://github.com/python/cpython/commit/c0cabc23bbe474d542ff8a4f1243f4ec3cc...

While it's possible to rewrite the code *correctly* without Py_SETREF(), it would be much more verbose. Here the fix remains a single line (see also the sketch at the end of this message):

- self->archive = filename;
+ Py_XSETREF(self->archive, filename);

Victor

2017-11-09 3:08 GMT+01:00 Raymond Hettinger <raymond.hettinger@gmail.com>:
On Nov 8, 2017, at 8:30 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think that they are stable enough now and would be helpful in third-party code. Are there any objections against adding them to the stable C API? [3]
I have mixed feelings about this. You and Victor seem to really like these macros, but they have been problematic for me. I'm not sure whether it is a conceptual issue or a naming issue, but the presence of these macros impairs my ability to read code and determine whether the refcounts are correct. I usually end up replacing the code with the unrolled macro so that I can count the refs across all the code paths.
The other issue is that when there are multiple occurrences of these macros for multiple variables, it interferes with my best practice of deferring all decrefs until the data structures are in a fully consistent state. Any one of these can cause arbitrary code to run. I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code. Pure python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return-statement is encountered. That practice helps me avoid hard to spot re-entrancy issues.
Lastly, I think we should have a preference to not grow the stable C API. Bigger APIs are harder to learn and remember, not so much for you and Victor who use these frequently, but for everyone else who has to look up all the macros whose function isn't immediately self-evident.
Raymond
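(The sketch referred to above: a minimal, hypothetical example of why an __init__() that can be called twice needs Py_XSETREF; it is not the actual code from the linked commits.)

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        PyObject *archive;   /* NULL before the first __init__ call */
    } ExampleObject;

    static int
    example_init(ExampleObject *self, PyObject *args, PyObject *kwds)
    {
        PyObject *filename;
        if (!PyArg_ParseTuple(args, "U", &filename))
            return -1;
        Py_INCREF(filename);
        /* On a second __init__ call self->archive already holds a reference:
         * a plain assignment would leak it, and decref-then-assign could
         * leave a dangling pointer while the decref runs arbitrary code.
         * Py_XSETREF stores the new value first, then drops the old one
         * (which may be NULL on the first call). */
        Py_XSETREF(self->archive, filename);
        return 0;
    }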
2017-11-09 3:08 GMT+01:00 Raymond Hettinger <raymond.hettinger@gmail.com>:
I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code.
There are 3 patterns to update C attributes of an object:

(1)

    Py_XDECREF(obj->attr);   // can call Python code
    obj->attr = new_value;

or (2)

    old_value = obj->attr;
    obj->attr = new_value;
    Py_XDECREF(old_value);   // can call Python code

or (3)

    old_value = obj->attr;
    obj->attr = new_value;
    ...   // The assumption here is that nothing here
    ...   // can call arbitrary Python code
    // Finally, after setting all other attributes
    Py_XDECREF(old_value);   // can call Python code

Pattern (1) is likely to be vulnerable to reentrancy issues: Py_XDECREF() can call arbitrary Python code indirectly (for example through the garbage collector), while the object being modified contains a *borrowed* reference instead of a *strong* reference, or can even refer to an object which was just destroyed.

Pattern (2) is better: the object always keeps a strong reference, *but* the modified attribute can be inconsistent with other attributes. At least you prevent hard crashes.

Pattern (3) is likely the most correct way to write C code implementing a Python object... but it's harder to write such code correctly :-( You have to be careful not to leak a reference.

If I understood correctly, the purpose of the Py_SETREF() macro is not to replace (3) with (2), but to fix all incorrect code written as (1). If I recall correctly, Serhiy modified a *lot* of code written as (1) when he implemented Py_SETREF().
Pure python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return-statement is encountered. That practice helps me avoid hard to spot re-entrancy issues.
Unless you use a lock, all Python methods are written as (2): a different thread or a signal handler is likely to see the object as inconsistent when it accesses the object between two instructions modifying the object's attributes. Example:

    def __init__(self, value):
        self.value = value
        self.double = value * 2

    def increment(self):
        self.value += 1
        # object inconsistent here
        self.double *= 2

The increment() method is not atomic: if the object is accessed at "# object inconsistent here", the object is seen in an inconsistent state.

Victor
participants (6):
- Antoine Pitrou
- Nick Coghlan
- Raymond Hettinger
- Serhiy Storchaka
- Stefan Krah
- Victor Stinner