Re: [Libguestfs] [PATCH 1/2] python: Avoid crash if callback parameters cannot be built
From: Eric Blake email@example.com To: Laszlo Ersek firstname.lastname@example.org, email@example.com Cc: Nir Soffer firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com Bcc: Subject: Re: [Libguestfs] [PATCH 1/2] python: Avoid crash if callback parameters cannot be built Reply-To: In-Reply-To: firstname.lastname@example.org
[adding email@example.com, in the hopes that this conversation can spur an improvement in Python's documentation]
I'm trimming to just context on the doc issue at hand (this question spawned deep within a thread  to libguestfs reference counting bug in its Python bindings)
On Tue, Feb 21, 2023 at 03:06:47PM +0100, Laszlo Ersek wrote:
On 2/20/23 21:52, Eric Blake wrote:
On Mon, Feb 20, 2023 at 09:08:08PM +0200, Nir Soffer wrote:
On Mon, Feb 20, 2023 at 10:45 AM Laszlo Ersek firstname.lastname@example.org wrote:
On 2/17/23 17:52, Eric Blake wrote:
On Thu, Feb 16, 2023 at 03:09:02PM +0100, Laszlo Ersek wrote:
- Py_BuildValue with the "O" format specifier transfers the new list's
*sole* reference (= ownership) to the just-built higher-level object "args"
Reference transfer is done with "N", not "O". That would be an alternative to decreasing the refcount of py_array on success, but not eliminate the need to decrease the refcount on Py_BuildValue failure.
- when "args" is killed (decref'd), it takes care of "py_array".
Consequently, if Py_BuildValue fails, "py_array" continues owning the new list -- and I believe that, if we take the new error branch, we leak the object pointed-to by "py_array". Is that the case?
Not quite. "O" is different than "N".
I agree with you *now*, looking up the "O" specification at https://docs.python.org/3/c-api/arg.html#building-values.
However, when I was writing my email, I looked up Py_BuildValue at that time as well, just elsewhere. I don't know where. Really. And then that documentation said that the reference count would *not* be increased. I distinctly remember that, because it surprised me -- I actually recalled an *even earlier* experience reading the documentation, which had again stated that "O" would increase the reference count.
Maybe here: https://docs.python.org/2/c-api/arg.html#building-values
Looks like another incompatibility between python 2 and 3.
Or maybe misreading the wrong part of the page. PyArg_ParseTuple() and Py_BuildValue() are listed on the same page, and intentionally use similar-looking format strings, so you have to check _which_ part of the page the "O" you are reading about is associated with. The first "O" is under PyArg_ParseTuple() and friends, and intentionally does not increase reference count (the usesr passed us an Object, we are parsing it into a placeholder where we don't need to clean up after ourselves unless we want to add a reference to the object to make it last beyond the caller), the latter says that "O" does increase the reference count (we are building up a larger object that will now be an additional reference path into the object we are passing in).
Yea, I'll just go ahead and call myself an idiot. :)
(Please, please, make documentation fool-proof! I'm not an unrelenting, unrepentant, principled fool, but still a fool.)
In short, the issue is that the Python documentation (even in the latest version at https://docs.python.org/3/c-api/arg.html) has TWO separate paradigms on the SAME page, and so many format characters that you can easily get lost in a wall of
X (foo) [bar]
designators that you can easily lose track of whether you are reading the docs for PyArg_ParseTuple() and friends with paradigm "X (Python source) [C destination]", or the docs for Py_BuildValue() with paradigm "X (Python destination) [C source]".
When it comes to reference counting, there is a HUGE and IMPORTANT difference between the former:
O (object) [PyObject *]
Store a Python object (without any conversion) in a C object pointer. The C program thus receives the actual object that was passed. The object’s reference count is not increased. The pointer stored is not NULL.
and the latter:
O (object) [PyObject *]
Pass a Python object untouched (except for its reference count, which is incremented by one). If the object passed in is a NULL pointer, it is assumed that this was caused because the call producing the argument found an error and set an exception. Therefore, Py_BuildValue() will return NULL but won’t raise an exception. If no exception has been raised yet, SystemError is set.
But as neither instance of "O (object) [PyObject *]" is on the screen at the same time as the functions to which that format specifier is documenting, it is EASY for a novice coder to read the wrong one, and then get the wrong ideas about reference counting when using the "O" format specifier.
My suggestion: a simple swap of the order of the Py_BuildValue() notations to:
PyArg_ParseValue()... X (Python source) [C destination]
Py_BuildValue()... X [C source] (Python destination]
so that it is always source before destination, and always  for the C counterpart and () for the Python counterpart. Another idea might be splitting the page into two, so that the two walls of format specifier text cannot be mistaken for one another when rapidly scrolling through the page looking for a given format letter; but I personally like both on the same page as the two sets of format specifiers are (intentionally) related, even if subtly different depending on direction.