Free threading and borrowing references from mutable types
Considering the free threading issue (again), I found that functions returning borrowed references are problematic if the container is mutable.

In traditional Python, extension modules could safely borrow references if they know that they maintain a reference to the container. If a thread switch is possible between getting the borrowed reference and using it, then this assumption is wrong: another thread may remove the reference from the container, so that the object dies.

Therefore, I propose to deprecate these functions. I'm willing to write a PEP elaborating on that if necessary, but I'd like to perform a quick poll beforehand
- whether people think that deprecating these functions is reasonable
- whether it is sufficient to only have their abstract.c equivalents, or whether type-specific replacements that do return new references are needed
- what else I'm missing.

Specifically, I think the following functions are problematic:
- PyList_GetItem, PyList_GET_ITEM,
- PyDict_GetItem, PyDict_GetItemString

Any comments appreciated,
Martin
Considering the free threading issue (again), I found that functions returning borrowed references are problematic if the container is mutable.
In traditional Python, extension modules could safely borrow references if they know that they maintain a reference to the container. If a thread switch is possible between getting the borrowed reference and using it, then this assumption is wrong: another thread may remove the reference from the container, so that the object dies.
Good point. I hadn't thought of this yet, but it's definitely yet another problem facing free threading.
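The hazard described above can be modelled at the Python level. This is only an analogy, not the C API itself: a weakref.ref stands in for a borrowed reference (it points at an object without keeping it alive), and container.clear() stands in for a mutation made by another thread between the lookup and the use. The Item class and both function names are hypothetical, and the result relies on CPython's reference counting, which frees an object as soon as its last strong reference disappears.

```python
import weakref


class Item:
    """Hypothetical stand-in object; plain class instances support weakrefs."""


def borrowed_survives_mutation():
    """Look up an item without owning it, then mutate the container."""
    container = [Item()]
    # The weak reference plays the role of a borrowed reference.
    borrowed = weakref.ref(container[0])
    # Stand-in for another thread removing the item before we use it.
    container.clear()
    # On CPython the object is already gone, so the weakref returns None.
    return borrowed() is not None


def owned_survives_mutation():
    """Take a new (strong) reference first, then mutate the container."""
    container = [Item()]
    owned = container[0]  # we now own a reference of our own
    container.clear()
    return isinstance(owned, Item)
```

On CPython, borrowed_survives_mutation() returns False and owned_survives_mutation() returns True, which is the difference between a borrowed and a new reference that the proposal is concerned with.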
Therefore, I propose to deprecate these functions. I'm willing to write a PEP elaborating on that if necessary, but I'd like to perform a quick poll beforehand
- whether people think that deprecating these functions is reasonable
- whether it is sufficient to only have their abstract.c equivalents, or whether type-specific replacements that do return new references are needed
- what else I'm missing.
I'm personally not overly excited about free threading (Greg Stein agrees that it slows down the single-threaded case and expects that it will always remain optional). Therefore I'm at best lukewarm about this proposal.

But at a recent PythonLabs meeting, a very different motivation was brought up to deprecate the type-specific APIs (all of them!): if someone subclasses e.g. dictionary and overrides __getitem__, code calling PyDict_GetItem on its instances can be considered wrong, because it circumvents the additional processing in __getitem__ (where e.g. case normalization or other forms of key mapping could affect the outcome). Because it returns a borrowed value, PyDict_GetItem can't safely be fixed to check for this and call the __getitem__ slot.

Since there are many sensible uses of dictionary subclasses that don't override __getitem__, I find it would be a shame to change PyDict_Check() to only accept "real" dictionaries (not subclasses) -- this would disallow using dictionary subclasses for many interesting situations.
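A small Python-level sketch of the subclassing concern, assuming a hypothetical CaseInsensitiveDict that normalizes keys. Calling dict.__getitem__ directly is used here as a rough analogue of a type-specific C call that skips the subclass's __getitem__; it is not a claim about how PyDict_GetItem is implemented.

```python
class CaseInsensitiveDict(dict):
    """Hypothetical dict subclass that stores and looks up lower-cased keys."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())


d = CaseInsensitiveDict()
d["Content-Type"] = "text/plain"

# Ordinary subscription goes through the overridden __getitem__.
via_override = d["CONTENT-TYPE"]

# Calling the base-class method directly skips the override, so the
# un-normalized key is simply not found.
try:
    dict.__getitem__(d, "CONTENT-TYPE")
    bypass_found = True
except KeyError:
    bypass_found = False

# The instance still passes an "is this a dict?" check.
is_dict = isinstance(d, dict)
```

Here the instance is accepted as a dictionary, yet the direct base-class lookup misses the key that the overriding method would have found.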
Specifically, I think the following functions are problematic:
- PyList_GetItem, PyList_GET_ITEM,
- PyDict_GetItem, PyDict_GetItemString
Any comments appreciated,
I believe that these APIs are still useful for more limited situations. E.g. if I write C code to implement some algorithm using a dictionary, if I create the dictionary myself, and don't pass it on to outside code, I can trust that it won't be mutated, so my use of PyDict_GetItem is safe.

Another situation where PyDict_GetItem is unique: it doesn't raise an exception when the item is not present. This often saves a lot of overhead in situations where a missing item simply means to try something else, rather than a failure of the algorithm. I think that we may need an API with this property, even if it returns a new reference when successful.

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
... Because it returns a borrowed value, PyDict_GetItem can't safely be fixed to check for this and call the __getitem__ slot.
If PyDict_GetItem ended up calling a non-genuine-dict __getitem__ slot, what would stop it from decref'ing the result in that case before returning it (thus returning a borrowed reference even so)? That __getitem__ may synthesize a result object with a refcount of 1?

hard-to-approve-of-users<wink>-ly y'rs - tim
[Guido]
... Because it returns a borrowed value, PyDict_GetItem can't safely be fixed to check for this and call the __getitem__ slot.
[Tim]
If PyDict_GetItem ended up calling a non-genuine-dict __getitem__ slot, what would stop it from decref'ing the result in that case before returning it (thus returning a borrowed reference even so)? That __getitem__ may synthesize a result object with a refcount of 1?
Exactly.

--Guido van Rossum (home page: http://www.python.org/~guido/)
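Tim's point can be modelled in Python as well, again only as an analogy. SynthesizingDict and Record are hypothetical; keeping nothing but a weakref.ref to the result stands in for "call __getitem__, then decref the result and hand back a borrowed pointer". The outcome depends on CPython's immediate reference-count deallocation.

```python
import weakref


class Record:
    """Hypothetical value type; instances support weak references."""

    def __init__(self, key):
        self.key = key


class SynthesizingDict(dict):
    """Hypothetical dict subclass that builds a fresh object on every lookup."""

    def __getitem__(self, key):
        # Nothing else holds a reference to this new object.
        return Record(key)


def borrowed_lookup(mapping, key):
    """Model 'call __getitem__, then decref': keep only a weak reference."""
    return weakref.ref(mapping[key])


def owned_lookup(mapping, key):
    """Model returning a new reference: the caller owns the result."""
    return mapping[key]
```

On CPython, borrowed_lookup(SynthesizingDict(), "a")() is None because the synthesized object's only reference was dropped before the caller could use it, whereas owned_lookup(SynthesizingDict(), "a").key is "a".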
participants (3)
- Guido van Rossum
- Martin von Loewis
- Tim Peters