On 5 Sep 2018, at 15:17, Victor Stinner vstinner@redhat.com wrote:
Le mer. 5 sept. 2018 à 13:36, Ronald Oussoren ronaldoussoren@mac.com a écrit :
Borrowed references have there problems, but do make the CPython API easier to use. I’m all for making the API safer to use, but IMHO eradicating the use of borrowed references is not necessary for that.
I don't really care about designing an API easy to use.
I do because an easy to use C API is one of the reasons why Python is as popular as it is.
One thing that might help with correct C API usage is to teach static analysis tools about the API and correct usage thereof, see <https://emptysqua.re/blog/analyzing-python-c-extensions-with-cpychecker/ https://emptysqua.re/blog/analyzing-python-c-extensions-with-cpychecker/> for an experimenal example of this from the past. Doing this might also help in finding APIs that are hard to reason about, and finding alternatives for them. Apple added something similar to clang before they switched to automatic reference counting (ARC) in ObjC.
Right now, I prefer to focus on correctness and design the ideal API. Once we know what is the ideal API and see the cost to reach it, we can start talking about tradeoff and milestones to reach the final goal.
There are APIs where moving to a variant without borrowed references would be better (in particular PyList_GetItem, which is a ticking bomb if you don’t INCREF the result before calling a CPython API)
I removed PyTuple_GetItem() from Py_NEWCAPI, replaced with PyTuple_GetItemRef(). I will do the same for PyList_GetItem() and other variants.
PyTuple_GetItem should be fairly safe, if the callers owns a reference to the tuple itself. It is technically possible that tuples change, but that’s bad behaviour (similar to how C code can change the contents of strings, which would break assumptions in other code).
In analysing this it is useful to be aware of where you are borrowing the reference from, and that promise is invalidated.
The most problematic part of 2) is that Py_REFCNT is an lvalue and some code might exploit this to change the refcount of an object, but that’s arguably inherently bad code.
I introduced _Py_SET_TYPE() and _Py_SET_REFCNT() for that. Maybe I should make them public, I don't know yet.
IMHO these shouldn’t be public. Setting the type can be done the same way as from python; Set the __class__ attribute of the instance, and setting the reference count is at best questionable. Both won’t work with tagged pointers.
Both require changes to the stable ABI though.
The stable ABI is not usable yet, so I don't think that anyone is using it right now. My plan is to make it much easier to use it, and later maybe make it the default.
There are some users, a fairly high-profile one is PyQt5 which has wheels the big tree platforms on PyPI that use the stable ABI.
BTW. Both the refcount macros and Py_TYPE are called a lot, unconditionally turning them into external functions could be quite costly performance wise.
I would prefer to not guess anything about performance. First experiment the change, run benchmark, then decide about the tradeoff between performance and correctness.
I'm working on a fork of CPython. I will not touch CPython upstream until we have enough information to get wise decisions ;-)
:-).
P.S. I’m sorry that I sound fairly negative, that is not my intention. I’m all for improving the API, but it should be done for the right reasons and not just because changes *might* help with future developments and/or other implementations.
Ronald