On Wed, 12 Jul 2017 at 01:25 Ronald Oussoren <ronaldoussoren@mac.com> wrote:
On 11 Jul 2017, at 12:19, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
This is the first draft of a big (?) project to prepare CPython to be able to "modernize" its implementation. The proposed changes should allow CPython to become more efficient in the future. The optimizations themselves are out of the scope of the PEP, but some examples are listed to explain why these changes are needed.
I’m not sure if hiding implementation details will help a lot w.r.t. making CPython more efficient, but cleaning up the public API would avoid accidentally depending on non-public information (and is sound engineering anyway). That said, a lot of care should be taken to avoid breaking existing extensions as the ease of writing extensions is one of the strong points of CPython.
I also think the motivation doesn't have to be performance; it can simply be cleaning up how we expose our C APIs to users, as shown by the fact that we have messed up the stable API by making it opt-out instead of opt-in.
Plan made of multiple small steps
=================================
Step 1: split Include/ into subdirectories
------------------------------------------
Split the ``Include/`` directory of CPython:
* ``python`` API: ``Include/Python.h`` remains the default C API
* ``core`` API: ``Include/core/Python.h`` is a new C API designed for building Python
* ``stable`` API: ``Include/stable/Python.h`` is the stable ABI
Looks good in principle. It is currently too easy to accidentally add to the stable ABI by forgetting to add ‘#if’ guards around a non-stable API.
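A minimal sketch of that failure mode under today's opt-out scheme, where CPython wraps non-stable declarations in ``#ifndef Py_LIMITED_API`` and an extension opts into the stable subset by defining ``Py_LIMITED_API``. The ``Toy_*``/``TOY_*`` names are hypothetical stand-ins, and the "header" is inlined so the file compiles on its own:

```c
#include <assert.h>

/* The extension opts in to the restricted API before including the header
   (TOY_LIMITED_API is a hypothetical stand-in for Py_LIMITED_API). */
#define TOY_LIMITED_API 1

/* ---- hypothetical header, opt-out style ---- */

/* Intended to be stable: visible to every extension. */
static int Toy_StableGet(void) { return 1; }

#ifndef TOY_LIMITED_API
/* Not stable: hidden from limited-API users only because of this guard. */
static int Toy_InternalGet(void) { return 2; }
#define TOY_INTERNAL_VISIBLE 1
#else
#define TOY_INTERNAL_VISIBLE 0
#endif

/* Added later by someone who forgot the guard: limited-API users can now
   call it, so it has silently leaked into the "stable" surface. */
static int Toy_ForgottenGet(void) { return 3; }
```

With separate ``Include/stable/`` headers the default flips: a declaration is only stable if someone deliberately adds it there.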
Expect declarations to be duplicated on purpose: ``#include`` should not be used to include files from a different API, to prevent mistakes. In the past, too many functions were exposed *by mistake*, especially symbols exported to the stable ABI.
Not sure about this, shouldn’t it be possible to have ``python`` include ``core`` and ``core`` include ``stable``? This would avoid having to update multiple header files when adding new definitions.
Yeah, that's also what I initially thought. Use a cascading hierarchy so that people know they should put anything as high up as possible to minimize its exposure. [SNIP]
Step 3: first pass of implementation detail removal
---------------------------------------------------
Modify the ``python`` API:
* Add a new ``API`` subdirectory in the Python source code which will "implement" the Python C API
* Replace macros with functions. The implementation of new functions will be written in the ``API/`` directory. For example, ``Py_INCREF()`` becomes the function ``void Py_INCREF(PyObject *op)`` and its implementation will be written in the ``API`` directory.
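A minimal sketch contrasting the two styles, using a hypothetical ``ToyObject`` in place of ``PyObject``; the ``Toy_*``/``TOY_*`` names are illustrative, not CPython API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject; Py_ssize_t is a signed size type,
   approximated here with ptrdiff_t. */
typedef struct ToyObject {
    ptrdiff_t ob_refcnt;
} ToyObject;

/* Macro style (today's Py_INCREF): the field access is expanded into every
   caller, so each compiled extension hard-codes the struct layout. */
#define TOY_INCREF_MACRO(op) (((ToyObject *)(op))->ob_refcnt++)

/* Function style (the proposal): callers compile only a call to this
   symbol; the body lives in the runtime (the PEP's API/ directory), so a
   later version can change what it does without touching callers. */
void Toy_IncRef(ToyObject *op)
{
    assert(op != NULL);
    op->ob_refcnt++;
}
```

Both have the same effect at run time; the difference is where the code ends up, which is exactly what the performance objection and the versioning argument in the replies are about.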
In this particular case (Py_INCREF/DECREF) making them functions isn’t really useful and is likely to be harmful to performance. It is not useful because these macros manipulate state in a struct that must be public, because that struct is included into the structs for custom objects (PyObject_HEAD). Having them as macros also doesn’t preclude moving to indirect reference counts. Moving to anything that isn’t reference counts likely needs changes to the API (but not necessarily, see PyPy’s cpyext).
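To make the PyObject_HEAD point concrete, a minimal sketch with hypothetical toy types (not CPython's real definitions). In Python 3, ``PyObject_HEAD`` expands to a ``PyObject ob_base;`` member, which the toy macro mirrors:

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyTypeObject ToyTypeObject;  /* left incomplete in this sketch */

/* Stand-in for PyObject: the fields every object starts with. */
typedef struct ToyObject {
    ptrdiff_t ob_refcnt;
    ToyTypeObject *ob_type;
} ToyObject;

/* Stand-in for PyObject_HEAD. */
#define TOY_OBJECT_HEAD ToyObject ob_base;

/* A third-party extension type. The header is embedded by value, so the
   offset of `value` depends on sizeof(ToyObject): the header struct must
   stay public and its layout fixed for already compiled extensions. */
typedef struct {
    TOY_OBJECT_HEAD
    double value;
} ToyFloatLike;

/* Works on any object because every object begins with the header. */
#define TOY_INCREF(op) (((ToyObject *)(op))->ob_refcnt++)
```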
I think Victor has long-term plans to try and hide the struct details at a higher level, and that would make macros a bad thing. But ignoring the specific Py_INCREF/DECREF example, switching to functions does buy us the ability to actually change the function implementations between Python versions, compared to having to worry about what a macro used to do (which is a possibility with the stable ABI).
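A minimal sketch of hiding the struct behind functions, assuming an opaque object type. ``ToyObject`` and the ``Toy_*`` functions are hypothetical and only illustrate the idea, not the PEP's eventual API. In a real split the two halves would live in a public header and in the interpreter's own sources; they share one file here so it compiles standalone:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- what an extension would see: no struct body, only functions ---- */
typedef struct ToyObject ToyObject;          /* opaque to callers */
ToyObject *Toy_New(void);
void Toy_IncRef(ToyObject *op);
void Toy_DecRef(ToyObject *op);
ptrdiff_t Toy_RefCnt(const ToyObject *op);

/* ---- runtime side: free to change this layout between versions ---- */
struct ToyObject {
    ptrdiff_t refcnt;
};

ToyObject *Toy_New(void)
{
    ToyObject *op = malloc(sizeof *op);
    if (op != NULL)
        op->refcnt = 1;
    return op;
}

void Toy_IncRef(ToyObject *op) { op->refcnt++; }

void Toy_DecRef(ToyObject *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0)
        free(op);                            /* last reference gone */
}

ptrdiff_t Toy_RefCnt(const ToyObject *op) { return op->refcnt; }
```

An extension compiled against the top half contains only calls, so the runtime could later store the counter somewhere else without those binaries noticing.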
* Slowly remove more and more implementation details from this API.
Modifications of these API should be driven by tests of popular third party packages like:
* Django with database drivers
* numpy
* scipy
* Pillow
* lxml
* etc.
Compilation errors in these extensions are expected. This step should help to draw a line for the backward-incompatible change.
This could also help to find places where the documented API is not sufficient. One of the places where I poke directly into implementation details is a C-level subclass of str (PyUnicode_Type). I’d prefer not doing that, but AFAIK there is no other way to be string-like to the C API other than by being a subclass of str.
Yeah, this would allow us to very clearly know what should or should not be documented (I would say the same for the stdlib but we all know old code didn't hide things with a leading underscore consistently).
BTW, the reason I need to subclass str: in PyObjC I use a subclass of str to represent Objective-C strings (NSString/NSMutableString), and I need to keep track of the original value, mostly because some Objective-C APIs use object identity. The worst part is that fully initialising the PyUnicodeObject fields often isn’t necessary, as a lot of Objective-C strings aren’t used as strings in Python code.
Enhancements becoming possible thanks to a new C API
====================================================
Indirect Reference Counting
---------------------------
* Replace ``Py_ssize_t ob_refcnt;`` (integer) with ``Py_ssize_t *ob_refcnt;`` (pointer to an integer).
* Same change for GC headers?
* Store all reference counters in a separate memory block (or maybe multiple memory blocks)
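A minimal sketch of the first and third bullets, assuming a single statically sized counter block with no slot reuse; the ``Toy*`` names are hypothetical, and the GC-header question is left open:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size counter table; a real design might grow it or
   use several blocks. Slots are never recycled in this sketch. */
#define TOY_MAX_OBJECTS 1024

/* All reference counters live here, outside the objects themselves. */
static ptrdiff_t toy_refcnt_block[TOY_MAX_OBJECTS];
static size_t toy_refcnt_used = 0;

typedef struct ToyObject {
    ptrdiff_t *ob_refcnt;   /* was: ptrdiff_t ob_refcnt; */
} ToyObject;

/* Give a new object its own counter slot in the shared block. */
static void toy_object_init(ToyObject *op)
{
    assert(toy_refcnt_used < TOY_MAX_OBJECTS);
    op->ob_refcnt = &toy_refcnt_block[toy_refcnt_used++];
    *op->ob_refcnt = 1;
}

/* Every access now goes through one extra pointer dereference. */
#define TOY_INCREF(op) (++*(op)->ob_refcnt)
#define TOY_DECREF(op) (--*(op)->ob_refcnt)
```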
This could be done right now with a minimal change to the API: just make the ob_refcnt and ob_type fields of the PyObject struct private by renaming them. In Py3 the documented way to access these fields is through function-like macros (Py_REFCNT, Py_TYPE), and these could be changed to do indirect refcounting instead.
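A minimal sketch of that approach with hypothetical toy names: the field is renamed so direct access stops compiling, and a documented accessor (playing the role of ``Py_REFCNT``) hides whether the counter is stored inline or behind a pointer. The ``TOY_DIRECT_REFCNT`` build switch is an assumption for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Build-time choice of layout: 1 = counter stored in the object,
   0 = pointer to a counter stored elsewhere. */
#ifndef TOY_DIRECT_REFCNT
#define TOY_DIRECT_REFCNT 0
#endif

typedef struct ToyObject {
    /* Renamed so code that touched ob_refcnt directly no longer compiles. */
#if TOY_DIRECT_REFCNT
    ptrdiff_t ob_refcnt_private;
#else
    ptrdiff_t *ob_refcnt_private;
#endif
} ToyObject;

/* The accessor keeps its name and meaning in both layouts. */
#if TOY_DIRECT_REFCNT
#define TOY_REFCNT(op) ((op)->ob_refcnt_private)
#else
#define TOY_REFCNT(op) (*(op)->ob_refcnt_private)
#endif

#define TOY_INCREF(op) (++TOY_REFCNT(op))
```

Source written only against ``TOY_REFCNT``/``TOY_INCREF`` compiles unchanged under either setting, but each compiled binary bakes in whichever expansion it was built with, so an already compiled extension cannot follow a layout change made in a later interpreter version.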
I think this is why Victor wants functions: even if you change the field names, the macros will be locked into their implementations if you try to write code that supports multiple versions, so you can't change them per Python version.
-Brett