On 11 Jul 2017, at 12:19, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
This is the first draft of a big (?) project to prepare CPython so that its implementation can be "modernized". The proposed changes should make it possible to make CPython more efficient in the future. The optimizations themselves are out of the scope of the PEP, but some examples are listed to explain why these changes are needed.
I’m not sure if hiding implementation details will help a lot w.r.t. making CPython more efficient, but cleaning up the public API would avoid accidentally depending on non-public information (and is sound engineering anyway). That said, a lot of care should be taken to avoid breaking existing extensions as the ease of writing extensions is one of the strong points of CPython.
Plan made of multiple small steps
=================================
Step 1: split Include/ into subdirectories
------------------------------------------
Split the ``Include/`` directory of CPython:
* ``python`` API: ``Include/Python.h`` remains the default C API
* ``core`` API: ``Include/core/Python.h`` is a new C API designed for building Python
* ``stable`` API: ``Include/stable/Python.h`` is the stable ABI
Looks good in principle. It is currently too easy to accidentally add to the stable ABI by forgetting to add ‘#if’ guards around a non-stable API.
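For context, today's single-directory layout depends on everyone remembering guards like the following (a simplified snippet in the style of CPython's ``Include/objimpl.h``); separate directories would make that omission much harder::

    /* Anything *not* wrapped in this guard silently becomes part of
       the stable ABI, which is easy to get wrong. */
    #ifndef Py_LIMITED_API
    PyAPI_FUNC(PyObject *) _PyObject_GC_Malloc(size_t size);
    #endif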
Expect declarations to be duplicated on purpose: ``#include`` should not be used to include files from a different API, to prevent mistakes. In the past, too many functions were exposed *by mistake*, especially symbols accidentally exported to the stable ABI.
Not sure about this; shouldn't it be possible to have ``core`` include ``python`` and ``python`` include ``stable``? This would avoid having to update multiple header files when adding new definitions.
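A rough sketch of the nesting I have in mind, using the paths from the proposal (the contents are hypothetical)::

    /* Include/Python.h -- the default ``python`` API */
    #include "stable/Python.h"   /* everything in the stable ABI ... */
    /* ... plus the remaining public, non-stable declarations. */

    /* Include/core/Python.h -- the ``core`` API */
    #include "../Python.h"       /* everything in the ``python`` API ... */
    /* ... plus implementation details and performance macros. */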
At this point, ``Include/Python.h`` is not changed at all: zero risk of backward incompatibility.
The ``core`` API is the most complete API, exposing *all* implementation details and using macros for best performance.
XXX should we abandon the stable ABI? Never really used by anyone.
Assuming that’s true, has anyone looked into why it is barely used? If I had to guess, it’s due to inertia.
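For reference, opting in is already cheap: an extension defines the target version before including the header and links against the version-independent library (per PEP 384)::

    /* Build against the stable ABI, targeting Python 3.4 and later. */
    #define Py_LIMITED_API 0x03040000
    #include <Python.h>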
Step 3: first pass of implementation detail removal
---------------------------------------------------
Modify the ``python`` API:
* Add a new ``API`` subdirectory in the Python source code which will "implement" the Python C API
* Replace macros with functions. The implementation of the new functions will be written in the ``API/`` directory. For example, ``Py_INCREF()`` becomes the function ``void Py_INCREF(PyObject *op)``, and its implementation will be written in the ``API`` directory.
In this particular case (Py_INCREF/DECREF), making them functions isn’t really useful and is likely to be harmful for performance. It is not useful because these macros manipulate state in a struct that must be public, because that struct is included in the structs for custom objects (PyObject_HEAD). Having them as macros also doesn’t preclude moving to indirect reference counts. Moving to anything that isn’t reference counting likely needs changes to the API (but not necessarily, see PyPy’s cpyext).
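For reference, this is roughly what the macros do today (simplified from ``Include/object.h``, without the refcount-debugging hooks)::

    /* The macros poke directly at ob_refcnt, which is why the field
       layout must stay public as long as PyObject_HEAD is embedded in
       extension structs. */
    #define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)
    #define Py_DECREF(op)                               \
        do {                                            \
            PyObject *_py_tmp = (PyObject *)(op);       \
            if (--_py_tmp->ob_refcnt == 0)              \
                _Py_Dealloc(_py_tmp);                   \
        } while (0)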
* Slowly remove more and more implementation details from this API.
Modifications of this API should be driven by tests of popular third-party packages like:
* Django with database drivers
* numpy
* scipy
* Pillow
* lxml
* etc.
Compilation errors on these extensions are expected. This step should help draw the line for the backward-incompatible change.
This could also help to find places where the documented API is not sufficient. One of the places where I poke directly into implementation details is a C-level subclass of str (PyUnicode_Type). I’d prefer not to do that, but AFAIK there is no other way to be string-like to the C API than by being a subclass of str.

BTW. The reason I need to subclass str: in PyObjC I use a subclass of str to represent Objective-C strings (NSString/NSMutableString), and I need to keep track of the original value, mostly because there are some Objective-C APIs that use object identity. The worst part is that fully initialising the PyUnicodeObject fields often isn’t necessary, as a lot of Objective-C strings aren’t used as strings in Python code.
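A stripped-down sketch of the pattern (not PyObjC’s actual code; the names are made up)::

    #include <Python.h>

    typedef struct {
        PyUnicodeObject base;  /* must embed the full str layout */
        void *objc_value;      /* the original NSString, kept for identity */
    } OCStringObject;

    static PyTypeObject OCString_Type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        .tp_name = "OCString",
        .tp_basicsize = sizeof(OCStringObject),
        .tp_flags = Py_TPFLAGS_DEFAULT,
        /* tp_base is set to &PyUnicode_Type before PyType_Ready(),
           which is only possible because PyUnicodeObject is public. */
    };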
Enhancements becoming possible thanks to a new C API
====================================================
Indirect Reference Counting
---------------------------
* Replace ``Py_ssize_t ob_refcnt;`` (integer) with ``Py_ssize_t *ob_refcnt;`` (pointer to an integer).
* Same change for GC headers?
* Store all reference counters in a separate memory block (or maybe multiple memory blocks)
This could be done right now with a minimal change to the API: just make the ob_refcnt and ob_type fields of the PyObject struct private by renaming them. In Py3 the documented way to access these fields is through function-like macros, and those could be changed to do indirect refcounting instead.
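A minimal sketch, assuming the field is renamed to a private pointer (hypothetical name ``ob_refcnt_ptr``) into a side table of counters; extensions that stick to the documented macros would recompile unchanged::

    #define Py_REFCNT(op) (*((PyObject *)(op))->ob_refcnt_ptr)
    #define Py_INCREF(op) ((*((PyObject *)(op))->ob_refcnt_ptr)++)
    #define Py_DECREF(op)                                   \
        do {                                                \
            PyObject *_py_tmp = (PyObject *)(op);           \
            if (--(*_py_tmp->ob_refcnt_ptr) == 0)           \
                _Py_Dealloc(_py_tmp);                       \
        } while (0)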
Tagged pointers
---------------
https://en.wikipedia.org/wiki/Tagged_pointer
Common optimization, especially used for "small integers".
The current C API doesn't allow implementing tagged pointers.
Why not? Thanks to Py_TYPE and Py_INCREF/Py_DECREF it should be possible to use tagged pointers without major changes to the API (also: see above).
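A sketch of the idea, with hypothetical helper names, tagging small ints in the low bit that pointer alignment leaves free::

    #include <Python.h>
    #include <stdint.h>

    static inline int My_IsTaggedInt(PyObject *op) {
        return ((uintptr_t)op & 1) != 0;
    }

    static inline PyObject *My_TagInt(intptr_t value) {
        return (PyObject *)(((uintptr_t)value << 1) | 1);
    }

    /* Py_TYPE() and Py_INCREF() become functions that check the tag
       first; untagged (real) pointers take the normal path. */
    static inline PyTypeObject *My_Py_TYPE(PyObject *op) {
        return My_IsTaggedInt(op) ? &PyLong_Type : op->ob_type;
    }

    static inline void My_Py_INCREF(PyObject *op) {
        if (!My_IsTaggedInt(op))
            op->ob_refcnt++;   /* tagged values are not refcounted */
    }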
Tagged pointers are used in MicroPython to reduce the memory footprint.
Note: ARM64 recently had its address space extended to 48 bits, causing an issue in LuaJIT: `47 bit address space restriction on ARM64 <https://github.com/LuaJIT/LuaJIT/issues/49>`_.
That shouldn’t be a problem when only using the least significant bits as tag bits (those bits that are known to be zero in untagged pointers due to alignment).
Idea: Multiple Python binaries
==============================
Instead of a single ``python3.7`` binary, providing two or more binaries, as PyPy does, would make it easier to experiment with changes without breaking backward compatibility.
For example, ``python3.7`` would remain the default binary with reference counting and the current garbage collector, whereas ``fastpython3.7`` would not use reference counting and a new garbage collector.
It would allow breaking backward compatibility more quickly, and make it even more explicit that only prepared C extensions will be compatible with the new ``fastpython3.7``.
The cost is having to maintain both indefinitely.

Ronald