[Python-ideas] PEP: Hide implementation details in the C API

Ronald Oussoren ronaldoussoren at mac.com
Thu Jul 13 12:11:26 EDT 2017


> On 12 Jul 2017, at 20:51, Brett Cannon <brett at python.org> wrote:
> 
> 
> 
> On Wed, 12 Jul 2017 at 01:25 Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> 
> > On 11 Jul 2017, at 12:19, Victor Stinner <victor.stinner at gmail.com> wrote:
> >
> > Hi,
> >
> > This is the first draft of a big (?) project to prepare CPython to be
> > able to "modernize" its implementation. Proposed changes should allow
> > making CPython more efficient in the future. The optimizations
> > themselves are out of the scope of the PEP, but some examples are listed
> > to explain why these changes are needed.
> 
> I’m not sure if hiding implementation details will help a lot w.r.t. making CPython more efficient, but cleaning up the public API would avoid accidentally depending on non-public information (and is sound engineering anyway).  That said, a lot of care should be taken to avoid breaking existing extensions as the ease of writing extensions is one of the strong points of CPython.
> 
> I also think the motivation doesn't have to be performance but simply cleaning up how we expose our C APIs to users as shown by the fact we have messed up the stable API by making it opt-out instead of opt-in.

I agree with this.

[…]

> >
> > Step 3: first pass of implementation detail removal
> > ---------------------------------------------------
> >
> > Modify the ``python`` API:
> >
> > * Add a new ``API`` subdirectory in the Python source code which will
> >  "implement" the Python C API
> > * Replace macros with functions. The implementation of new functions
> >  will be written in the ``API/`` directory. For example, Py_INCREF()
> >  becomes the function ``void Py_INCREF(PyObject *op)`` and its
> >  implementation will be written in the ``API`` directory.
> 
> In this particular case (Py_INCREF/DECREF) making them functions isn’t really useful and is likely to be harmful for performance. It is not useful because these macros manipulate state in a struct that must be public, because that struct is included into the structs for custom objects (PyObject_HEAD). Having them as macros also doesn’t preclude moving to indirect reference counts. Moving to anything that isn’t reference counting likely needs changes to the API (but not necessarily, see PyPy’s cpyext).
> 
> I think Victor has long-term plans to try and hide the struct details at a higher level and so that would make macros a bad thing. But ignoring the specific Py_INCREF/DECREF example, switching to functions does buy us the ability to actually change the function implementations between Python versions compared to having to worry about what a macro used to do (which is a possibility with the stable ABI).

I don’t understand. Moving to functions instead of macros for some things doesn’t really help with keeping the public API stable (for the non-stable ABI). Avoiding macros does help with keeping more of the object internals hidden, and possibly easier to change within a major release, but doesn’t help (or hinder) changing the implementation of an API.

AFAIK there is no API stability guarantee for the details of the struct definitions for object representation, which is why it was possible to change the dict representation for CPython 3.6, and the str representation earlier.  I wouldn’t mind having to explicitly opt-in to getting access to those internals, but removing them from public headers altogether does have a cost.
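
A minimal sketch of that distinction, using hypothetical names (DemoObject, DEMO_INCREF_MACRO, demo_incref, demo_decref) rather than the real CPython definitions: the macro form expands the field access into every extension that uses it, while the function form keeps knowledge of the struct layout inside the interpreter.

```c
#include <assert.h>

/* Hypothetical stand-in for an object header; not CPython's PyObject. */
typedef struct {
    long refcnt;
} DemoObject;

/* Macro form: the field access is expanded into every extension that
 * uses it, so the compiled extension hard-codes the offset of 'refcnt'. */
#define DEMO_INCREF_MACRO(op) ((op)->refcnt++)

/* Function form: an extension would only need a prototype such as
 * 'void demo_incref(DemoObject *op);', so the layout is used only inside
 * these bodies, which would ship with the interpreter. */
void demo_incref(DemoObject *op)
{
    op->refcnt++;
}

/* Returns the new count; asserts the object was still alive. */
long demo_decref(DemoObject *op)
{
    assert(op->refcnt > 0);
    return --op->refcnt;
}
```

Both forms update the same counter; they differ only in where the layout gets compiled in. The function form also costs a call per increment unless the compiler can inline it, which is the performance concern raised earlier in the thread.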

>  
> 
> > * Slowly remove more and more implementation details from this API.
> >
> > Modifications of this API should be driven by tests of popular third
> > party packages like:
> >
> > * Django with database drivers
> > * numpy
> > * scipy
> > * Pillow
> > * lxml
> > * etc.
> >
> > Compilation errors on these extensions are expected. This step should
> > help to draw a line for the backward incompatible change.
> 
> This could also help to find places where the documented API is not sufficient.  One of the places where I poke directly into implementation details is a C-level subclass of str (PyUnicode_Type). I’d prefer not doing that, but AFAIK there is no other way to be string-like to the C API other than by being a subclass of str.
> 
> Yeah, this would allow us to very clearly know what should or should not be documented (I would say the same for the stdlib but we all know old code didn't hide things with a leading underscore consistently).

I tried to write about how this could help to evolve the API by exposing documented APIs or features for things where extensions currently directly peek and poke into implementation details.   Moving away from private stuff is a lot easier when there are sanctioned alternatives :-)


>  
> 
> BTW. The reason I need to subclass str: in PyObjC I use a subclass of str to represent Objective-C strings (NSString/NSMutableString), and I need to keep track of the original value; mostly because there are some Objective-C APIs that use object identity. The worst part is that fully initialising the PyUnicodeObject fields often isn’t necessary as a lot of Objective-C strings aren’t used as strings in Python code.
> 
> >
> >
> > Enhancements becoming possible thanks to a new C API
> > ====================================================
> >
> > Indirect Reference Counting
> > ---------------------------
> >
> > * Replace ``Py_ssize_t ob_refcnt;`` (integer)
> >  with ``Py_ssize_t *ob_refcnt;`` (pointer to an integer).
> > * Same change for GC headers?
> > * Store all reference counters in a separate memory block
> >  (or maybe multiple memory blocks)
> 
> This could be done right now with a minimal change to the API: just make the ob_refcnt and ob_type fields of the PyObject struct private by renaming them. In Py3 the documented way to access these fields is through function macros, and these could be changed to do indirect refcounting instead.
> 
> I think this is why Victor wants functions, because even if you change the names the macros will be locked into their implementations if you try to write code that supports multiple versions and so you can't change it per-version of Python.

I really don’t understand. The macros are part of the code for a version of Python and can be changed when necessary between Python versions; the only advantage of functions is that it’s easier to tweak the implementation in patch releases.
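
A minimal sketch of the renaming approach, assuming a hypothetical DEMO_INDIRECT_REFCNT switch in place of two Python versions (none of these names are CPython API): the count field is renamed so extensions cannot reach it directly, and code written only against the accessor macros compiles unchanged whether the count is stored inline or in a separate memory block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for two interpreter versions:
 * 1 = counts live in a separate block, 0 = count stored inline. */
#ifndef DEMO_INDIRECT_REFCNT
#define DEMO_INDIRECT_REFCNT 1
#endif

#if DEMO_INDIRECT_REFCNT
#define DEMO_MAX_OBJECTS 16
/* Separate memory block holding all reference counts. */
static long demo_refcnt_block[DEMO_MAX_OBJECTS];
static size_t demo_refcnt_used = 0;

typedef struct {
    long *priv_refcnt_ptr;  /* renamed so extensions don't use it directly */
} DemoObject;

#define DEMO_REFCNT(op) (*(op)->priv_refcnt_ptr)

static void demo_object_init(DemoObject *op)
{
    assert(demo_refcnt_used < DEMO_MAX_OBJECTS);
    op->priv_refcnt_ptr = &demo_refcnt_block[demo_refcnt_used++];
    *op->priv_refcnt_ptr = 1;
}
#else
typedef struct {
    long priv_refcnt;       /* renamed so extensions don't use it directly */
} DemoObject;

#define DEMO_REFCNT(op) ((op)->priv_refcnt)

static void demo_object_init(DemoObject *op)
{
    op->priv_refcnt = 1;
}
#endif

/* Accessor macros: the only documented way to touch the count. */
#define DEMO_INCREF(op) ((void)DEMO_REFCNT(op)++)
#define DEMO_DECREF(op) (--DEMO_REFCNT(op))

/* "Extension" code written only against the macros: the same source
 * compiles against either layout. */
static long demo_extension_hold(DemoObject *op)
{
    DEMO_INCREF(op);
    return DEMO_REFCNT(op);
}
```

Because the macros are expanded at compile time, an extension binary built against one layout still has to be rebuilt for the other: source compatibility is preserved, binary compatibility is not.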

BTW. As I mentioned before the PyObject struct is one that cannot be made private without major changes because that struct is included in all extension object definitions by way of PyObject_HEAD.  But anyway, that’s just a particular example and doesn’t mean we cannot hide any implementation details.
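
To illustrate why that struct has to stay visible, here is a minimal sketch with hypothetical names: DEMO_OBJECT_HEAD plays the role of PyObject_HEAD (which in Python 3 expands to `PyObject ob_base;`), and DemoProxyObject stands in for an extension type such as a proxy for a foreign object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real CPython definitions. */
typedef struct DemoTypeObject DemoTypeObject;  /* opaque: only used via pointer */

typedef struct {
    long refcnt;
    DemoTypeObject *type;
} DemoObject;

/* Analogue of PyObject_HEAD: embeds the header by value, which requires
 * DemoObject to be a complete type, i.e. its definition must be visible
 * to every extension that declares its own object struct. */
#define DEMO_OBJECT_HEAD DemoObject ob_base;

/* An extension-defined object, e.g. a proxy for a foreign object. */
typedef struct {
    DEMO_OBJECT_HEAD
    void *wrapped;  /* hypothetical pointer to the proxied value */
} DemoProxyObject;

/* The header sits at offset 0, so a pointer to the extension object,
 * suitably converted, points to its first member (the generic header). */
static DemoObject *demo_as_object(DemoProxyObject *p)
{
    assert(p != NULL);
    return (DemoObject *)p;
}
```

Embedding by value needs the complete definition of DemoObject at compile time, so reducing the public header to an opaque declaration would break every extension struct declared this way.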

Ronald

P.S. I’ve surfaced because I’m at EuroPython, and experience teaches that I’ll likely submerge again afterwards even if I’d prefer not to do so :-(