[capi-sig] CPython C API Design Guidelines
I've had enough ideas bouncing around in my head that I had to get them written up :)
So I'm proposing to produce an informational PEP to describe what a "good" C API looks like and act as guidance as we implement new APIs or change existing ones.
This is a rough, incomplete first draft that nonetheless I think is enough to trigger useful discussions. It's a brain dump, but I've already dumped most of this before.
The key caveats are in the text below, but I'll repeat them here:
- this is NOT a brand-new API
- this is NOT exactly what we currently have implemented
- this is NOT a proposal to stop shipping half the standard library
- this IS meant to provide context for discussing both the issues with our current API and to help drive discussions of any new API or API changes
I don't have any particular desire to own the entire doc, so if anyone wants to become a co-author I'm very open to that. However, I do have strong opinions on this topic after a number of years working with *excellent* API designs, designers and processes. If you want to propose a _totally_ different vision from this, please consider writing an alternative rather than trying to co-opt this one :)
(Doc in approximate Markdown, automatically wrapped to 72 cols for email and I haven't checked if that broke stuff. Sorry if it did)
Cheers, Steve
CPython C API Design Guidelines
# Abstract
This document is intended to be a set of guiding principles for development of the current CPython C API. Future additions and enhancements to the CPython C API should follow, or at least be influenced by, the principles described here. At a minimum, any new or modified C APIs should be able to be categorised according to the terminology defined here, even if exceptions have to be made.
# Things this document is NOT
This document is NOT a design of a completely new API (though a hypothetical new API should follow this design).
This document is NOT documentation of the current API (though the current API should come to resemble it over time).
This document is NOT a set of binding rules in the same sense as PEP 7 and PEP 8 (though designs should be tested against it and exceptions should be rare).
This document is NOT permission to make backwards-incompatible modifications to the current API (though backwards-incompatible modifications should still be made where warranted).
# Definitions
A common understanding of certain terms is necessary for talking about the CPython C API. This section has two goals: to clarify existing common terminology, and to introduce new terminology. Terms are presented in a logical order, rather than alphabetically.
## Existing terms
**Application**: Any independent program that can be launched directly. Compare and contrast with *extension*. CPython is normally considered an application.
**Extension**: A program that integrates into an application, and cannot be launched directly but must be loaded by that application. Python modules, native or otherwise, are considered extensions. When embedded into another application, CPython is considered an extension.
**Native extension**: A subset of all extensions that are compiled to the same language as the application they integrate with. When embedded into an application that is written in C or uses C-compatible conventions, CPython is considered a native extension.
**API**: Application Programming Interface. The set of interactions defined by an application to allow extensions to extend, control, and interact with the first. Typically refers to OOP objects and functions in the abstract. CPython has one API that applies for all scenarios in all contexts, though each scenario will likely only use a subset of this API.
**ABI**: Application Binary Interface. The implementation of an API such that its interactions can be realized by a digital computer. Typically includes memory layouts and binary representations, and is a function of the build tools used to compile CPython. CPython has different ABIs in different contexts, and a different ABI for native extensions compared to extensions.
**Stdlib**: Standard library. Components that build upon the Python language in order to provide useful building blocks and pre-written functionality for users.
## New terms
These terms are introduced briefly here and described in much greater detail below.
**API ring**: One subset of an API for the purpose of extension compatibility. Extensions to CPython care about rings. Extensions choose to target a particular ring to trade off between deeper integration and tighter coupling. Targeting one ring includes access to all rings outside of that one. Rings are orthogonal to layers.
**API layer**: One subset of an API for the purpose of application and internal compatibility. Applications that embed CPython, and the CPython implementation itself, care about layers. Applications choose to adopt or implement a particular layer, implicitly including all lower layers. Layers are orthogonal to rings.
# Quick Overview
For context as you continue reading, these are the API **rings** provided by CPython:
- Python ring (equivalent of the Python language)
- CPython ring (CPython-specific APIs)
- Internal ring (intended for internal use only)
These are the API **layers** provided by CPython:
- Optional stdlib layer (dependencies that must be explicitly required)
- Required stdlib layer (dependencies that can be assumed)
- Platform adaptation layer (ability to interact with the platform)
- Core layer ("pure" mode with no platform interactivity)
(Reminder that this document does not reflect the current state of CPython, but is both aspirational and defining terms for the purposes of discussion. This is not a proposal to remove anything from the standard distribution!)
# API Rings
CPython provides three API rings, listed here from outermost to innermost:
- Python ring
- CPython ring
- Internal ring
An extension that targets the Python ring does not have access to the CPython or Internal rings. Likewise, an extension that targets the CPython ring does not have access to the Internal ring, but does use the Python ring.
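The nesting rule above can be sketched as a small Python model. The `Ring` enum and the `accessible_rings` helper are hypothetical names invented for this illustration; they are not part of any CPython API.

```python
from enum import IntEnum


class Ring(IntEnum):
    """Hypothetical model of the rings, ordered from outermost to innermost."""
    PYTHON = 0
    CPYTHON = 1
    INTERNAL = 2


def accessible_rings(target):
    """Return the rings usable by an extension that targets `target`.

    Targeting a ring grants access to that ring and to every ring outside it.
    """
    return [ring for ring in Ring if ring <= target]


for target in Ring:
    print(target.name, "->", [ring.name for ring in accessible_rings(target)])
```

In this model an extension targeting the CPython ring sees the Python and CPython rings, but not the Internal ring.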
When CPython is an extension of another application, that application can also select which ring to target.
The expectation is that all Python implementations can provide an equivalent Python ring, CPython officially supports extensions using the CPython ring when targeting CPython, and the Internal ring is available but unsupported.
## Python API ring
The Python ring provides functionality that should be equivalent across all Python implementations - in essence, the Python language itself defines this ring.
The C implementation of the Python API allows native code to interact with Python objects as if it were written in Python. The Python API supports duck-typing and should correctly handle the substitution of alternative types.
For a concrete example, `PyObject_GetItem` is part of the Python ring while `PyDict_GetItem` is in the CPython ring.
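A rough Python-level analogy may help here. It assumes that `operator.getitem` stands in for `PyObject_GetItem` and that the unbound `dict.__getitem__` stands in for a dict-specific call such as `PyDict_GetItem`. The C functions differ in details such as error handling and reference ownership, which this sketch does not model; the `Squares` class is made up for the example.

```python
import operator


class Squares:
    """A duck-typed container: not a dict, but it supports obj[key]."""

    def __getitem__(self, key):
        return key * key


def generic_get(obj, key):
    # Python ring style: works for any object that implements __getitem__.
    return operator.getitem(obj, key)


def dict_only_get(obj, key):
    # CPython ring style (by analogy): tied to the concrete dict type.
    return dict.__getitem__(obj, key)


print(generic_get({"a": 1}, "a"))
print(generic_get(Squares(), 4))
try:
    dict_only_get(Squares(), 4)
except TypeError as exc:
    print("dict-only access rejected:", type(exc).__name__)
```

The generic call accepts the substitute type, while the dict-specific call refuses anything that is not a real `dict`.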
Compatibility requirements for the Python API match the language version. Specifically, code relying on the Python API should only break or change behaviour if the equivalent code written in Python would also break or change behaviour.
For CPython, including `Python.h` should only provide access to the Python ring. Accessing any other rings should produce a compile error.
## CPython API ring
The CPython ring provides functionality that is specific to CPython. Extensions that opt in to the CPython ring are tied directly to CPython, but have access to functions that are specific to CPython.
Functions in the CPython ring may require the caller to be using C or be able to provide C structures allocated in memory.
In general, most applications that embed CPython will use the CPython ring. Also, native extensions in the Optional stdlib layer
For a concrete example, the `PyCapsule` type belongs in the CPython ring (that is, other implementations are not required to provide this particular way to smuggle C pointers through Python objects).
As a second concrete example, `PyType_FromSpec` belongs in the CPython ring. (The equivalent in the Python ring would be to call the type object, while the equivalent in the internal ring would be to define a static `PyTypeObject`.)
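The Python ring equivalent mentioned above, calling the type object, looks like this when written in Python. The `Point` class and its members are made up for the example.

```python
def init(self, x, y):
    self.x = x
    self.y = y


def norm_squared(self):
    return self.x * self.x + self.y * self.y


# type(name, bases, namespace) builds a new class at run time,
# without any statically defined type structure.
Point = type("Point", (object,), {"__init__": init, "norm_squared": norm_squared})

p = Point(3, 4)
print(Point.__name__, p.norm_squared())
```

Nothing in this snippet depends on how a particular implementation lays out its type objects, which is what makes it a Python ring operation.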
Compatibility requirements for the CPython API match the CPython major.minor version. Specifically, code relying on the CPython API should only break or change behaviour if the major.minor version changes.
For CPython, as well as `Python.h`, also include `cpython/<header>.h` to obtain access to APIs in the CPython ring.
## Internal API ring
The Internal ring provides functionality that is used to implement CPython. Extensions that opt in to the Internal ring may need to rebuild for every CPython build.
In general, most of the Required stdlib layer will use the Internal ring.
For CPython, as well as `Python.h`, also include `internal/<header>.h` to obtain access to APIs in the Internal ring.
# API Layers
CPython provides four API layers, listed here from top to bottom:
- Optional stdlib layer
- Required stdlib layer
- Platform adaptation layer
- Core layer
An application embedding Python targets one layer and all those below it, which affects the functionality available in Python.
Higher layers may depend on the APIs provided by lower layers, but not the other way around. In general, layers should aim to maximise interaction with the next layer down and avoid skipping it, but this is not a strict requirement.
Lower layers are required to maintain backwards compatibility more strictly than the layers above them.
Components within a layer that depend on other components within that layer must be treated as a single component for determining whether it may be included or omitted.
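The direction rule for dependencies can be checked mechanically. The following is a minimal sketch: the `Layer` enum and `layer_violations` helper are hypothetical, the component placement follows the examples given in this document, and the dependency edges are invented to show one allowed and one disallowed case.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The four layers, numbered from bottom (0) to top (3)."""
    CORE = 0
    PLATFORM_ADAPTATION = 1
    REQUIRED_STDLIB = 2
    OPTIONAL_STDLIB = 3


def layer_violations(layer_of, depends_on):
    """Return (component, dependency) pairs where a lower layer relies on a higher one."""
    return [
        (component, dep)
        for component, deps in depends_on.items()
        for dep in deps
        if layer_of[dep] > layer_of[component]
    ]


layer_of = {
    "eval": Layer.CORE,
    "os": Layer.PLATFORM_ADAPTATION,
    "functools": Layer.REQUIRED_STDLIB,
    "socket": Layer.OPTIONAL_STDLIB,
}
# Invented edges, for illustration only.
depends_on = {
    "socket": ["os", "functools"],  # fine: a higher layer using lower layers
    "os": ["socket"],               # violation: a lower layer using a higher one
}
print(layer_violations(layer_of, depends_on))
```

A check like this only covers the "not the other way around" rule; it does not model the softer guidance about avoiding skipped layers.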
Standard Python distributions (that is, anything that may be launched with the `python` command) will depend upon most components in the Optional stdlib layer, and hence will require _everything_ from the Required stdlib layer and below. Only embedders and potentially deployment tools will use reduced layers.
(Reminder: this document does not present the current state of CPython.)
## Core layer
This layer is the core language and evaluation engine. By adopting this layer, an application can provide platform-independent Python execution. However, it may require providing implementations of a number of callbacks in order to be functional (e.g. for dynamic memory allocation).
Examples of current components that fit into the core layer:
- Most of most built-in types (str, int, list, dict, etc.)
- compile, exec, eval
- read-only members of the sys module
- import
Important but potentially non-obvious implications of relying only on the core layer:
- Dynamic memory allocation/deallocation is part of the Platform adaptation layer, but there is no way to avoid it here. So any user of the core API will need to provide allocators and deallocators. The CPython Platform adaptation layer provides the "default" implementations, but if an embedder does not want to use these then targeting the Core layer will omit them.
- File system and standard streams are part of the Platform adaptation layer, which leaves `open` and `sys.stdout` (among others) without a default implementation. An application that wants to support these without adding more layers needs to provide its own implementations.
- The core layer only exposes UTF-8 APIs. Encoding and decoding for the current platform requires the Platform adaptation layer, while arbitrary encoding and decoding requires the Optional stdlib layer.
- Imports in the core layer are satisfied by a "blind" callback. The Platform adaptation layer provides the support for frozen, bytecode and natively-encoded source imports, while the Optional stdlib layer is required for arbitrary encodings in source files
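To make the "blind" import callback idea concrete, here is a Python-level sketch. It uses today's `sys.meta_path` machinery rather than any core-layer C callback, which this draft does not specify. The `CallbackImporter` class, the `HOST_SOURCES` table and the `host_greeting_demo` module name are all hypothetical.

```python
import importlib.abc
import importlib.util
import sys


class CallbackImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve imports by asking a host-supplied callback for source text."""

    def __init__(self, get_source):
        # get_source(name) returns source text, or None if the host has no such module.
        self._get_source = get_source

    def find_spec(self, fullname, path=None, target=None):
        if self._get_source(fullname) is None:
            return None  # let other finders (if any) try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._get_source(module.__name__)
        code = compile(source, f"<host:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# The "host application" keeps its modules in memory; no file system is involved.
HOST_SOURCES = {
    "host_greeting_demo": "def greet(name):\n    return 'hello, ' + name\n",
}
sys.meta_path.append(CallbackImporter(HOST_SOURCES.get))

import host_greeting_demo

print(host_greeting_demo.greet("capi-sig"))
```

The importer never touches files or encodings itself; everything it knows comes from the callback, which is the property the core layer description relies on.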
## Platform adaptation layer
This layer provides the CPython implementation of platform-specific adapters to support the core layer.
- Memory allocation/deallocation
- File system access
- Standard input/output streams
- Cryptographic random number generation
- os module
- CPython imports
Important but potentially non-obvious implications of relying only on the platform adaptation layer:
- File system access generally requires text encodings, but the full set of codecs is in the Optional stdlib layer. To fully separate these layers, an implementation of the current file system encoding would be required in the Platform adaptation layer. (But arbitrarily encoding/decoding the _contents_ of a file may require higher layers.)
- Importing from source code may also require arbitrary encodings, but imports that can be fully satisfied without this are provided here (e.g. native extension modules, precompiled bytecode, frozen modules, natively encoded source files)
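The split between encoding file names and encoding file contents is already visible from Python today. The snippet below is only an illustration using existing functions; the file name and the choice of `cp1252` are arbitrary.

```python
import os
import sys

# Encoding used for file *names*: the part a Platform adaptation layer
# would need to be able to supply on its own.
print(sys.getfilesystemencoding())

name = "report.txt"
raw = os.fsencode(name)          # str -> bytes using the file system encoding
print(os.fsdecode(raw) == name)  # and back again

# Encoding file *contents* is a separate choice that goes through the
# general codec machinery.
payload = "naïve café".encode("cp1252")
print(payload.decode("cp1252"))
```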
## Required stdlib layer
This layer provides common APIs for interactions between other modules. All components in the Optional stdlib layer may assume that if _they_ are present, everything in this layer is also present.
- standard ABCs
- compiler services (e.g. `copy`, `functools`, `traceback`)
- standard interop types (e.g. `pathlib`, `enum`, `dataclasses`)
## Optional stdlib layer
This layer provides modules that fundamentally stand alone. None of the lower levels may depend on these components being present, and components in this layer should explicitly declare dependencies on others in the same layer.
This layer is valuable for embedders and distributors that want to omit certain functionality. For example, omitting `socket` should be possible when that functionality is not required, as it is in the Optional stdlib layer, and omitting it should only affect those components in the Optional stdlib layer that have explicitly required it.
- platform-independent algorithms (e.g. `itertools`, `statistics`)
- application-specific functionality (e.g. `email`, `socket`, `ftplib`, `ssl`)
- additional compiler services (e.g. `ast`)
- text codecs (e.g. `base64`, `codecs`, `encodings`)
- Python-level FFI (e.g. `ctypes`)
- tools (e.g. `idlelib`, `pynche`, `distutils`, `msilib`)
- configuration/information (e.g. `site`, `sysconfig`, `platform`)
Components in the Optional stdlib layer may be independently versioned.
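The effect of omitting a component such as `socket` can be sketched as a closure over explicitly declared dependencies. The `components_to_omit` helper and the dependency table below are hypothetical and do not describe how the standard library declares dependencies today.

```python
def components_to_omit(removed, requires):
    """Return `removed` plus every component that transitively requires it.

    `requires` maps each component to the components it has explicitly
    declared a dependency on (all within the Optional stdlib layer).
    """
    omitted = {removed}
    changed = True
    while changed:
        changed = False
        for component, deps in requires.items():
            if component not in omitted and omitted.intersection(deps):
                omitted.add(component)
                changed = True
    return omitted


# Invented declarations, for illustration only.
requires = {
    "socket": [],
    "ssl": ["socket"],
    "ftplib": ["socket", "ssl"],
    "statistics": [],
    "email": [],
}
print(sorted(components_to_omit("socket", requires)))
```

Under these assumed declarations, dropping `socket` also drops `ssl` and `ftplib` but leaves `statistics` and `email` alone, and nothing in a lower layer is affected.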
On Fri, 22 Feb 2019 at 04:27, Steve Dower <steve.dower@python.org> wrote:
> I've had enough ideas bouncing around in my head that I had to get them written up :)
> So I'm proposing to produce an informational PEP to describe what a "good" C API looks like and act as guidance as we implement new APIs or change existing ones.
> This is a rough, incomplete first draft that nonetheless I think is enough to trigger useful discussions. It's a brain dump, but I've already dumped most of this before.
> They're in the text below, but I'll repeat here:
> - this is NOT a brand-new API
> - this is NOT exactly what we currently have implemented
> - this is NOT a proposal to stop shipping half the standard library
> - this IS meant to provide context for discussing both the issues with our current API and to help drive discussions of any new API or API changes
> I don't have any particular desire to own the entire doc, so if anyone wants to become a co-author I'm very open to that. However, I do have strong opinions on this topic after a number of years working with *excellent* API designs, designers and processes. If you want to propose a _totally_ different vision from this, please consider writing an alternative rather than trying to co-opt this one :)
As a broad design vision, I pretty much agree with most of what you've laid out here. The only things I'd potentially add would be:
- "Status quo: ..." notes for accessing the different rings (e.g. including Python.h currently still gives you the full CPython API by default)
- I think there's another C API layer in between the "Python Layer" and the "CPython Layer", which would be something like the "Fast FFI Layer" - semantically, it would be on par with the Python layer, but it shouldn't suffer significant runtime performance overhead problems from the Python layer's dynamic dispatch capabilities. Stable ABI elements like PyDict_GetItem and PyType_FromSpec would live at this layer rather than the recompile-for-every-feature-release CPython layer.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi,
As the migration to Python 3 showed us, any migration is painful and takes a very long time. Your project would only be successful if a critical mass of C extensions moved to it. But it's really unclear to me why a maintainer would want to invest time in migrating to these new rings.
Tooling and documentation should be provided to help migrate to such a new model.
More generally, I don't see the direct benefit for PyPy or other Python implementations. The proposal doesn't say anything about reference counting and GC, which are the blocking issues of the current C API.
I expect that such an API would be easier to use for embedded Python, but it's not obvious what the benefits are for CPython in general. I expect a lot of refactoring and additional code to handle these new layers.
On Thu, 21 Feb 2019 at 19:26, Steve Dower <steve.dower@python.org> wrote:
> For context as you continue reading, these are the API **rings** provided by CPython:
> - Python ring (equivalent of the Python language)
> - CPython ring (CPython-specific APIs)
> - Internal ring (intended for internal use only)
> These are the API **layers** provided by CPython:
> - Optional stdlib layer (dependencies that must be explicitly required)
> - Required stdlib layer (dependencies that can be assumed)
> - Platform adaptation layer (ability to interact with the platform)
> - Core layer ("pure" mode with no platform interactivity)
I'm not sure of the kind of problems that you are trying to solve with these new separations with "rings".
> For a concrete example, `PyObject_GetItem` is part of the Python ring while `PyDict_GetItem` is in the CPython ring.
I like that :-)
> For CPython, including `Python.h` should only provide access to the Python ring. Accessing any other rings should produce a compile error.
That's a major backward incompatible change, right?
Right now, Python.h also gives you what you call the "CPython ring".
> Compatibility requirements for the CPython API match the CPython major.minor version. Specifically, code relying on the CPython API should only break or change behaviour if the major.minor version changes.
Currently, there is a "stable API" which is supposed to be compatible across multiple Python versions, not only a specific X.Y. Projects like PyQt use this "stable ABI" (sorry, I wrote API and then ABI, the difference is subtle and confuses me). You don't say anything about it here.
> Lower layers are required to maintain backwards compatibility more strictly than the layers above them.
Hum, I don't quite understand the separation between the CPython implementation and the "Python ring" (API). The CPython core evolves much faster than its API. Here you only talk about the API, right?
> Components within a layer that depend on other components within that layer must be treated as a single component for determining whether it may be included or omitted.
Currently, the whole API is based on some key features:
- PyObject object model and C structures
- CPython GC implementation
- CPython memory allocators
I'm not sure how adding more layers will help to move away from these "legacy constraints".
> - Dynamic memory allocation/deallocation is part of the Platform adaptation layer, but there is no way to avoid it here. So any user of the core API will need to provide allocators and deallocators. The CPython Platform adaptation layer provides the "default" implementations, but if an embedder does not want to use these then targeting the Core layer will omit them.
> - File system and standard streams are part of the Platform adaptation layer, which leaves `open` and `sys.stdout` (among others) without a default implementation. An application that wants to support these without adding more layers needs to provide its own implementations.
> - The core layer only exposes UTF-8 APIs. Encoding and decoding for the current platform requires the Platform adaptation layer, while arbitrary encoding and decoding requires the Optional stdlib layer.
> - Imports in the core layer are satisfied by a "blind" callback. The Platform adaptation layer provides the support for frozen, bytecode and natively-encoded source imports, while the Optional stdlib layer is required for arbitrary encodings in source files
Oh wow, that's going a little bit too far into the complex "Python initialization API" problem. I would prefer to discuss it in a separate PEP / thread.
Victor
On Sat, 23 Feb 2019 at 02:40, Victor Stinner <vstinner@redhat.com> wrote:
> Currently, the whole API is based on some key features:
> - PyObject object model and C structures
> - CPython GC implementation
> - CPython memory allocators
> I'm not sure how adding more layers will help to move away from these "legacy constraints".
Don't think of it as adding more layers - think of it as clarifying the layers that already exist.
For example, there are some layers in the existing API that you can use with nary an INCREF or DECREF in your own code, because all you're doing is taking the results of language runtime API calls and passing them back to other API calls, but that isn't clear from the current API structure.
> > - Dynamic memory allocation/deallocation is part of the Platform adaptation layer, but there is no way to avoid it here. So any user of the core API will need to provide allocators and deallocators. The CPython Platform adaptation layer provides the "default" implementations, but if an embedder does not want to use these then targeting the Core layer will omit them.
> > - File system and standard streams are part of the Platform adaptation layer, which leaves `open` and `sys.stdout` (among others) without a default implementation. An application that wants to support these without adding more layers needs to provide its own implementations.
> > - The core layer only exposes UTF-8 APIs. Encoding and decoding for the current platform requires the Platform adaptation layer, while arbitrary encoding and decoding requires the Optional stdlib layer.
> > - Imports in the core layer are satisfied by a "blind" callback. The Platform adaptation layer provides the support for frozen, bytecode and natively-encoded source imports, while the Optional stdlib layer is required for arbitrary encodings in source files
> Oh wow, that's going a little bit too far into the complex "Python initialization API" problem. I would prefer to discuss it in a separate PEP / thread.
The two are pretty closely related, since the further down the layer stack you get, the more runtime specific the APIs involved become. Components up at the "optional stdlib component" layer would ideally be as portable as pure Python code, and not be tightly coupled to the CPython runtime at all, even when they're implemented as native extensions.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 22Feb2019 0838, Victor Stinner wrote:
> Hi,
> As the migration to Python 3 showed us, any migration is painful and takes a very long time. Your project would only be successful if a critical mass of C extensions moved to it. But it's really unclear to me why a maintainer would want to invest time in migrating to these new rings.
> Tooling and documentation should be provided to help migrate to such a new model.
> More generally, I don't see the direct benefit for PyPy or other Python implementations. The proposal doesn't say anything about reference counting and GC, which are the blocking issues of the current C API.
To be clear, I'm not proposing any migration here, just trying to define the language and architecture in a way that both:
- lets us talk about what we currently have
- lets us talk about the impact of particular changes
The migration proposals will come later :)
> On Thu, 21 Feb 2019 at 19:26, Steve Dower <steve.dower@python.org> wrote:
> > For context as you continue reading, these are the API **rings** provided by CPython:
> > - Python ring (equivalent of the Python language)
> > - CPython ring (CPython-specific APIs)
> > - Internal ring (intended for internal use only)
> > For a concrete example, `PyObject_GetItem` is part of the Python ring while `PyDict_GetItem` is in the CPython ring.
> I like that :-)
This is the kind of problem that is solved with "rings" - it gives us clear categories to put APIs into so that we can discuss them.
They are called rings because if you are in a particular ring, you are also able to use everything in outer rings too. (Think of a bullseye, not a Venn diagram.)
I've actually based the rings on how you've been adjusting the Include directory layout, so I'd expect you'd be okay with it ;)
> > For CPython, including `Python.h` should only provide access to the Python ring. Accessing any other rings should produce a compile error.
> That's a major backward incompatible change, right?
> Right now, Python.h also gives you what you call the "CPython ring".
Right - typo on my part. This should have said "With Py_LIMITED_API defined".
> > Compatibility requirements for the CPython API match the CPython major.minor version. Specifically, code relying on the CPython API should only break or change behaviour if the major.minor version changes.
> Currently, there is a "stable API" which is supposed to be compatible across multiple Python versions, not only a specific X.Y. Projects like PyQt use this "stable ABI" (sorry, I wrote API and then ABI, the difference is subtle and confuses me). You don't say anything about it here.
Yes, the stable API represents the Python API ring (currently), and it should be changed more carefully than the CPython ring. I *think* there are mainly "normal" Python operations in the stable API, but again, the point is that right now CPython does not perfectly match a good design here. So I just want to put out what a good design looks like so that we can know what direction we've been moving.
I'm open to having better explanations of what compatibility guarantees we make here.
> > Lower layers are required to maintain backwards compatibility more strictly than the layers above them.
> Hum, I don't quite understand the separation between the CPython implementation and the "Python ring" (API). The CPython core evolves much faster than its API. Here you only talk about the API, right?
Layers and rings are totally separate concepts.
The CPython implementation is currently all four layers and all three rings, and it will always stay that way. That said, it is a good architecture to define layers, and it is good for our extenders to define API rings (like public/stable/internal/etc.).
> > Components within a layer that depend on other components within that layer must be treated as a single component for determining whether it may be included or omitted.
> Currently, the whole API is based on some key features:
> - PyObject object model and C structures
> - CPython GC implementation
> - CPython memory allocators
> I'm not sure how adding more layers will help to move away from these "legacy constraints".
As Nick said, I'm not trying to introduce new layers, but to formalise what we already have.
That said, I did mix up my initial proposal between the initialization sequence and cross-component dependencies. Memory allocation needs to be part of the core layer, as you can't "be Python" without memory allocation (but it should still be pluggable). Reference counting and GC are also in this layer.
What it means is that higher layers like the Python code in the standard library shouldn't *rely* on Python being reference counted, because it's too many layers away. But of course, this is impossible! So we ought to be aware of where we do make assumptions like this so that it's easier for layers to be changed.
But apart from things like pluggable memory allocators, I looked at where we might "swap things out". For example, on Windows we have different implementations of a lot of functions, so these should be in the Platform adaptation layer (to adapt to the platform).
And then there may be other runtimes that can use most of the standard library, which means those parts of the standard library are in their own layer and you can run them against a different Platform adaptation layer and different core layer (such as Jython or PyPy).
And then there are parts of the standard library that you may not want, such as sockets (e.g. when embedding Python in an application). But if you remove sockets, what else do you have to remove? If the layers are done properly, you shouldn't have to remove anything in a *lower* layer, but only things in the same (or a higher) layer. (And this one is a real need for me, so I have to figure out how to remove sockets, and all the stdlib that depends on it, anyway.)
But layers are definitely more complicated. Once you get to the top layer, it looks more like a dependency tree I think. Though the Core and Platform adaptation layers are fairly standard.
> Oh wow, that's going a little bit too far into the complex "Python initialization API" problem. I would prefer to discuss it in a separate PEP / thread.
Sure, but only when we're planning to change it. PEP 432 already exists for changing this, and all I really want to do is document it.
And maybe my examples here are a bit too much "imagining the future", but that's because we have so many layer violations right now that it's hard to give good examples of where it is at present ;)
Thanks for the feedback!
Cheers, Steve
On Sat, 23 Feb 2019 at 10:03, Steve Dower <steve.dower@python.org> wrote:
> And then there are parts of the standard library that you may not want, such as sockets (e.g. when embedding Python in an application). But if you remove sockets, what else do you have to remove? If the layers are done properly, you shouldn't have to remove anything in a *lower* layer, but only things in the same (or a higher) layer. (And this one is a real need for me, so I have to figure out how to remove sockets anyway and all the stdlib that depends on it anyway.)
Slight tangent: https://www.python.org/dev/peps/pep-0534/ may be of interest to you in offering a potential way to improve the exceptions reported when folks try to use stdlib modules that have been deliberately (or accidentally!) omitted from a Python installation.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I’m very late chiming in, but here goes anyway. I really like this because I think the orthogonality of layers and rings solves the problems that all simplifications of the API (going back to the days when object.h was given hands and feet) have struggled with: different includers of the headers have different ideas of what the layering is. I have the feeling that issues like the GC issue aren’t showstoppers, and can be solved by requiring (for future code) that anything that relies on refcounting must include “cpython/refcounting.h” early on, and if possible and desirable GC can be abstracted out later with a more generally applicable API.
One of the things I’m wondering: would it be an idea to sacrifice the toplevel include files for backward compatibility? In other words: move things like “Python.h” into “api/Python.h” for all future code, so that the toplevel Python.h would initially be kept for backward compatibility, then in some future release start producing compile-time warnings, and in a later release disappear?
And another thing I’m wondering about is cost/benefit analysis. I’m assuming (and I’m assuming you’re assuming) this whole thing will be beneficial in the long run for backward/forward compatibility of extenders, embedders and ports, and thereby make life easier for the people maintaining those. So I wonder whether it would be worth it to see what the effect would be on some of those. I can think of ports of Python to strange and wondrous platforms (MicroPython, or MVS Python, if MVS is actually still strange and wondrous) or with strange and wondrous embedding needs (I can think of PyObjC with its two-way transparent bridging of objects).
Jack
On 21 Feb 2019, at 19:25, Steve Dower <steve.dower@python.org> wrote:
I've had enough ideas bouncing around in my head that I had to get them written up :)
So I'm proposing to produce an informational PEP to describe what a "good" C API looks like and act as guidance as we implement new APIs or change existing ones.
This is a rough, incomplete first draft that nonetheless I think is enough to trigger useful discussions. It's a brain dump, but I've already dumped most of this before.
They're in the text below, but I'll repeat here:
- this is NOT a brand-new API
- this is NOT exactly what we currently have implemented
- this is NOT a proposal to stop shipping half the standard library
- this IS meant to provide context for discussing both the issues with our current API and to help drive discussions of any new API or API changes
I don't have any particular desire to own the entire doc, so if anyone wants to become a co-author I'm very open to that. However, I do have strong opinions on this topic after a number of years working with *excellent* API designs, designers and processes. If you want to propose a _totally_ different vision from this, please consider writing an alternative rather than trying to co-opt this one :)
(Doc in approximate Markdown, automatically wrapped to 72 cols for email and I haven't checked if that broke stuff. Sorry if it did)
Cheers, Steve
CPython C API Design Guidelines
# Abstract
This document is intended to be a set of guiding principles for development of the current CPython C API. Future additions and enhancements to the CPython C API should follow, or at least be influenced by, the principles described here. At a minimum, any new or modified C APIs should be able to be categorised according to the terminology defined here, even if exceptions have to be made.
# Things this document is NOT
This document is NOT a design of a completely new API (though a hypothetical new API should follow this design).
This document is NOT documentation of the current API (though the current API should come to resemble it over time).
This document is NOT a set of binding rules in the same sense as PEP 7 and PEP 8 (though designs should be tested against it and exceptions should be rare).
This document is NOT permission to make backwards-incompatible modifications to the current API (though backwards-incompatible modifications should still be made where warranted).
# Definitions
A common understanding of certain terms is necessary for talking about the CPython C API. This section has two goals: to clarify existing common terminology, and to introduce new terminology. Terms are presented in a logical order, rather than alphabetically.
## Existing terms
**Application**: Any independent program that can be launched directly. Compare and contrast with *extension*. CPython is normally considered an application.
**Extension**: A program that integrates into an application, and cannot be launched directly but must be loaded by that application. Python modules, native or otherwise, are considered extensions. When embedded into another application, CPython is considered an extension.
**Native extension**: A subset of all extensions that are compiled to the same language as the application they integrate with. When embedded into an application that is written in C or uses C-compatible conventions, CPython is considered a native extension.
**API**: Application Programming Interface. The set of interactions defined by an application to allow extensions to extend, control, and interact with that application. Typically refers to OOP objects and functions in the abstract. CPython has one API that applies for all scenarios in all contexts, though each scenario will likely only use a subset of this API.
**ABI**: Application Binary Interface. The implementation of an API such that its interactions can be realized by a digital computer. Typically includes memory layouts and binary representations, and is a function of the build tools used to compile CPython. CPython has different ABIs in different contexts, and a different ABI for native extensions compared to extensions.
**Stdlib**: Standard library. Components that build upon the Python language in order to provide useful building blocks and pre-written functionality for users.
## New terms
These terms are introduced briefly here and described in much greater detail below.
**API ring**: One subset of an API for the purpose of extension compatibility. Extensions to CPython care about rings. Extensions choose to target a particular ring to trade off between deeper integration and tighter coupling. Targeting one ring includes access to all rings outside of that one. Rings are orthogonal to layers.
**API layer**: One subset of an API for the purpose of application and internal compatibility. Applications that embed CPython, and the CPython implementation itself, care about layers. Applications choose to adopt or implement a particular layer, implicitly including all lower layers. Layers are orthogonal to rings.
# Quick Overview
For context as you continue reading, these are the API **rings** provided by CPython:
- Python ring (equivalent of the Python language)
- CPython ring (CPython-specific APIs)
- Internal ring (intended for internal use only)
These are the API **layers** provided by CPython:
- Optional stdlib layer (dependencies that must be explicitly required)
- Required stdlib layer (dependencies that can be assumed)
- Platform adaptation layer (ability to interact with the platform)
- Core layer ("pure" mode with no platform interactivity)
(Reminder that this document does not reflect the current state of CPython, but is both aspirational and defining terms for the purposes of discussion. This is not a proposal to remove anything from the standard distribution!)
# API Rings
CPython provides three API rings, listed here from outermost to innermost:
- Python ring
- CPython ring
- Internal ring
An extension that targets the Python ring does not have access to the CPython or Internal rings. Likewise, an extension that targets the CPython ring does not have access to the Internal ring, but does use the Python ring.
When CPython is an extension of another application, that application can also select which ring to target.
The expectation is that all Python implementations can provide an equivalent Python ring, CPython officially supports extensions using the CPython ring when targeting CPython, and the Internal ring is available but unsupported.
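The access rule for rings can be modelled in a few lines. This is only a sketch of the rule as stated above; the names `Ring`, `can_use` and `usable_rings` are hypothetical and do not correspond to anything in CPython.

```python
from enum import IntEnum


class Ring(IntEnum):
    """API rings, numbered from outermost (0) to innermost (2)."""
    PYTHON = 0
    CPYTHON = 1
    INTERNAL = 2


def can_use(targeted: Ring, api: Ring) -> bool:
    """True if an extension targeting `targeted` may call an API in ring `api`.

    Targeting a ring grants that ring plus every ring outside it.
    """
    return api <= targeted


def usable_rings(targeted: Ring) -> list:
    """List the rings available to an extension, outermost first."""
    return [ring for ring in Ring if can_use(targeted, ring)]
```

For example, `usable_rings(Ring.CPYTHON)` contains the Python and CPython rings but not the Internal ring.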
## Python API ring
The Python ring provides functionality that should be equivalent across all Python implementations - in essence, the Python language itself defines this ring.
The C implementation of the Python API allows native code to interact with Python objects as if it were written in Python. The Python API supports duck-typing and should correctly handle the substitution of alternative types.
For a concrete example, `PyObject_GetItem` is part of the Python ring while `PyDict_GetItem` is in the CPython ring.
Compatibility requirements for the Python API match the language version. Specifically, code relying on the Python API should only break or change behaviour if the equivalent code written in Python would also break or change behaviour.
For CPython, including `Python.h` should only provide access to the Python ring. Accessing any other rings should produce a compile error.
## CPython API ring
The CPython ring provides functionality that is specific to CPython. Extensions that opt in to the CPython ring are tied directly to CPython, but have access to functions that are specific to CPython.
Functions in the CPython ring may require the caller to be using C or be able to provide C structures allocated in memory.
In general, most applications that embed CPython will use the CPython ring. Also, native extensions in the Optional stdlib layer
For a concrete example, the `PyCapsule` type belongs in the CPython ring (that is, other implementations are not required to provide this particular way to smuggle C pointers through Python objects).
As a second concrete example, `PyType_FromSpec` belongs in the CPython ring. (The equivalent in the Python ring would be to call the `type` object, while the equivalent in the internal ring would be to define a static `PyTypeObject`.)
Compatibility requirements for the CPython API match the CPython major.minor version. Specifically, code relying on the CPython API should only break or change behaviour if the major.minor version changes.
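The Python-ring equivalent mentioned for `PyType_FromSpec`, calling the `type` object, looks like this when written in Python. The class name `Greeter` and its members are invented for illustration.

```python
def greet(self):
    """Method that will be attached to the dynamically created class."""
    return f"hello from {type(self).__name__}"


# type(name, bases, namespace) creates a new class at runtime, without
# relying on any CPython-specific structure such as a type spec.
Greeter = type("Greeter", (object,), {"greet": greet, "version": 1})
```

A native extension restricted to the Python ring would do the same thing by calling the `type` object through the generic call APIs, rather than filling in a C structure.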
For CPython, as well as `Python.h`, also include `cpython/<header>.h` to obtain access to APIs in the CPython ring.
## Internal API ring
The Internal ring provides functionality that is used to implement CPython. Extensions that opt in to the Internal ring may need to rebuild for every CPython build.
In general, most of the Required stdlib layer will use the Internal ring.
For CPython, as well as `Python.h`, also include `internal/<header>.h` to obtain access to APIs in the Internal ring.
# API Layers
CPython provides four API layers, listed here from top to bottom:
- Optional stdlib layer
- Required stdlib layer
- Platform adaptation layer
- Core layer
An application embedding Python targets one layer and all those below it, which affects the functionality available in Python.
Higher layers may depend on the APIs provided by lower layers, but not the other way around. In general, layers should aim to maximise interaction with the next layer down and avoid skipping it, but this is not a strict requirement.
Lower layers are required to maintain backwards compatibility more strictly than the layers above them.
Components within a layer that depend on other components within that layer must be treated as a single component for determining whether it may be included or omitted.
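The layering rules above can be checked mechanically. The following is a minimal sketch, assuming each component is tagged with exactly one layer; the identifiers `LAYERS`, `layers_for` and `dependency_allowed` are hypothetical.

```python
# Layers ordered from bottom to top, using the names from this document.
LAYERS = ["core", "platform_adaptation", "required_stdlib", "optional_stdlib"]
LEVEL = {name: index for index, name in enumerate(LAYERS)}


def layers_for(target):
    """Layers an embedder gets when targeting `target`: it and all below."""
    return LAYERS[: LEVEL[target] + 1]


def dependency_allowed(component_layer, dependency_layer):
    """A component may depend on its own layer or any lower layer only."""
    return LEVEL[dependency_layer] <= LEVEL[component_layer]
```

For example, `dependency_allowed("core", "platform_adaptation")` is false, which is why the Core layer has to receive things like allocators through callbacks instead of calling upward.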
Standard Python distributions (that is, anything that may be launched with the `python` command) will depend upon most components in the Optional stdlib layer, and hence will require _everything_ from the Required stdlib layer and below. Only embedders and potentially deployment tools will use reduced layers.
(Reminder: this document does not present the current state of CPython.)
## Core layer
This layer is the core language and evaluation engine. By adopting this layer, an application can provide platform-independent Python execution. However, it may require providing implementations of a number of callbacks in order to be functional (e.g. for dynamic memory allocation).
Examples of current components that fit into the core layer:
- Most of most built-in types (str, int, list, dict, etc.)
- compile, exec, eval
- read-only members of the sys module
- import
Important but potentially non-obvious implications of relying only on the core layer:
- Dynamic memory allocation/deallocation is part of the Platform adaptation layer, but there is no way to avoid it here. So any user of the core API will need to provide allocators and deallocators. The CPython Platform adaptation layer provides the "default" implementations, but if an embedder does not want to use these then targeting the Core layer will omit them.
- File system and standard streams are part of the Platform adaptation layer, which leaves `open` and `sys.stdout` (among others) without a default implementation. An application that wants to support these without adding more layers needs to provide its own implementations.
- The core layer only exposes UTF-8 APIs. Encoding and decoding for the current platform requires the Platform adaptation layer, while arbitrary encoding and decoding requires the Optional stdlib layer.
- Imports in the core layer are satisfied by a "blind" callback. The Platform adaptation layer provides the support for frozen, bytecode and natively-encoded source imports, while the Optional stdlib layer is required for arbitrary encodings in source files
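To give a feel for imports being satisfied by a host-provided callback, here is a sketch using today's import hooks (`sys.meta_path`, `importlib.abc`). It is an analogy only: the "blind" callback described above is not an existing API, and the in-memory `HOST_SOURCES` store, the `HostFinder` class and the `hostcfg` module are all made up for the example.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical module store owned by the embedding application.
HOST_SOURCES = {"hostcfg": "answer = 42\n"}


class HostFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that resolves imports without touching the file system."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in HOST_SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; let the remaining finders try

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = HOST_SOURCES[module.__name__]
        code = compile(source, f"<host:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Register the host's resolver ahead of the default finders.
sys.meta_path.insert(0, HostFinder())

import hostcfg  # resolved by HostFinder, not by the file system
```

The interpreter only asks "who can provide this name?" and the host answers; no file system, bytecode cache or source decoding beyond the host's own choice is involved.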
## Platform adaptation layer
This layer provides the CPython implementation of platform-specific adapters to support the core layer.
- Memory allocation/deallocation
- File system access
- Standard input/output streams
- Cryptographic random number generation
- os module
- CPython imports
Important but potentially non-obvious implications of relying only on the platform adaptation layer:
- File system access generally requires text encodings, but the full set of codecs is in the Optional stdlib layer. To fully separate these layers, an implementation of the current file system encoding would be required in the Platform adaptation layer. (But arbitrarily encoding/decoding the _contents_ of a file may require higher layers.)
- Importing from source code may also require arbitrary encodings, but imports that can be fully satisfied without this are provided here (e.g. native extension modules, precompiled bytecode, frozen modules, natively encoded source files)
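The three kinds of encoding support distinguished in this document can be seen side by side in Python today. Which layer each call would belong to is my reading of the text above, not a statement about how CPython is currently organised; the function names are hypothetical.

```python
import codecs
import os


def encode_utf8(text: str) -> bytes:
    # UTF-8 only: the one encoding this document assigns to the Core layer.
    return text.encode("utf-8")


def encode_for_filesystem(name: str) -> bytes:
    # Uses the platform's file system encoding and error handler,
    # which this document places in the Platform adaptation layer.
    return os.fsencode(name)


def encode_with(text: str, encoding: str) -> bytes:
    # Arbitrary codec looked up by name through the codec registry,
    # which this document places in the Optional stdlib layer.
    encoded, _length = codecs.lookup(encoding).encode(text)
    return encoded
```

An embedder that stops at the Platform adaptation layer would keep the first two functions working and lose only the third.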
## Required stdlib layer
This layer provides common APIs for interactions between other modules. All components in the Optional stdlib layer may assume that if _they_ are present, everything in this layer is also present.
- standard ABCs
- compiler services (e.g. `copy`, `functools`, `traceback`)
- standard interop types (e.g. `pathlib`, `enum`, `dataclasses`)
## Optional stdlib layer
This layer provides modules that fundamentally stand alone. None of the lower levels may depend on these components being present, and components in this layer should explicitly declare dependencies on others in the same layer.
This layer is valuable for embedders and distributors that want to omit certain functionality. For example, omitting `socket` should be possible when that functionality is not required, as it is in the Optional stdlib layer, and omitting it should only affect those components in the Optional stdlib layer that have explicitly required it.
- platform-independent algorithms (e.g. `itertools`, `statistics`)
- application-specific functionality (e.g. `socket`, `ftplib`, `ssl`)
- additional compiler services (e.g. `ast`)
- text codecs (e.g. `base64`, `codecs`, `encodings`)
- Python-level FFI (e.g. `ctypes`)
- tools (e.g. `idlelib`, `pynche`, `distutils`, `msilib`)
- configuration/information (e.g. `site`, `sysconfig`, `platform`)
Components in the Optional stdlib layer may be independently versioned.
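If components in the Optional stdlib layer declare their dependencies explicitly, working out what else is affected by omitting one of them (the `socket` example above) becomes a small graph walk. This is a sketch under that assumption; the `declared` mapping used in the usage note is illustrative data, not a real manifest, and `affected_by_omitting` is a hypothetical helper.

```python
def affected_by_omitting(omitted, declared):
    """Return components that directly or transitively require an omitted one.

    `omitted` is an iterable of component names to leave out.
    `declared` maps each component name to the set of names it requires.
    """
    omitted = set(omitted)
    unavailable = set(omitted)
    changed = True
    while changed:
        changed = False
        for component, requires in declared.items():
            if component not in unavailable and requires & unavailable:
                unavailable.add(component)
                changed = True
    # Report only the knock-on effects, not the omitted components themselves.
    return unavailable - omitted
```

With `declared = {"socket": set(), "ssl": {"socket"}, "ftplib": {"socket", "ssl"}, "statistics": set()}`, omitting `socket` affects `ssl` and `ftplib` and leaves `statistics` untouched.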
Participants (4): Jack Jansen, Nick Coghlan, Steve Dower, Victor Stinner