[capi-sig] Re: Let's change the C API!

30 Jul 2018

Hi Stefan,
Thanks for your email, you asked many good questions :-) It seems like
my documentation is incomplete, especially the rationale part. It's
fine, I can complete it later. In the meantime, here are my answers
inline.
2018-07-29 23:40 GMT+02:00 Stefan Behnel <python_capi@behnel.de>:
...
> From Cython's POV, exposing internals is a good thing that helps making
> extension modules faster.
I'm fine with Cython wanting to bear the burden of following C API
changes to get the best performance. But I would only allow Cython (and
cffi) to use it, not all C extensions ;-)
Technically, I plan to keep access to the full API giving access to C
structures and all low-level stuff, for specific use cases like Cython
and debug tools. But at the end of my roadmap, it will be an opt-in
option rather than the default.
Fortunately, Cython exists to hide the ugly C API ;-)
...
> *Hiding* internals would already break code, so I
> don't see the advantage over just *changing* the internals instead, but
> continuing to expose those new internals.
My main motivation to change the C API is to make it possible to change
CPython. For example, I would like to experiment with a specialized
implementation of the list type which would store small integers as a C
array of int8_t, int16_t or int32_t to be more space efficient (I'm not
sure that it would be faster, and I'm not really interested in having
SIMD-like operations in the stdlib). Currently, PySequence_Fast_ITEMS()
and the exposed PyListObject structure prevent experimenting with such
an optimization, because PyListObject.ob_item leaks a PyObject**. To be
honest, I'm not sure that this specific optimization is worth it, but
I like to give this example since it's easy to explain.
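To illustrate why a leaked PyObject** blocks this kind of experiment, here is a minimal, self-contained C sketch. It does not use the real CPython API: `toy_object`, `compact_list` and the `compact_list_*` functions are hypothetical names invented for this illustration. The list stores its values unboxed as `int8_t` and only creates an object when an item is requested through a function, so there is simply no array of object pointers that a PySequence_Fast_ITEMS()-style macro could hand out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical boxed integer, standing in for a PyObject. */
typedef struct {
    long refcnt;
    long value;
} toy_object;

/* Hypothetical compact list: items are stored unboxed, one byte each.
   There is no toy_object** array that could be exposed to callers. */
typedef struct {
    size_t size;
    int8_t *items;
} compact_list;

/* Create a list of `size` items, all set to zero. Returns NULL on failure. */
static compact_list *compact_list_new(size_t size)
{
    compact_list *list = malloc(sizeof(*list));
    if (list == NULL) {
        return NULL;
    }
    /* Allocate at least one byte so that an empty list is still valid. */
    list->items = calloc(size ? size : 1, sizeof(int8_t));
    if (list->items == NULL) {
        free(list);
        return NULL;
    }
    list->size = size;
    return list;
}

/* Store a small integer. Returns 0 on success, -1 if the index is out of
   range or the value does not fit into int8_t. */
static int compact_list_set(compact_list *list, size_t index, long value)
{
    assert(list != NULL);
    if (index >= list->size || value < INT8_MIN || value > INT8_MAX) {
        return -1;
    }
    list->items[index] = (int8_t)value;
    return 0;
}

/* Box an item on demand. The caller owns the returned object (a strong
   reference) and must release it with toy_object_release(). */
static toy_object *compact_list_get(const compact_list *list, size_t index)
{
    assert(list != NULL);
    if (index >= list->size) {
        return NULL;
    }
    toy_object *obj = malloc(sizeof(*obj));
    if (obj == NULL) {
        return NULL;
    }
    obj->refcnt = 1;
    obj->value = list->items[index];
    return obj;
}

/* Drop one reference and free the object when none is left. */
static void toy_object_release(toy_object *obj)
{
    if (obj != NULL && --obj->refcnt == 0) {
        free(obj);
    }
}

static void compact_list_free(compact_list *list)
{
    if (list != NULL) {
        free(list->items);
        free(list);
    }
}
```

Because every access goes through `compact_list_get()`, the storage format could later be switched to `int16_t` or `int32_t` without changing any caller. That is the kind of freedom that direct structure access takes away.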
...
> The problem we have is the heap
> of C extensions that are no longer (actively) maintained, not those that
> are maintained but use internals.
This is why my project has a "backward compatibility" page:
https://pythoncapi.readthedocs.io/backward_compatibility.html
I would like to *remove* PyDict_GetItem(), but maybe we can provide a
3rd party C library which would reimplement PyDict_GetItem() on top of
the new PyDict_GetItemRef() function which returns a strong reference.
Currently, the page only explains the other side: how to modify C
extensions to use the new name (with a macro or something else that
falls back to PyDict_GetItem() on Python 3.7 and older).
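The idea of a compatibility library can be sketched with a toy model. The code below is not the CPython API: `toy_object`, `toy_dict` and the `toy_*` functions are hypothetical stand-ins, and the real function signatures may differ. It only shows the principle: the old-style lookup, which returns a borrowed reference, is rebuilt on top of a new-style lookup which returns a strong reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference-counted object, standing in for a PyObject. */
typedef struct {
    long refcnt;
} toy_object;

/* Hypothetical one-slot "dict": it owns one reference to its value. */
typedef struct {
    const char *key;
    toy_object *value;
} toy_dict;

static void toy_incref(toy_object *obj)
{
    obj->refcnt++;
}

/* A real implementation would free the object when the count hits zero. */
static void toy_decref(toy_object *obj)
{
    assert(obj->refcnt > 0);
    obj->refcnt--;
}

/* New-style lookup: returns a strong reference that the caller must
   release with toy_decref(), or NULL if the key is missing. */
static toy_object *toy_dict_get_item_ref(toy_dict *dict, const char *key)
{
    if (dict->value == NULL || strcmp(dict->key, key) != 0) {
        return NULL;
    }
    toy_incref(dict->value);
    return dict->value;
}

/* Compatibility layer: old-style lookup returning a borrowed reference.
   This is only safe because the dict itself keeps the value alive. */
static toy_object *toy_dict_get_item(toy_dict *dict, const char *key)
{
    toy_object *item = toy_dict_get_item_ref(dict, key);
    if (item != NULL) {
        toy_decref(item);  /* give the strong reference back right away */
    }
    return item;
}
```

The comment on `toy_dict_get_item()` is the important part: a borrowed reference only works when the container really stores a pointer to a live object. A runtime that stores its items in some other form cannot offer that guarantee, so such a layer would only work on a runtime that keeps the old object layout.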
...
> Basically, PyPy shows that, given enough developer time, the C-API can be
> emulated even based on a very different runtime design, potentially with a
> handicap in terms of performance. If some parts of the C-API are to be
> replaced, that might be a way to go.
"potentially with a handicap in terms of performance"
Multiple PyPy developers told me that cpyext remains one of the most
important blockers when moving an application away from CPython. I
wouldn't say that it's a solved issue.
Moreover, I'm not sure that optimizing cpyext is the favorite task of
PyPy developers. There are likely other parts of PyPy which would
deserve more love than cpyext, no? :-)
But I'm just guessing here, I would prefer to hear directly from PyPy
developers ;-)
...
> I also don't buy the argument that binary modules built for, say, Py3.6
> must continue to import on Py3.9, for example. Supporting the last couple
> of supported releases with binary wheels has proven good enough IMHO, and
> rebuilding for a new CPython release seems acceptable, given that this also
> enables the use of new features. (Would be something to ask distributors,
> though.)
I created the pythoncapi project between two flights, so sorry, my
rationale may still be incomplete :-)
From the point of view of Red Hat, a Linux vendor, having to support
multiple Python versions is a pain, especially for QA testing.
Currently, the compromise is to only provide one Python version per OS
release. For example, Fedora 28 only supports Python 3.6 even though
Python 3.7 was released during the Fedora 28 lifetime. For Fedora, in
practice, it's fine, since it is released every 6 months. Ubuntu LTS
is supported for 5 years, so having an old Python version can be more
annoying. And then there is RHEL which is supported for 10 years (up
to 15 years with extended support). On that scale, the Python release
schedule doesn't fit well with RHEL support.
By "supported Python version", I mean not only the /usr/bin/pythonX.Y
binary, but also packages for dozens of Python modules. Fedora 28
provides Python binaries for various Python versions (2.7, 3.4,
3.5, 3.6, 3.7 if I recall correctly), but it has python3-*
modules only for Python 3.6.
Supporting 2 Python versions, like 3.6 and 3.7, means doubling the
size of the repository, but also doubling the tests for the QA team
(each time a new package version is released, usually for bugfixes).
What if you want to support 3 Python versions in parallel, if not
more?
... in the meantime, macOS is stuck at Python 2.7 :-) macOS users:
how much do you like Python 2.7 in 2018?
This is one issue.
Another issue is the Python binary compiled in debug mode, known as
python-dbg (or python-debug or python-debuginfo). Right now, it's
mostly useless since Linux distributions don't provide two flavors of
Python modules (release and debug modes): you have to manually
recompile all the C extensions used by your application in debug mode.
Good luck with installing build dependencies and handling compilation
errors. Because of that, nobody uses the debug build, even though it's
super useful to debug C extensions. As a consequence, we (Python
upstream, but also Linux vendors) get bug reports where a C extension
crashed and we are unable to debug it (oh, gc.collect() crashed on an
invalid object, deal with that!).
Moreover, right now, it's unclear whether the C API is designed for
CPython internals or to be used by third parties, and whether it should
check all arguments or not. Some functions check a few arguments, some
others don't. For the functions which check arguments, you get a
slowdown, even if your whole application uses the C API properly. It's
like running a kind of debug build in production. Would you deploy a C
program compiled with assertions in production once you checked that
your application is bug-free? Why should we have to pay the price of
this "debug mode" in the Python compiled in "release mode"?
I would like to be able to remove most debug checks from a *release*
build, but also be able to run C extensions with a *different runtime*
which would be Python compiled in debug mode.
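The release versus debug trade-off can be shown with the standard C `assert()` macro, which is compiled out when `NDEBUG` is defined. This is only a sketch of the general technique, not CPython code: `toy_sum()` and `toy_checks_enabled()` are hypothetical functions written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API function: sum of the first n items of an array.
   The argument check is written with assert(): it is compiled into a
   debug build and removed entirely when building with -DNDEBUG. */
static long toy_sum(const int *items, size_t n)
{
    assert(items != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += items[i];
    }
    return total;
}

/* Report which flavor was compiled: 1 if argument checks are active,
   0 if they were removed by NDEBUG. */
static int toy_checks_enabled(void)
{
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}
```

The same source file gives two flavors: built normally, a wrong call such as `toy_sum(NULL, 3)` aborts with a diagnostic, while built with `-DNDEBUG` the check costs nothing. A caller which uses the function correctly behaves the same with both flavors, which is what makes it possible to swap a debug runtime in only when something needs to be investigated.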
...
> So, from my POV, I'd vote for
>
> allowing C-API changes in each X.Y release

Which kind of changes do you want to do?
...

> requiring a new binary wheel (or rebuild) for each X.Y release

It doesn't solve the issue of being stuck with one Python version per OS release.
...

> providing a compatibility layer for "removed" C-API functionality

Above, I proposed to require a *library* for that. But you would only
be able to use such a library with a Python runtime which remains fully
compatible with Python 3.7. No specialized list for you in that case!
That's the price of backward compatibility.
This is also where I would like to allow multiple Python
"runtimes" per Python version:

* CPython compiled in release mode with backward compatibility: "python3"
* CPython compiled in debug mode: "python3-dbg"
* experimental CPython, maybe faster: "experimental_python3", for
  example with a specialized list, and therefore incompatible with
  PyDict_GetItem() and borrowed references

Technically, in CPython, it can be 3 different compilation modes of
the same code base.
But I also would like to let people do their own experiment with their
own CPython forks, again, without losing support for C extensions!
...

> exposing any internals that may help extension modules

In my current roadmap, there is: "Step 4: if step 3 gone fine and most
people are still ok to continue, make the new C API as the default in
CPython and add an option for opt-out."
The "opt-out" option is the existing API which leaks all implementation details.
...

> maybe add a warning to the docs of exposed internals that these are more
> likely to change than other parts of the C-API

Yes, we have to work on the C API documentation of CPython. Right now,
I'm still at the first step of my roadmap:
"Step 1: Identify Bad C API and list functions that should be modified
or even removed"
A next step would be to start documenting which APIs are "bad" in the
CPython documentation. Maybe start by adding something like a
"provisional deprecation warning", but only in the documentation. Or a
real deprecation, but only in the doc, if we manage to agree on APIs
that should go away.
...
> I'd also suggest to make Cython, pybind11 and cffi (maybe a few more) the
> preferred and official ways to extend and integrate with CPython, to keep
> those three up to date with all C-API changes, and to make it as easy as
> possible for users to build their code with them against new CPython releases.
I'm not really worried about new C extensions which already use
"modern" solutions like Cython and cffi.
My concern is the very long tail of C extensions which call the C API
directly. I'm sure that we can enhance the C API somehow without
breaking this long tail.
...
> If you want a more radical proposal, I'd deprecate the C-API documentation,
> push people into not caring about the C-API themselves, and then
> concentrate on keeping the major code integration tools out there
> compatible and fast with whatever CPython can provide as "exposed internals".
Honestly, at this point, I'm open to any idea! But I'm not ok with
"breaking the world". Such a plan is not going to work. Even though PyPy
has been promoting cffi for years, the C API remains very popular and
commonly used.
I'm not sure that deprecating the API or the documentation would help.
In 2018, ten years after Python 3.0 was released, we are still
discussing how to migrate old code bases away from Python 2, even though
there are many tools doing "most" of the migration. I'm not even aware
of tools to rewrite a C extension using Cython or cffi. Even if one
exists, why would anyone take the risk of a regression when C extensions
are currently working perfectly on CPython?
My problem is to find a solution to change the C API without forcing C
extension authors to change their code "too much", maybe using a new
compatibility layer.
Victor

Victor Stinner