Barry Warsaw wrote on 04.08.2018 at 00:23:
> On Jul 31, 2018, at 23:23, Stefan Behnel wrote:
>> Just to make that point clearer, people commonly use Cython for three main use cases:
>> a) Wrap native libraries for CPython (and less commonly for PyPy) with a pythonic interface. The cool feature there is that you can write Python(-like) code that compiles down into the C layer and runs there, i.e. you can trivially make the wrapper as thin or thick as you want, without sacrificing performance. (This is pretty much the FFI case.)
>> b) Speed up Python code by compiling and optimising it using C data types. Here, Cython supports a very smooth transition from Python semantics and features down to low-level C speed and semantics, while still writing Python code (or very Python-like code, if you want) all along the way. You can even use Python type annotations for that these days.
>> c) Write C/C++ code without having to write C/C++ code. From the perspective of someone who has to write native code, statically typed Cython code compiles to the expected C code (when disabling the safety belts), but has a much nicer syntax, native access to the complete (C)Python and C ecosystems, and all Python features built into the language. Quite a number of people use it to write the C/C++ code they need, but without the complex and error-prone syntax.
> Thanks for this write-up, Stefan. FWIW, although all of these use cases are important ones which Cython fulfills nicely, none of them specifically touches on the reason why I think we want a Cython-like tool in the stdlib. c) probably gets the closest, though.
I would rather consider c) the least relevant reason. Those are the people who care about generating native code more than about interfacing it with Python. They could write C/C++ code (and then wrap that); it's just that they don't want to, and prefer a simpler language that gives them the same performance.
This blog post expresses it quite nicely:
https://explosion.ai/blog/writing-c-in-cython
What CPython wants to cater for is the first two groups of users: those who want to wrap native code for Python efficiently, and those who want to speed up their Python code by pushing their processing into C. My list of use cases above is (by chance) actually also ordered by the amount of C-API interaction. The wrapping case usually needs the most interaction, for type conversions, calling, Python API creation, etc. The speedup case follows: it often involves some interaction on the way in, then some substantial processing code that tries to avoid Python interaction to gain speed, and then more C-API interaction on the way back into Python. Obviously, using Cython to write C code (i.e. case c)) then specifically aims for having as little C-API interaction as possible.
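Just to make that boundary pattern concrete, here is a minimal sketch (the function and its name are made up) that combines cases a) and b): the only C-API interaction is the argument conversion on the way in and the boxing of the result on the way out, while the loop and the call into the C library run as plain C.

    cdef extern from "math.h":
        double sqrt(double x)            # direct C call, no Python overhead

    def norm(double[:] values):          # C-API work: convert the argument once
        cdef double total = 0.0
        cdef Py_ssize_t i
        for i in range(values.shape[0]): # pure C loop, no C-API calls inside
            total += values[i] * values[i]
        return sqrt(total)               # C-API work: box the C double into a Python float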
> The use case for such a tool built into Python is this:
> Victor has a goal/plan for improving the performance of CPython by 2x, and there have been numerous attempts at removing the GIL (gilectomy), swapping reference counting for a GC, reducing the use of ABI-locking macros, etc. Most of these get thwarted by the realization that Python's C API will have to change, very likely in a backward-incompatible way. And there is a *ton* of C extensions out there.
> So let's say we decide these are important and achievable goals, but the API breakage necessitates a version bump to Python 4, perhaps in 5 years. The question is: what is the migration plan so that we can minimize the disruption to the extension module ecosystem?
> One approach would be to discourage extension module authors from writing extensions directly against the C API, while providing them with the power they need to call into the C API, and to marry that with third-party C libraries of every ilk, in a higher-level language (type-annotated Python?) along with a code generator that has intimate knowledge of the C API. If we get significant buy-in, we can think about large-scale evolution of the C API in sync with evolving the code generation tool. That would be the ideal solution: source code compatibility across C API and runtime changes.
> As for adopting Cython vs. writing a lightweight tool: I think if we were to go down this path, we'd need something that comes with CPython by default. Thus the classic "the stdlib is where code goes to die" conundrum. There's also the question of whether the Cython syntax is best suited for this use case, and whether the tool is more than we need to accomplish this goal. Even if we had a separate tool, that wouldn't eliminate the need for Cython, since it still addresses the use cases you mention.
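For what it's worth, the "type annotated Python" variant already exists today as Cython's pure Python mode. A rough sketch with made-up names, assuming a current Cython: the same file runs unchanged in plain CPython (with the cython shadow module installed) and compiles down to C when fed through the code generator.

    import cython

    def mean(values: cython.double[:]) -> cython.double:
        # the annotations are plain Python syntax, but carry C type information
        total: cython.double = 0.0
        i: cython.Py_ssize_t
        for i in range(values.shape[0]):
            total += values[i]
        return total / values.shape[0]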
Like Jeroen, I'm questioning your assumption that such a tool needs to be (or even should be) part of CPython. First of all, most Python users will not have a need for it (which, admittedly, could be said about many tools in the stdlib). But more importantly, Cython is not a runtime tool, it's a build-time code generator. Once you have a code generator in the stdlib, you're bound to the features it provides in a given Python version, and it will not generate code for you that works with future Python versions. Thus, in order to make use of it, you would always need the newest Python release with the newest code generator, which could then generate code that still works on older Python releases. Meaning, once a new Python version with any substantial changes is released, the tools in older Python releases become entirely useless. IMHO, that is a strong argument against having it in the stdlib.
Alternatively, and that is probably what you are thinking of, the stdlib could contain a tool that only generates code for its specific Python version, without the need to care about backward or forward compatibility. That's probably nice from a CPython maintainer's perspective, but for developers, it means that they either need to push the use of this tool down to their users, thus imposing a complete code generation and build step on them. That would make it really difficult for them to deal with a bug in the tool of a given CPython point release, and to make sure that the code works across all CPython point releases that users might want to use for the build process.
Or they need to run several different Python releases on their side, with the respective code generator versions, in order to create binary distributions (as they already do today). A source distribution could then no longer contain any generated sources, as they would not be portable. But then, what if the developers want to use newer features of the tool that are not yet available in the older CPython releases they still need to support? This mostly just pushes the compatibility problem one layer up.
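For reference, the portability of source distributions today relies on shipping the generated C sources next to the Cython sources, roughly like this (a simplified setup.py sketch, the package and module names are made up):

    from setuptools import setup, Extension

    try:
        # maintainer's machine: regenerate the C sources with Cython
        from Cython.Build import cythonize
        extensions = cythonize([Extension("mypkg.fast", ["mypkg/fast.pyx"])])
    except ImportError:
        # user building from an sdist that ships the pregenerated fast.c
        extensions = [Extension("mypkg.fast", ["mypkg/fast.c"])]

    setup(name="mypkg", ext_modules=extensions)

A version-specific stdlib tool would break exactly this decoupling, because the pregenerated sources would only fit one Python version.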
So, overall, I don't see how having such a tool in the stdlib would really improve the situation, but I can see lots of hints that it would get in the way of its users. It feels like it's bound to produce the same situation that the current distutils vs. setuptools world suffers from.
Stefan