Barry Warsaw wrote on 04.08.2018 at 00:23:
> On Jul 31, 2018, at 23:23, Stefan Behnel wrote:
>> Just to make that point clearer, people commonly use Cython for three main use cases:
>> a) Wrap native libraries for CPython (and less commonly for PyPy) with a pythonic interface. The cool feature there is that you can write Python(-like) code that compiles down into the C layer and runs there, i.e. you can trivially make the wrapper as thin or thick as you want, without sacrificing performance. (This is pretty much the FFI case.)
>> b) Speed up Python code by compiling and optimising it using C data types. Here, Cython supports a very smooth transition from Python semantics and features down to low-level C speed and semantics, while still writing Python code (or very Python-like code, if you want) all along the way. You can even use Python type annotations for that these days.
>> c) Write C/C++ code without having to write C/C++ code. From the perspective of someone who has to write native code, statically typed Cython code compiles to the expected C code (when disabling the safety belts), but has a much nicer syntax, native access to the complete (C)Python and C ecosystems, and all Python features built into the language. Quite a number of people use it to write the C/C++ code they need, but without the complex and error-prone syntax.
> Thanks for this write-up, Stefan. FWIW, although all of these use cases are important ones which Cython fulfills nicely, none of them specifically touches on the reason why I think we want a Cython-like tool in the stdlib. c) probably gets the closest, though.
I would rather consider c) the least relevant reason. Those are the people who care about generating native code more than about interfacing it with Python. They could write C/C++ code (and then wrap that); it's just that they don't want to, and prefer a simpler language that gives them the same performance.
This blog post expresses it quite nicely:
https://explosion.ai/blog/writing-c-in-cython
What CPython wants to cater for is the first two groups of users: those who want to wrap native code for Python efficiently, and those who want to speed up their Python code by pushing their processing into C. My list of use cases above is (by chance) actually also ordered by the amount of C-API interaction. The wrapping case usually needs the most interaction, for type conversions, calling, Python API creation, etc. The speedup case follows: it often involves some interaction on the way in, then some substantial processing code that tries to avoid Python interaction to gain speed, and then more C-API interaction on the way back into Python. Obviously, using Cython to write C code (i.e. case c)) then specifically aims for having as little C-API interaction as possible.
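Just to make that boundary pattern concrete, here is a minimal sketch (the function and its name are made up) that combines cases a) and b): the only C-API interaction is the argument conversion on the way in and the boxing of the result on the way out, while the loop and the call into the C library run as plain C.

    cdef extern from "math.h":
        double sqrt(double x)            # direct C call, no Python overhead

    def norm(double[:] values):          # C-API work: convert the argument once
        cdef double total = 0.0
        cdef Py_ssize_t i
        for i in range(values.shape[0]): # pure C loop, no C-API calls inside
            total += values[i] * values[i]
        return sqrt(total)               # C-API work: box the C double into a Python float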
> The use case for such a tool built into Python is this:
> Victor has a goal/plan for improving the performance of CPython by 2x, and there have been numerous attempts at removing the GIL (gilectomy), swapping reference counting for a GC, reducing the use of ABI-locking macros, etc. Most of these get thwarted by the realization that Python's C API will have to change, very likely in a backward-incompatible way. And there is a *ton* of C extensions out there.
> So let's say we decide these are important and achievable goals, but the API breakage necessitates a version bump to Python 4, perhaps in 5 years. The question is: what is the migration plan so that we can minimize the disruption to the extension module ecosystem?
> One approach would be to discourage extension module authors from writing extensions directly against the C API, while providing them with the power they need to call into the C API, and to marry that with third-party C libraries of every ilk, in a higher-level language (type-annotated Python?) along with a code generator that has intimate knowledge of the C API. If we get significant buy-in, we can think about large-scale evolution of the C API in sync with evolving the code generation tool. That would be the ideal solution: source code compatibility across C API and runtime changes.
> As for adopting Cython vs. writing a lightweight tool: I think if we were to go down this path, we'd need something that comes with CPython by default. Thus the classic "the stdlib is where code goes to die" conundrum. There's also the question of whether the Cython syntax is best suited for this use case, and whether the tool is more than we need to accomplish this goal. Even if we had a separate tool, that wouldn't eliminate the need for Cython, since it still addresses the use cases you mention.
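For what it's worth, the "type annotated Python" variant already exists today as Cython's pure Python mode. A rough sketch with made-up names, assuming a current Cython: the same file runs unchanged in plain CPython (with the cython shadow module installed) and compiles down to C when fed through the code generator.

    import cython

    def mean(values: cython.double[:]) -> cython.double:
        # the annotations are plain Python syntax, but carry C type information
        total: cython.double = 0.0
        i: cython.Py_ssize_t
        for i in range(values.shape[0]):
            total += values[i]
        return total / values.shape[0]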
Like Jeroen, I'm questioning your assumption that such a tool needs to be (or even should be) part of CPython. First of all, most Python users will not have a need for it (which, admittedly, could be said about many tools in the stdlib). But more importantly, Cython is not a runtime tool, it's a build-time code generator. Once you have a code generator in the stdlib, you're bound to the features it provides in a given Python version, and it will not generate code for you that works with future Python versions. Thus, in order to make use of it, you would always need the newest Python release with the newest code generator, which could then generate code that still works on older Python releases. Meaning, once a new Python version with any substantial changes is released, the tools in older Python releases become entirely useless. IMHO, that is a strong argument against having it in the stdlib.
Alternatively, and that is probably what you are thinking of, the stdlib could contain a tool that only generates code for its specific Python version, without the need to care about backward or forward compatibility. That's probably nice from a CPython maintainer's perspective, but for developers, it means that they either need to push the use of this tool down to their users, thus imposing a complete code generation and build step on them. That would make it really difficult for them to deal with a bug in the tool of a given CPython point release, and to make sure that the code works across all CPython point releases that users might want to use for the build process.
Or they need to run several different Python releases on their side, with the respective code generator versions, in order to create binary distributions (as they already do today). A source distribution could then no longer contain any generated sources, as they would not be portable. But then, what if the developers want to use newer features of the tool that are not yet available in the older CPython releases they still need to support? This mostly just pushes the compatibility problem one layer up.
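For reference, the portability of source distributions today relies on shipping the generated C sources next to the Cython sources, roughly like this (a simplified setup.py sketch, the package and module names are made up):

    from setuptools import setup, Extension

    try:
        # maintainer's machine: regenerate the C sources with Cython
        from Cython.Build import cythonize
        extensions = cythonize([Extension("mypkg.fast", ["mypkg/fast.pyx"])])
    except ImportError:
        # user building from an sdist that ships the pregenerated fast.c
        extensions = [Extension("mypkg.fast", ["mypkg/fast.c"])]

    setup(name="mypkg", ext_modules=extensions)

A version-specific stdlib tool would break exactly this decoupling, because the pregenerated sources would only fit one Python version.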
So, overall, I don't see how having such a tool in the stdlib would really improve the situation, but I can see lots of hints that it would get in the way of its users. It feels like it's bound to produce the same situation that the current distutils vs. setuptools world suffers from.
Stefan