[capi-sig] Re: Adding an official (minimal) Cython-like tool.
On 2018-07-31 18:42, Brett Cannon wrote:
Yes, what Eric is suggesting is that a baseline tool like Cython be added to Python itself so it becomes the minimum, common tool that we point all extension authors to.
You didn't answer Victor's "stupid question": we can already point extension authors to Cython. Why do we need a new tool which will very likely have fewer features than the already-existing Cython?
then we will have to provide an FFI compiler for people to use in at least the simple cases.
Cython is *not* an FFI tool. It can be used for FFI but that's just one of its many use cases.
Jeroen Demeyer wrote on 01.08.2018 at 00:33:
On 2018-07-31 18:42, Brett Cannon wrote:
then we will have to provide an FFI compiler for people to use in at least the simple cases.
Cython is *not* an FFI tool. It can be used for FFI but that's just one of its many use cases.
Just to make that point clearer, people commonly use Cython for three different main use cases:
a) Wrap native libraries for CPython (and less commonly for PyPy) with a pythonic interface. The cool feature there is that you can write python(-like) code that compiles down into the C layer and runs there, i.e. you can trivially make the wrapper as thin or thick as you want, without sacrificing performance. (This is pretty much the FFI case.)
b) Speed up Python code by compiling and optimising it using C data types. Here, Cython supports a very smooth transition from Python semantics and features down to low-level C speed and semantics, while still writing Python code (or very Python-like code, if you want) all along the way. You can even use Python type annotations for that these days.
c) Write C/C++ code without having to write C code. From the perspective of someone who has to write native code, statically typed Cython code compiles to the expected C code (when disabling the safety belts), but has a much nicer syntax, native access to the complete (C)Python and C ecosystems, and all Python features built into the language. Quite a number of people use it to write the C/C++ code they need but without the complex and error-prone syntax.
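To make use case b) concrete: Cython's "pure Python mode" lets the same file run as ordinary Python or be compiled with C types taken from the annotations. The sketch below is illustrative only; `cython.int` and `cython.double` are Cython's pure-mode type names, while the `SimpleNamespace` fallback is a hypothetical stand-in added so the example still runs when Cython is not installed.

```python
try:
    import cython  # Cython's pure-Python mode: real C types when compiled
except ImportError:
    # Hypothetical stand-in so this sketch also runs without Cython installed;
    # the annotations are then ordinary, unchecked Python objects.
    from types import SimpleNamespace
    cython = SimpleNamespace(int=int, double=float)


def sum_of_squares(n: cython.int) -> cython.double:
    # When compiled by Cython, `i` and `total` become C variables and the loop
    # runs without touching Python objects; uncompiled, this is plain Python.
    total: cython.double = 0.0
    i: cython.int
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(4))  # -> 14.0
```

The point of the design is the smooth transition Stefan describes: the code stays valid Python, and only the annotations decide how much of it drops down to C semantics.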
Now, all of these use cases are covered by the same programming language. From a Cython perspective, there is no difference between them.
Thus, I don't see a reasonable place to make a cut to reach a "minimal" tool. Especially not one that hasn't been written yet. I mean, random people start writing a new "Python compiler" every couple of months [1], eventually notice that it's fun to start but not as easy as they thought, realize that they have neither the time nor the resources to complete it and/or turn it into something that benefits its potential users, and then more or less abandon the project.
I understand that all of that is, you know, open source and scratching your own itch and all that, but I would be very glad if there was a way to channel these resources better, specifically into the tools that have already succeeded in delivering that benefit to a large user base.
Stefan
[1] The graveyard of "Python implementations", probably incomplete, is not a small one: https://wiki.python.org/moin/PythonImplementations
On Jul 31, 2018, at 23:23, Stefan Behnel <python_capi@behnel.de> wrote:
Just to make that point clearer, people commonly use Cython for three different main use cases:
a) Wrap native libraries for CPython (and less commonly for PyPy) with a pythonic interface. The cool feature there is that you can write python(-like) code that compiles down into the C layer and runs there, i.e. you can trivially make the wrapper as thin or thick as you want, without sacrificing performance. (This is pretty much the FFI case.)
b) Speed up Python code by compiling and optimising it using C data types. Here, Cython supports a very smooth transition from Python semantics and features down to low-level C speed and semantics, while still writing Python code (or very Python-like code, if you want) all along the way. You can even use Python type annotations for that these days.
c) Write C/C++ code without having to write C code. From the perspective of someone who has to write native code, statically typed Cython code compiles to the expected C code (when disabling the safety belts), but has a much nicer syntax, native access to the complete (C)Python and C ecosystems, and all Python features built into the language. Quite a number of people use it to write the C/C++ code they need but without the complex and error-prone syntax.
Thanks for this write-up, Stefan. FWIW, although all of these use cases are important ones which Cython fulfills nicely, none specifically touches on the reason why I think we want a Cython-like tool in the stdlib. c) probably gets the closest, though.
The use case for such a tool built into Python is this:
Victor has a goal/plan for improving the performance of CPython by 2x, and there have been numerous attempts at removing the GIL (gilectomy), swapping reference counting for a gc, reducing the use of ABI-locking macros, etc. Most of these get thwarted by the realization that Python's C API will have to change, very likely in a backward incompatible way. And there is a *ton* of C extensions out there.
So let’s say that we decide these are important and achievable goals, but the API breakage necessitates a version bump to Python 4, let’s say in 5 years. The question is: what is the migration plan so that we can minimize the disruption to the extension module ecosystem?
One approach would be to discourage extension module authors from writing extensions directly against the C API, and instead to give them the power they need to call into the C API, and to marry that with third-party C libraries of every ilk, in a higher-level language (type-annotated Python?) along with a code generator that has intimate knowledge of the C API. If we get significant buy-in, we can think about large-scale evolution of the C API in sync with evolving the code generation tool. That would be the ideal solution: source code compatibility across C API and runtime changes.
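A toy illustration of that division of labour, as a minimal sketch only: the author writes an annotated Python signature, and a generator that "knows" the C API emits the wrapper boilerplate. `generate_wrapper` and `TYPE_MAP` are hypothetical names, not Cython's design or any proposed tool; `PyArg_ParseTuple` (with format codes `l` for `long` and `d` for `double`), `PyLong_FromLong` and `PyFloat_FromDouble` are real C API functions. The generated C assumes a C function with the same name and signature already exists.

```python
import inspect
from typing import get_type_hints

# Hypothetical table: Python annotation -> (C type, PyArg_ParseTuple format
# code, C API function that boxes the C return value).
TYPE_MAP = {
    int: ("long", "l", "PyLong_FromLong"),
    float: ("double", "d", "PyFloat_FromDouble"),
}


def generate_wrapper(func):
    """Return C source for a METH_VARARGS wrapper that forwards to a C
    function with the same name and signature as the annotated `func`."""
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    ret_ctype, _, ret_boxer = TYPE_MAP[hints["return"]]

    decls, fmt, addrs = [], "", []
    for name in params:
        ctype, code, _ = TYPE_MAP[hints[name]]
        decls.append(f"    {ctype} {name};")
        fmt += code
        addrs.append(f"&{name}")
    parse_args = ", ".join([f'"{fmt}"', *addrs])

    return "\n".join([
        "static PyObject *",
        f"py_{func.__name__}(PyObject *self, PyObject *args)",
        "{",
        *decls,
        f"    if (!PyArg_ParseTuple(args, {parse_args}))",
        "        return NULL;",
        f"    {ret_ctype} result = {func.__name__}({', '.join(params)});",
        f"    return {ret_boxer}(result);",
        "}",
    ])


# The extension author only writes an annotated signature...
def add(a: int, b: int) -> int: ...


# ...and the generator supplies the C API boilerplate.
print(generate_wrapper(add))
```

If the C API changed, only the generator's templates would need updating, which is the source compatibility argument above in miniature.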
As far as adopting Cython vs. writing a lightweight tool, I think if we were to go down this path, we’d need something that comes with CPython by default. Thus the classic “the stdlib is where code goes to die” conundrum. There’s also the question of whether the Cython syntax is best suited for this use case, and whether the tool is more than we need to accomplish this goal. Even if we had a separate tool, that wouldn’t eliminate the need for Cython, since it still addresses the use cases you mention.
Cheers, -Barry
Barry Warsaw wrote on 04.08.2018 at 00:23:
On Jul 31, 2018, at 23:23, Stefan Behnel wrote:
Just to make that point clearer, people commonly use Cython for three different main use cases:
a) Wrap native libraries for CPython (and less commonly for PyPy) with a pythonic interface. The cool feature there is that you can write python(-like) code that compiles down into the C layer and runs there, i.e. you can trivially make the wrapper as thin or thick as you want, without sacrificing performance. (This is pretty much the FFI case.)
b) Speed up Python code by compiling and optimising it using C data types. Here, Cython supports a very smooth transition from Python semantics and features down to low-level C speed and semantics, while still writing Python code (or very Python-like code, if you want) all along the way. You can even use Python type annotations for that these days.
c) Write C/C++ code without having to write C code. From the perspective of someone who has to write native code, statically typed Cython code compiles to the expected C code (when disabling the safety belts), but has a much nicer syntax, native access to the complete (C)Python and C ecosystems, and all Python features built into the language. Quite a number of people use it to write the C/C++ code they need but without the complex and error-prone syntax.
Thanks for this write-up, Stefan. FWIW, although all of these use cases are important ones which Cython fulfills nicely, none specifically touches on the reason why I think we want a Cython-like tool in the stdlib. c) probably gets the closest, though.
I would rather consider c) the least relevant reason. Those are the people who care about generating native code more than about interfacing it with Python. They could write C/C++ code (and then wrap that), it's just that they don't want to and prefer a simpler language that gives them the same performance.
This blog post here expresses it quite nicely:
https://explosion.ai/blog/writing-c-in-cython
What CPython wants to cater for is the first two groups of users: those who want to wrap native code for Python efficiently, and those who want to speed up their Python code by pushing their processing into C. My list of use cases above is (by chance) actually also ordered by the amount of C-API interaction. The wrapping case usually needs the most interaction, for type conversions, calling, Python API creation, etc. It is followed by the speedup case, which often involves some interaction on the way in, then some substantial processing code that tries to avoid Python interaction to gain speed, and then more C-API interaction on the way back into Python. Obviously, using Cython to write C code (case c) specifically aims for having as little C-API interaction as possible.
The use case for such a tool built into Python is this:
Victor has a goal/plan for improving the performance of CPython by 2x, and there have been numerous attempts at removing the GIL (gilectomy), swapping reference counting for a gc, reducing the use of ABI-locking macros, etc. Most of these get thwarted by the realization that Python's C API will have to change, very likely in a backward incompatible way. And there is a *ton* of C extensions out there.
So let’s say that we decide these are important and achievable goals, but the API breakage necessitates a version bump to Python 4, let’s say in 5 years. The question is: what is the migration plan so that we can minimize the disruption to the extension module ecosystem?
One approach would be to discourage extension module authors from writing extensions directly against the C API, and instead to give them the power they need to call into the C API, and to marry that with third-party C libraries of every ilk, in a higher-level language (type-annotated Python?) along with a code generator that has intimate knowledge of the C API. If we get significant buy-in, we can think about large-scale evolution of the C API in sync with evolving the code generation tool. That would be the ideal solution: source code compatibility across C API and runtime changes.
As far as adopting Cython vs. writing a lightweight tool, I think if we were to go down this path, we’d need something that comes with CPython by default. Thus the classic “the stdlib is where code goes to die” conundrum. There’s also the question of whether the Cython syntax is best suited for this use case, and whether the tool is more than we need to accomplish this goal. Even if we had a separate tool, that wouldn’t eliminate the need for Cython, since it still addresses the use cases you mention.
Like Jeroen, I'm questioning your assumption that such a tool needs to be (or even should be) part of CPython. First of all, most Python users will not have a need for it (which, admittedly, could be said about many tools in the stdlib). But more importantly, Cython is not a runtime tool; it's a build-time code generator. Once you have a code generator in the stdlib, you're bound to the features it provides in a given Python version, and it will not generate code that works with future Python versions. Thus, in order to make use of it, you would always need the newest Python release with the newest code generator, which could then generate code that still works on older Python releases. That means once a new Python version with any substantial changes is released, the tools in older Python releases become entirely useless. IMHO, that is a strong argument against having it in the stdlib.
Alternatively, and that is probably what you are thinking of, the stdlib could contain a tool that only generates code for its specific Python version, without the need to care about backward or forward compatibility. That's probably nice from a CPython maintainer's perspective, but for developers, it means that they either need to push the use of this tool down to their users, thus imposing a complete code generation and build step on them. That would make it really difficult for them to deal with a bug in the tool in a given CPython point release, and to make sure that the code works across all CPython point releases that users might want to use for the build process.
Or, they use various different Python releases on their side to run the different code generator versions and create binary distributions (as they already do today). A source distribution could then no longer contain any generated sources, as they would not be portable. But then, what if the developers want to make use of newer features of the tool that are not yet available in older CPython releases that they still need to support? This is mostly just pushing the compatibility problem one layer up.
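The backward direction works because generated C can branch on the `PY_VERSION_HEX` macro at C compile time, which is how Cython-generated code copes with older CPython releases. A minimal sketch of how a hypothetical generator might emit such a guard follows; `py_version_hex` and `emit_guarded` are made-up helpers, while `PY_VERSION_HEX`, `sys.hexversion`, `PyObject_CallNoArgs` (added in CPython 3.9) and `PyObject_CallObject` are real. Note that no guard can be written for a Python version that does not exist yet, which is exactly the forward-compatibility gap described above.

```python
import sys


def py_version_hex(major, minor, micro=0, level=0xF, serial=0):
    """Pack a version the way CPython's PY_VERSION_HEX macro and
    sys.hexversion do: one byte each for major, minor and micro, then a
    nibble each for release level (0xA alpha ... 0xF final) and serial."""
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial


def emit_guarded(min_version, new_code, fallback_code):
    """Emit C that compiles `new_code` on CPython >= min_version and
    `fallback_code` on older releases."""
    major, minor = min_version
    # Level and serial 0 give the lowest value for that release line, so
    # alphas, betas and release candidates also take the new branch.
    threshold = py_version_hex(major, minor, 0, 0, 0)
    return "\n".join([
        f"#if PY_VERSION_HEX >= 0x{threshold:08X}",
        new_code,
        "#else",
        fallback_code,
        "#endif",
    ])


# PyObject_CallNoArgs() was added in CPython 3.9; older releases need the
# equivalent PyObject_CallObject(func, NULL).
print(emit_guarded((3, 9),
                   "    res = PyObject_CallNoArgs(func);",
                   "    res = PyObject_CallObject(func, NULL);"))

# The running interpreter encodes its own version the same way.
print(hex(sys.hexversion))
```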
So, overall, I don't see how having such a tool in the stdlib would really improve the situation, but I can see lots of hints that it would get in the way of its users. It feels like it's bound to produce the same situation that the current distutils vs. setuptools world suffers from.
Stefan
On Tue, Jul 31, 2018 at 4:33 PM Jeroen Demeyer <J.Demeyer@ugent.be> wrote:
On 2018-07-31 18:42, Brett Cannon wrote:
Yes, what Eric is suggesting is that a baseline tool like Cython be added to Python itself so it becomes the minimum, common tool that we point all extension authors to.
You didn't answer Victor's "stupid question": we can already point extension authors to Cython. Why do we need a new tool which will very likely have fewer features than the already-existing Cython?
You and Victor are right to ask why we need to change anything here. In my initial post I wanted to present a possibility for discussion while the context was fresh from the other threads. I agree that the justification is relatively weak. :) However, there are some points to consider.
Consider our three options here:
- do nothing (status quo); Cython remains a popular and effective tool
- promote Cython as *the* tool (and maybe use Cython for stdlib extension modules)
- promote Cython as *the* high-level tool, add an official low-level tool to CPython (and maybe use the new tool for stdlib)
Arguably, in all three cases we should improve collaboration with the Cython project. Option #2 (and #3) would definitely require that. How can we do better in that regard?
In the interest of encouraging less use of the C-API, at least #2 would make a lot of sense. I expect it would require more/better collaboration between the Python core team and the Cython project.
Also, we should help Cython with technical aspects of the problem space, including both changes in CPython that will aid Cython and changes in Cython that will help users.
Finally, I suppose there's another option: factor out useful parts of Cython for inclusion in the stdlib.
=========================
FWIW, here are some thoughts on option #3 (basically my original post). Again, my goal was to introduce a possibility for discussion. At the least, consider that there may be some useful points here even if the overall idea isn't viable.
Cython is a great tool for the Python ecosystem. It is a large project developed over time, meaning that on the one hand it holds a lot of valuable knowledge and lessons learned. On the other hand, it probably doesn't look like it would under a green-field approach. :) Combine all that together and you might see the inspiration for an official tool based on Cython.
Effectively, the goal would be to factor out the low-level, minimal-ish core of Cython, if such a "core" is meaningful. The result would be a new stdlib library, as well as a simple tool in the CPython repo. The development of asyncio, relative to twisted, demonstrates the sort of effort I had in mind, though it's not a perfect parallel. :)
There are several advantages to the low-level tool/lib:
- we could use the tool for stdlib extension modules (if we want to), whereas we're reluctant to use a third-party tool like Cython currently
- the CPython test suite would include tests for the tool/lib, meaning changes that break them would be blocked from merging, whereas Cython has to deal with such breaking changes after the fact
- if built on top of the low-level tool, Cython would be easier to maintain (presumably) since the project would have to do less and wouldn't have to adapt nearly as much to changes in CPython
- an opportunity to distill the concepts and functionality of Cython (like happened with asyncio/twisted)
I concede that there are a lot of assumptions there. :) Furthermore, there are serious downsides/obstacles:
- someone would have to do the work (always the biggest obstacle!)
- tools like Cython would have to make significant changes to take advantage of the new tool
- there's a high risk of regressions and other bugs in both the new tool and in Cython
Regardless, this would not make any sense unless Stefan (and the Cython project) were in favor of the new tool and heavily involved in its design (and implementation). Given the reaction thus far I don't see that happening. :)
then we will have to provide an FFI compiler for people to use in at least the simple cases.
Cython is *not* an FFI tool. It can be used for FFI but that's just one of its many use cases.
FWIW, there has been talk in the past of bringing CFFI (or something based on it) into the stdlib (in part as a replacement for ctypes). IIRC, it hasn't happened because no one was interested enough to do the work. If we ended up doing it, I think it would coexist with Cython (or something similar) just fine, as Stefan implied when he explained about different tools for different needs.
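For comparison, this is roughly what the "simple case" looks like today with ctypes, the stdlib FFI that a CFFI-based module would partly replace. It is a minimal sketch assuming a POSIX system; it binds the C library's `strlen` at run time, with no compiler involved.

```python
import ctypes
import ctypes.util

# Assumes a POSIX system. find_library() may return None (e.g. on musl-based
# systems); CDLL(None) then dlopen()s the running process, whose symbol table
# already includes the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# ctypes cannot read C headers, so the signature must be declared by hand;
# getting it wrong fails at run time (or crashes), not at build time.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"capi-sig"))  # -> 8
```

Hand-declared signatures like these are the main weakness that header-aware tools such as CFFI or Cython avoid.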
-eric
On Wed, Aug 1, 2018 at 11:01 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Tue, Jul 31, 2018 at 4:33 PM Jeroen Demeyer <J.Demeyer@ugent.be> wrote:
On 2018-07-31 18:42, Brett Cannon wrote:
Yes, what Eric is suggesting is that a baseline tool like Cython be added to Python itself so it becomes the minimum, common tool that we point all extension authors to.
You didn't answer Victor's "stupid question": we can already point extension authors to Cython. Why do we need a new tool which will very likely have fewer features than the already-existing Cython?
You and Victor are right to ask why we need to change anything here. In my initial post I wanted to present a possibility for discussion while the context was fresh from the other threads. I agree that the justification is relatively weak. :) However, there are some points to consider.
Consider our three options here:
- do nothing (status quo); Cython remains a popular and effective tool
- promote Cython as *the* tool (and maybe use Cython for stdlib extension modules)
- promote Cython as *the* high-level tool, add an official low-level tool to CPython (and maybe use the new tool for stdlib)
Arguably, in all three cases we should improve collaboration with the Cython project. Option #2 (and #3) would definitely require that. How can we do better in that regard?
In the interest of encouraging less use of the C-API, at least #2 would make a lot of sense. I expect it would require more/better collaboration between the Python core team and the Cython project.
+1. There has been some of this, e.g. PEP 509 and the ongoing discussions around PEP 580. There could be more.
Also, we should help Cython with technical aspects of the problem space, including both changes in CPython that will aid Cython and changes in Cython that will help users.
Finally, I suppose there's another option: factor out useful parts of Cython for inclusion in the stdlib.
=========================
FWIW, here are some thoughts on option #3 (basically my original post). Again, my goal was to introduce a possibility for discussion. At the least, consider that there may be some useful points here even if the overall idea isn't viable.
Cython is a great tool for the Python ecosystem. It is a large project developed over time, meaning that on the one hand it holds a lot of valuable knowledge and lessons learned. On the other hand, it probably doesn't look like it would under a green-field approach. :) Combine all that together and you might see the inspiration for an official tool based on Cython.
Effectively, the goal would be to factor out the low-level, minimal-ish core of Cython, if such a "core" is meaningful.
Herein lies the primary difficulty with option #3. I have a hard time imagining what a low-level, minimal-ish "core" of Cython would be. What features would be omitted, or what simplifications would be made? The crux of Cython is compiling from a python(-like) syntax to C, end-to-end, which is the bulk of the project, and it's hard to "subset" this. In particular, we have prioritized adding features that require support in the language end-to-end (e.g. C++ support, from syntax to type checking to code generation) that are not easily built as a layer on top. (I suppose it's conceivable that one could make the syntax, type system, and code generation portions more modular and pluggable, but that would probably be a significant undertaking.)
In addition, I'm concerned that putting a barrier somewhere in the middle of the existing Cython project, with one half third-party and the other half part of CPython, would significantly harm development velocity and make adoption and distribution more difficult. (E.g. right now one gets new Cython features on already released versions of CPython; how would that story look if these features depended on upgrading the CPython portion of the library?) I also don't think users would be served by two entry points, each presenting its own syntax (Python-like, I'm assuming, unless the proposal is to develop a completely independent IR for human and machine consumption that spans Python and C), but with different extensions and restrictions.
It could be an interesting and illuminating exercise to try to define such a core, however. (And there is certainly improvement/modernization we could bring to Cython itself.)
capi-sig mailing list -- capi-sig@python.org To unsubscribe send an email to capi-sig-leave@python.org
On Wed, 1 Aug 2018 at 12:08 Robert Bradshaw <robertwb@math.washington.edu> wrote:
Herein lies the primary difficulty with option #3. I have a hard time imagining what a low-level, minimal-ish "core" of Cython would be. What features would be omitted, or what simplifications would be made? The crux of Cython is compiling from a python(-like) syntax to C, end-to-end, which is the bulk of the project, and it's hard to "subset" this. In particular, we have prioritized adding features that require support in the language end-to-end (e.g. C++ support, from syntax to type checking to code generation) that are not easily built as a layer on top. (I suppose it's conceivable that one could make the syntax, type system, and code generation portions more modular and pluggable, but that would probably be a significant undertaking.)
In addition, I'm concerned that putting a barrier somewhere in the middle of the existing Cython project, with one half third-party and the other half part of CPython, would significantly harm development velocity and make adoption and distribution more difficult. (E.g. right now one gets new Cython features on already released versions of CPython; how would that story look if these features depended on upgrading the CPython portion of the library?) I also don't think users would be served by two entry points, each presenting its own syntax (Python-like, I'm assuming, unless the proposal is to develop a completely independent IR for human and machine consumption that spans Python and C), but with different extensions and restrictions.
There's also the flip side that Cython has to chase after the C API rather than being considered a part of it. Stefan and the Cython team do an admirable job of keeping up (and we all try to keep them in the loop when we think something might affect them), but if the C API was directly used by something like Cython that was in the stdlib then it would potentially help keep everything in sync with less effort.
But that's obviously just speculation on my part and who knows if anyone would have the time to even attempt this idea. :)
-Brett
participants (6)
- Barry Warsaw
- Brett Cannon
- Eric Snow
- Jeroen Demeyer
- Robert Bradshaw
- Stefan Behnel