Mailman 3 Please update Cython *before* introducing C API incompatible changes in Python - Python-Dev


Please update Cython before introducing C API incompatible changes in Python


Victor Stinner

1 Feb 2022

3:08 p.m.

Hi,

It has become more and more common for a C API incompatible change introduced in Python to break Cython and all Python projects using Cython (e.g. numpy). Fortunately, usually only some projects using Cython are broken, not all of them.

Some of you may remember the PEP 590 (vectorcall) implementation, which removed the tp_print member of PyTypeObject in Python 3.8. This change broke a large number of projects using Cython. Cython was updated, the member removal was reverted, etc. It took a few weeks to solve the issue. It was unpleasant to have to revert the change (add tp_print back). The tp_print member was removed again in Python 3.9.

IMO the PEP 570 (positional-only arguments) implementation went even worse when it added a parameter to PyCode_New(). Cython was modified and released with a change, and *then* the PyCode_New() change was reverted in Python (if I recall correctly)! It was very confusing. There were multiple communication issues between Python and Cython. The first Cython fix was incorrect, etc.

Last December, another Python change related to exceptions (bpo-45711) broke Cython on purpose. A commit message says "Add to what's new, because this change breaks things like Cython".

It's kind of annoying that each time we have a long period (several weeks) when Cython is unusable, and there is pressure on Cython to get a fix and then release a new version. At Red Hat, we are rebuilding Python frequently with an up to date Python 3.11: right now, numpy fails to build because of that.

--

I would prefer to introduce C API incompatible changes differently: first fix Cython, and *then* introduce the change.

- (1) Propose a Cython PR and get it merged
- (2) Wait until a new Cython version is released
- (3) If possible, wait until numpy is released with regenerated Cython code
- (4) Introduce the incompatible change in Python

Note: Fedora doesn't need (3) since we always regenerate Cython code in numpy.

None of these C API incompatible changes are wrong. For each change, there are good reasons to introduce it. I'm only proposing to change *how* we introduce them, to avoid a long period when Cython and/or numpy are not usable on the development version of Python.

--

If you want to test if a C API change is going to break Cython or numpy, you can try my tool: https://github.com/vstinner/pythonci/

Usage: build a modified Python with your change, and then run pythonci with it. pythonci uses pinned versions of all dependencies to try to be more reproducible. I update them manually and infrequently (see pythonci/requirements.txt).

--

Right now, I propose to revert the incompatible change related to exceptions until Cython is prepared for it: https://bugs.python.org/issue45711#msg412264

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.


Christian Heimes

1 Feb

3:42 p.m.

On 01/02/2022 16.08, Victor Stinner wrote:

...

--

I would prefer to introduce C API incompatible changes differently: first fix Cython, and *then* introduce the change.

- (1) Propose a Cython PR and get it merged
- (2) Wait until a new Cython version is released
- (3) If possible, wait until numpy is released with regenerated Cython code
- (4) Introduce the incompatible change in Python

Note: Fedora doesn't need (3) since we always regenerated Cython code in numpy.

Hi,

this is a reasonable request for beta releases, but IMHO it is not feasible for alphas. During alphas we want to innovate fast and play around. Your proposal would slow down innovation and impose an additional burden on core developers.

There are more code binding generators than just Cython. Shouldn't we work with cffi, SWIG, sip, pybind11, and PyO3 developers as well? I care for cffi and PyO3, too...

I would prefer if we can get Cython and all the other code generators and binding libraries off the unstable C-API. They should use the limited API instead. If they require any C-APIs outside the limited API, then we should investigate and figure something out.

Christian

Irit Katriel

4:36 p.m.

_PyErr_StackItem is not part of the C API, it's an internal struct that cython accesses directly.

On Tue, Feb 1, 2022 at 3:42 PM Christian Heimes wrote:

...

On 01/02/2022 16.08, Victor Stinner wrote:

...
--

I would prefer to introduce C API incompatible changes differently: first fix Cython, and *then* introduce the change.

- (1) Propose a Cython PR and get it merged
- (2) Wait until a new Cython version is released
- (3) If possible, wait until numpy is released with regenerated Cython code
- (4) Introduce the incompatible change in Python

Note: Fedora doesn't need (3) since we always regenerated Cython code in numpy.

Hi,

this is a reasonable request for beta releases, but IMHO it is not feasible for alphas. During alphas we want to innovate fast and play around. Your proposal would slow down innovation and impose additional burden on core developers.

There are more code binding generators than just Cython. Shouldn't we work with cffi, SWIG, sip, pybind11, and PyO3 developers as well? I care for cffi and PyO3, too...

I would prefer if we can get Cython and all the other code generator and bindings library off the unstable C-API. They should use the limited API instead. If they require any C-APIs outside the limited API, then we should investigate and figure something out.

Christian

_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7UD4FZC7... Code of Conduct: http://python.org/psf/codeofconduct/

Victor Stinner

4:47 p.m.

On Tue, Feb 1, 2022 at 5:37 PM Irit Katriel wrote:

...

_PyErr_StackItem is not part of the C API, it's an internal struct that cython accesses directly.

numpy currently fails to build on the Cython __Pyx_PyErr_GetTopmostException() function, which accesses tstate->exc_info->exc_type, so it's about the PyThreadState structure. We can debate whether PyThreadState is considered "public", "private" or "internal". At the end of the day, the thing is that building numpy on Python 3.11 currently fails with a compiler error.

The Python C API documentation currently promotes the usage of Cython: https://docs.python.org/dev/extending/#recommended-third-party-tools

The fact that Cython uses Python internals is known, and there is an ongoing effort to move Cython away from Python internals.

Victor

Victor Stinner

4:42 p.m.

On Tue, Feb 1, 2022 at 4:42 PM Christian Heimes wrote:

...

I would prefer if we can get Cython and all the other code generator and bindings library off the unstable C-API. They should use the limited API instead. If they require any C-APIs outside the limited API, then we should investigate and figure something out.

I think that everybody (Python and Cython maintainers) agrees on this point, no? Mark Shannon created the issue "[C API] Add explicit support for Cython to the C API" for that: https://bugs.python.org/issue45247

Cython has experimental support for the limited C API. The problem is that Cython has a long history of using Python internals on purpose to reach the best performance. Moreover, in some cases, there was simply no other choice than using the Python internals directly.

There is an ongoing effort to add getter and setter functions for the two structures which cause the most trouble on Python updates:

* PyThreadState: https://bugs.python.org/issue39947
* PyFrameObject: https://bugs.python.org/issue40421

The goal is to make these two structures opaque, like PyInterpreterState (made opaque in Python 3.7). The blocker issue is Cython. I'm one of the volunteers doing this work, but it's a long task, since these structures have many members. It takes time to properly design functions, merge them in Python, modify Cython to use them, get a Cython release, wait for releases of the packages using the new Cython, etc.

--

The problem right now is the pressure put on Cython maintainers to fix Cython as soon as possible. IMO core developers who introduce incompatible changes should be more involved in the Cython changes, since Cython is a **key component** of the Python ecosystem. IMO knowing that a change breaks Cython and relying on "the community" to fix it is not a nice move. Well, that's my opinion ;-)

--

sip is different because it seems like it's only used by PyQt5, and PyQt5 is one of the few projects (with cryptography?) using the limited C API ;-)

About SWIG, pybind11, and PyO3, I don't recall seeing any complaint on the Python bug tracker about an incompatible change which broke them. I don't doubt that it happens, but they seem "less critical" for the most popular PyPI projects. It's good to fix them, but it can be done "later", no?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Miro Hrončok

6:47 p.m.

On 01. 02. 22 17:42, Victor Stinner wrote:

...

The problem right now is the pressure put on Cython maintainers to fix Cython as soon as possible. IMO core developers who introduce incompatible changes should be more involved in the Cython changes, since Cython is a **key component** of the Python ecosystem. IMO knowing that a change breaks Cython and relying on "the community" to fix it is not a nice move. Well, that's my opinion;-)

As the Fedora Python maintainer, I agree with this opinion. Broken Cython means we cannot actually test the next pre-release of CPython until it is fixed. And the CPython contributors who introduced the change are the most equipped ones to help fix it.

I understand the desire to innovate fast, but making sure Cython works should be an essential part of the innovation process (even while Cython is not part of the CPython source tree, it's part of the bigger picture).

--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok

Irit Katriel

7:17 p.m.

Miro,

I have offered before and my offer still stands to help fix this.

This was already fixed in the cython main branch by Stefan. The discussion now is about when to backport it to cython 0.29.

I'm actually working on the backport now (learning cython in the process). But we will need to come up with a release plan that doesn't make me revert the cpython changes until after the 3.11 beta is released, because that would mean that I can only make them in 3.12.

On Tue, Feb 1, 2022 at 6:53 PM Miro Hrončok wrote:

...

On 01. 02. 22 17:42, Victor Stinner wrote:

...
The problem right now is the pressure put on Cython maintainers to fix Cython as soon as possible. IMO core developers who introduce incompatible changes should be more involved in the Cython changes, since Cython is a **key component** of the Python ecosystem. IMO knowing that a change breaks Cython and relying on "the community" to fix it is not a nice move. Well, that's my opinion;-)

As the Fedora Python maintainer, I agree with this opinion. Broken Cython means we cannot actually test the next pre-release of CPython until it is fixed. And the CPython contributors who introduced the change are the most equipped ones to help fix it.

I understand the desire to innovate fast, but making sure Cython works should be an essential part of the innovation process (even while Cython is not part of the CPython source tree, it's part of the bigger picture).

-- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok


Miro Hrončok

7:24 p.m.

On 01. 02. 22 20:17, Irit Katriel wrote:

...

Miro,

I have offered before and my offer still stands to help fix this.

Thank You!

...

This was already fixed in the cython main branch by Stefan. The discussion now is about when to backport it to cython 0.29.

I'm actually working on the backport now (learning cython in the process). But we will need to come up with a release plan that doesn't make me revert the cpython changes until after the 3.11 beta is released, because that would mean that I can only make them in 3.12.

My comment was to the general discussion about how changes are done, not about this one in particular. Sorry if that was not clear. -- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok

Guido van Rossum

7:48 p.m.

It seems to me that a big part of the problem is that Cython feels entitled to use arbitrary CPython internals. Another part is that there doesn't seem to be any Cython maintainer interested in communicating with the core devs who are changing those CPython internals. We have to resort to creating issues in the Cython tracker and wait weeks before someone responds, and often not in a supportive way.

I am fully aware that Cython is an important tool in our ecosystem. But before I agree that we should roll back things "because it breaks Cython" I would like to see a lot more participation in CPython's development by Cython developers. (Who are they even? I only know "scoder" -- who else can speak authoritatively on behalf of Cython?)

During a beta cycle I would see the roles reversed. But until late May we are still working on alphas. If Cython wants to wait until beta 1 that's fine, but then their input on the design of CPython changes is necessarily much more limited.

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...

Greg Ewing

10:33 p.m.

On 2/02/22 8:48 am, Guido van Rossum wrote:

...

It seems to me that a big part of the problem is that Cython feels entitled to use arbitrary CPython internals.

I think the reason for this is that Cython is trying to be two things at once: (1) an interface between Python and C, (2) a compiler that turns Python code into fast C code. To address this there could be an option to choose between "compatible code" and "fast code", with the former restricting itself to the stable API. -- Greg

Christopher Barker

10:53 p.m.

On Tue, Feb 1, 2022 at 2:36 PM Greg Ewing wrote:

...

I think the reason for this is that Cython is trying to be two things at once: (1) an interface between Python and C, (2) a compiler that turns Python code into fast C code.

As a long time Cython user, but not a Cython developer, I think (2) is the primary purpose, with (1) as a handy side benefit (otherwise we'd just use ctypes, yes?)

That being said a not-quite-as-fast-as-possible mode would be fine.

Though I'm not sure it would buy much, as long as projects (including major ones like numpy) are using as-fast-as-possible mode.

As long as I'm being wishy-washy: maybe we don't need as-fast-as-possible at all, and can get to fast-enough-that-you-won't-notice. e.g. if the stable API is missing a feature needed for important performance reasons, then it could be extended, rather than forcing projects that use it to suffer significantly performance-wise.

It seems the Cython/numpy use-case is a good way to test the limits of performance.

-CHB

...

To address this there could be an option to choose between "compatible code" and "fast code", with the former restricting itself to the stable API.

-- Greg


-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Greg Ewing

11:22 p.m.

On 2/02/22 11:53 am, Christopher Barker wrote:

...

As a long time Cython user, but not a Cython developer, I think (2) is the primary purpose, with (1) as a handy side benefit (otherwise we'd just use ctypes, yes?)

Personally, no, I would not "just use ctypes". The main reason I created Pyrex was to avoid the extreme amounts of pain involved in doing things like that.

...

That being said a not-quite-as-fast-as-possible mode would be fine.

Though I'm not sure it would buy much, as long as projects (including major ones like numpy) are using as-fast-as-possible mode.

That's why I think compatible mode should be the default. Then those who choose otherwise will be aware of what they are doing. It would also mean that CPython developers needn't have as many qualms about changing internals. Cython users would be in the same position as someone writing an extension module by hand and choosing whether to stick to the stable API or not. If they rely on internals, they do so at their own risk. -- Greg

Christopher Barker

11:32 p.m.

On Tue, Feb 1, 2022 at 3:22 PM Greg Ewing wrote:

...

On 2/02/22 11:53 am, Christopher Barker wrote:

...
As a long time Cython user, but not a Cython developer, I think (2) is the primary purpose, with (1) as a handy side benefit (otherwise we'd just use ctypes, yes?)

Personally, no, I would not "just use ctypes". The main reason I created Pyrex was to avoid the extreme amounts of pain involved in doing things like that.

And thanks for that! I too find ctypes painful. But the other reason I use Cython (and Pyrex before it) even when I need to wrap C code, is that I can make a "thick" high performance wrapper, e.g. if I want to call an expensive C function on each item in a sequence, I can do that in Cython, removing a lot of the overhead of Python. Anyway, I don't think there's any disagreement that high-performing Cython code is an important use case.

...

That being said a not-quite-as-fast-as-possible mode would be fine.

...
Though I'm not sure it would buy much, as long as projects (including major ones like numpy) are using as-fast-as-possible mode.

That's why I think compatible mode should be the default. Then those who choose otherwise will be aware of what they are doing.

Sure, but this thread is not just about users like me, that can choose the more stable way or the faster way, but specifically about numpy, which is going to use the fast way -- and we don't want to break that any more than absolutely necessary. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Greg Ewing

2 Feb

12:38 a.m.

On 2/02/22 12:32 pm, Christopher Barker wrote:

...

I can make a "thick" high performance wrapper, e.g. if I want to call an expensive C function on each item in a sequence, I can do that in Cython, removing a lot of the overhead of Python.

"Not as fast as possible" doesn't necessarily mean *slow*. Even using the stable ABI, the code you get will still be a lot more efficient than a Python wrapper.

...

Sure, but this thread is not just about users like me, that can choose the more stable way or the faster way, but specifically about numpy, which is going to use the fast way -- and we don't want to break that any more than absolutely necessary.

I'm skeptical about how much difference it would actually make. Numpy gets its speed from tight loops in C and calling out to C and Fortran libraries. None of that is affected by which CPython API is being used. In any case, if numpy explicitly chooses speed over compatibility, that's an issue between CPython and numpy, not CPython and Cython. -- Greg

dw-git＠d-woods.co.uk

1 Feb

11:04 p.m.

Greg Ewing wrote:

...

To address this there could be an option to choose between "compatible code" and "fast code", with the former restricting itself to the stable API.

To some extent, that exists at the moment - many of the real abuses of the CPython internals can be controlled by setting C defines. For the particular feature that caused this discussion, the majority of the uses can be turned off by defining CYTHON_USE_EXC_INFO_STACK=0 and CYTHON_FAST_THREAD_STATE=0. (There are still a few uses relating to coroutines, but those two flags are sufficient to get Cython to build itself and Numpy on Python 3.11a4).

Obviously it could still be better. But the desire to support PyPy (and the beginnings of the limited API) mean that Cython does actually have alternate "clean" code-paths for a lot of cases.
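As a minimal sketch of how a project might pass those two defines to its C compiler: the macro names are the ones given in this message, while the helper name `compat_cflags` and the constant `COMPAT_MACROS` are hypothetical and only serve the illustration.

```python
# Macro names are taken from the message above; the helper and constant
# names are hypothetical and exist only for this illustration.
COMPAT_MACROS = [
    ("CYTHON_USE_EXC_INFO_STACK", "0"),
    ("CYTHON_FAST_THREAD_STATE", "0"),
]


def compat_cflags(macros=COMPAT_MACROS):
    """Render (name, value) pairs as -DNAME=VALUE compiler flags."""
    return [f"-D{name}={value}" for name, value in macros]


if __name__ == "__main__":
    # Join the flags so they can be appended to CFLAGS by a build script.
    print(" ".join(compat_cflags()))
```

With setuptools, the same `(name, value)` pairs could presumably be handed to the `define_macros` argument of `setuptools.Extension` instead of being rendered as raw flags.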

Guido van Rossum

11:21 p.m.

Hm... So maybe the issue is either with Cython's default settings (perhaps traditionally it defaults to "as fast as possible but relies on internal APIs a lot"?) or with the Cython settings selected by default by projects *using* Cython?

I wonder if a solution during CPython's rocky alpha release cycle could be to default (either in Cython or in projects using it) to the "not quite as fast but not relying on a lot of internal APIs" mode, and to switch to Cython's faster mode only once (a) beta is entered and (b) Cython has been fixed to work with that beta?

Sure, occasionally things still change during beta, but the point of beta is that things shouldn't change unless it is to fix bugs. On behalf of the Faster CPython project I can commit to that for our contributions, we'll do our advanced work on the 3.12 branch once 3.11beta has started.

All this is assuming that Cython's default can be adjusted independently for CPython's upcoming release (3.11, for now) and separately for previous releases (3.10 and before). But if it can't yet, surely *that* would be a relatively simple change?

On Tue, Feb 1, 2022 at 3:07 PM wrote:

...

Greg Ewing wrote:

...
To address this there could be an option to choose between "compatible code" and "fast code", with the former restricting itself to the stable API.

To some extent, that exists at the moment - many of the real abuses of the CPython internals can be controlled by setting C defines. For the particular feature that caused this discussion, the majority of the uses can be turned off by defining CYTHON_USE_EXC_INFO_STACK=0 and CYTHON_FAST_THREAD_STATE=0. (There are still a few uses relating to coroutines, but those two flags are sufficient to get Cython to build itself and Numpy on Python 3.11a4).

Obviously it could still be better. But the desire to support PyPy (and the beginnings of the limited API) mean that Cython does actually have alternate "clean" code-paths for a lot of cases.

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
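The alpha-versus-beta policy suggested in this message could be approximated in a project's build script along the following lines. This is only a sketch of condition (a), "beta is entered"; condition (b), whether Cython has been fixed for that beta, cannot be detected this way. The macro names come from the earlier message in this thread, and the function name `macros_for` is hypothetical. Nothing here describes what Cython itself does.

```python
import sys

# Macro names come from the earlier message in this thread; the policy
# below only illustrates the suggestion and is not Cython behaviour.
COMPAT_MACROS = [
    ("CYTHON_USE_EXC_INFO_STACK", "0"),
    ("CYTHON_FAST_THREAD_STATE", "0"),
]


def macros_for(releaselevel=sys.version_info.releaselevel):
    """Return the compat macros on alpha builds, no overrides otherwise.

    sys.version_info.releaselevel is one of 'alpha', 'beta',
    'candidate' or 'final'.
    """
    if releaselevel == "alpha":
        return list(COMPAT_MACROS)
    return []


if __name__ == "__main__":
    print(macros_for())
```

A build script would then pass the returned list to its extension definitions, so that the faster code paths come back automatically once the interpreter reports a beta or later release level.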

Stefan Behnel

2 Feb

12:05 a.m.

Guido van Rossum wrote on 02.02.22 at 00:21:

...

On Tue, Feb 1, 2022 at 3:07 David wrote:

...
Greg Ewing wrote:

...
To address this there could be an option to choose between "compatible code" and "fast code", with the former restricting itself to the stable API.

To some extent, that exists at the moment - many of the real abuses of the CPython internals can be controlled by setting C defines. For the particular feature that caused this discussion, the majority of the uses can be turned off by defining CYTHON_USE_EXC_INFO_STACK=0 and CYTHON_FAST_THREAD_STATE=0. (There are still a few uses relating to coroutines, but those two flags are sufficient to get Cython to build itself and Numpy on Python 3.11a4).

Obviously it could still be better. But the desire to support PyPy (and the beginnings of the limited API) mean that Cython does actually have alternate "clean" code-paths for a lot of cases.

Hm... So maybe the issue is either with Cython's default settings (perhaps traditionally it defaults to "as fast as possible but relies on internal APIs a lot"?) or with the Cython settings selected by default by projects *using* Cython?

I wonder if a solution during CPython's rocky alpha release cycle could be to default (either in Cython or in projects using it) to the "not quite as fast but not relying on a lot of internal APIs" mode, and to switch to Cython's faster mode only once (a) beta is entered and (b) Cython has been fixed to work with that beta?

This seems tempting - with the drawback that it would make Cython modules less comparable between final and alpha/beta CPython releases. So users would start reporting ghost performance regressions because it (understandably) feels important to them that the slow-down they witness needs to be resolved before the final release, and they just won't know that this will happen automatically, triggered by the version switch. :)

Feels a bit like car manufacturers who switch their exhaust cleaners on and off based on the test mode detection.

More importantly, though, we'd get fewer bug reports during the alpha/beta cycle ourselves, because things may look like they work but can still stop working when we switch back to fast mode.

I'd rather make it more obvious to users what their intentions are. And there is already a way to do that - the Limited API. (and similarly, HPy)

For Cython, support for the Limited API is still work in progress, although many things are in place already. Getting it to work completely would give users a simple way to decide whether they want to opt in for a) speed, lots of wheels and adaptations for each CPython version, or b) less performance, less hassle.

As it looks now, that switch can be done after the code generation, by defining a simple C define in their build script. That also makes both modes easily comparable. I think that is as good as it can get.

Stefan

Guido van Rossum

12:43 a.m.

On Tue, Feb 1, 2022 at 4:14 PM Stefan Behnel wrote:

...

Guido van Rossum wrote on 02.02.22 at 00:21:

...
I wonder if a solution during CPython's rocky alpha release cycle could be to default (either in Cython or in projects using it) to the "not quite as fast but not relying on a lot of internal APIs" mode, and to switch to Cython's faster mode only once (a) beta is entered and (b) Cython has been fixed to work with that beta?

This seems tempting – with the drawback that it would make Cython modules less comparable between final and alpha/beta CPython releases. So users would start reporting ghost performance regressions because it (understandably) feels important to them that the slow-down they witness needs to be resolved before the final release, and they just won't know that this will happen automatically triggered by the version switch. :)

It sounds like you are speaking from experience here, so I won't argue.

...

Feels a bit like car manufacturers who switch their exhaust cleaners on and off based on the test mode detection.

That would be more like detecting benchmarks and doing something different. In terms of the car manufacturing process, we're talking about testing next year's model before production has started up yet. If the new model uses more gas than last year's, that would be a problem that needs to be solved before production starts, but what we seem to have with Cython is more like the new model's doors don't open. :-) It may be hard to imagine if you're working on Cython, which only exists because of performance needs, but there are other things that people want to test with the upcoming CPython release in addition to performance (are the seats comfortable? do the controls for the moonroof work?), and given the long dependency chains in modern apps and packages, people want to get started on those things early. Until numpy builds with Cython for CPython 3.11, nobody can start testing scikit-learn with CPython 3.11, and that's frustrating for the scikit-learn maintainers.

...

More importantly, though, we'd get fewer bug reports during the alpha/beta cycle ourselves, because things may look like they work but can still stop working when we switch back to fast mode.

True, true. Nobody's perfect.

...

I'd rather make it more obvious to users what their intentions are. And there is already a way to do that – the Limited API. (and similarly, HPy)

Your grammar confuses me. Do you want users to be clearer in expressing their intentions?

...

For Cython, support for the Limited API is still work in progress, although many things are in place already. Getting it to work completely would give users a simple way to decide whether they want to opt in for a) speed, lots of wheels and adaptations for each CPython version, or b) less performance, less hassle.

But until that work is complete, we're stuck with the unlimited API, right? And by its own statements in a recent post here, HPy is still not ready for all use cases, so it's also still a pipe dream.

...

As it looks now, that switch can be done after the code generation, by defining a simple C define in their build script. That also makes both modes easily comparable. I think that is as good as it can get.

Do you have specific instructions for package developers here? I could imagine that the scikit-learn maintainer (sorry to pick on you guys :-) might not know where to start with this if until now they've always been able to rely on either numpy wheels or building everything from source with default settings. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...

Stefan Behnel

1:36 a.m.

Guido van Rossum wrote on 02.02.22 at 01:43:

...

It may be hard to imagine if you're working on Cython, which only exists because of performance needs, but there are other things that people want to test with the upcoming CPython release in addition to performance

I know. Cython (and originally Pyrex) has come a long way from a tool to get stuff done to a dependency that a large number of packages depend on. Maintainer decisions these days are quite different from those 10 years ago. Let alone 20. Let's just try to keep things working in general, and fix stuff that needs to be broken.

...

On Tue, Feb 1, 2022 at 4:14 PM Stefan Behnel wrote:

...
I'd rather make it more obvious to users what their intentions are. And there is already a way to do that – the Limited API. (and similarly, HPy)

Your grammar confuses me. Do you want users to be clearer in expressing their intentions?

Erm, sort of. They should be able to choose and express what they prefer, in a simple way.

...

...
For Cython, support for the Limited API is still work in progress, although many things are in place already. Getting it to work completely would give users a simple way to decide whether they want to opt in for a) speed, lots of wheels and adaptations for each CPython version, or b) less performance, less hassle.

But until that work is complete, we're stuck with the unlimited API, right? And by its own statements in a recent post here, HPy is still not ready for all use cases, so it's also still a pipe dream.

Yes. HPy is certainly far from ready for anything real, but even for the Limited API, it's still unclear whether it's actually complete enough to cover Cython's needs. Basically, the API that Cython uses must really be able to implement CPython on top of itself. And at the same time interact not with the reimplementation but with the underlying original, at the C level. The C-API, and especially the Limited API, were never really meant for that.

...

...
As it looks now, that switch can be done after the code generation, by defining a simple C define in their build script. That also makes both modes easily comparable. I think that is as good as it can get.

Do you have specific instructions for package developers here? I could imagine that the scikit-learn maintainer (sorry to pick on you guys :-) might not know where to start with this if until now they've always been able to rely on either numpy wheels or building everything from source with default settings.

It's not well documented yet, since the implementation isn't complete, and so, a bunch of things simply won't work. I don't remember if the buffer protocol is part of the Limited API by now, but last I checked it was still missing, so the scikit-learn (or NumPy) people would be fairly unhappy with the current state of affairs. But it's mostly just passing "-DCYTHON_LIMITED_API" to your C compiler. That's the part that will still work but won't do (yet) what you think. Because then, you currently also have to define "-DPy_LIMITED_API=..." and that's when your C compiler will get angry with you. Stefan
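As a rough sketch of what "passing the defines" could look like from a Python build script: the helper names below are hypothetical, and the only assumption about the build tooling is the usual `define_macros` convention of `(name, value)` pairs, where a value of `None` corresponds to a bare `-DNAME`. As noted above, the second combination is expected to fail for much code at the moment.

```python
def py_version_hex(major, minor):
    """Return the PY_VERSION_HEX-style value for a major.minor release."""
    return "0x{:02X}{:02X}0000".format(major, minor)


def limited_api_macros(min_version=None):
    """Build (name, value) pairs suitable for an Extension's define_macros.

    With min_version=None only CYTHON_LIMITED_API is defined; with a
    (major, minor) tuple, Py_LIMITED_API is defined as well.
    """
    macros = [("CYTHON_LIMITED_API", None)]
    if min_version is not None:
        macros.append(("Py_LIMITED_API", py_version_hex(*min_version)))
    return macros


# Equivalent to "-DCYTHON_LIMITED_API" on the compiler command line.
cython_only = limited_api_macros()
# Equivalent to "-DCYTHON_LIMITED_API -DPy_LIMITED_API=0x03070000".
full_limited = limited_api_macros((3, 7))
```

Either list would then be handed to the extension definition in the build script, which keeps both modes easy to compare from the same generated C source.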

Guido van Rossum

2:56 a.m.

On Tue, Feb 1, 2022 at 5:52 PM Stefan Behnel wrote:

...

Guido van Rossum wrote on 02.02.22 at 01:43: Yes. HPy is certainly far from ready for anything real, but even for the Limited API, it's still unclear whether it's actually complete enough to cover Cython's needs. Basically, the API that Cython uses must really be able to implement CPython on top of itself. And at the same time interact not with the reimplementation but with the underlying original, at the C level. The C-API, and especially the Limited API, were never really meant for that.

Undoubtedly. My question for you is if you're willing to write up a list of things in CPython that you depend on. Or is this just something you're not willing to commit to? It would be nice to know which it is, just so the CPython team knows what we're up against. And if you just want to retain the freedom to use any and all CPython internals you can gain access to, maybe (a) Cython users should be told, so they can be prepared for the consequences, and (b) you should probably just #define the C preprocessor symbols that let you #include the truly internal headers so you can do what you want. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...

dw-git＠d-woods.co.uk

8:19 a.m.

Guido van Rossum wrote:

...

My question for you is if you're willing to write up a list of things in CPython that you depend on. Or is this just something you're not willing to commit to? It would be nice to know which it is, just so the CPython team knows what we're up against.

I'm happy to prepare a list of CPython internals that Cython uses. Obviously there's no guarantee that it'll be complete (it involves lots of manually finding things in code, so it is prone to human error) or that it won't change in future. But a list would at least let everyone know where they stand.

Christian Heimes

8:29 a.m.

On 02/02/2022 09.19, dw-git@d-woods.co.uk wrote:

...

Guido van Rossum wrote:

...
My question for you is if you're willing to write up a list of things in CPython that you depend on. Or is this just something you're not willing to commit to? It would be nice to know which it is, just so the CPython team knows what we're up against.

I'm happy to prepare a list of CPython internals that Cython uses. Obviously there's no guarantee that it'll be complete (just because it involves lots of manually finding things in code so is full of human error) or that it doesn't change in future. But a list would at least let everyone know where they stand.

You should be able to automate much of the task and avoid human errors:

1) dump all exported symbols from libpython.so, e.g. readelf -Ws /usr/lib64/libpython3.10.so
2) look for each symbol in Cython sources

Christian
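The two steps could be sketched in Python roughly as follows. This is only an illustration: the function names are made up, the sample `readelf` text and addresses are invented, and the parser assumes the usual eight-column symbol table layout. Note that this approach only sees exported functions and data, so direct struct member access and macros would not show up.

```python
import re


def exported_symbols(readelf_output):
    """Extract defined, globally bound symbol names from `readelf -Ws` text."""
    names = set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) < 8 or not fields[0].rstrip(":").isdigit():
            continue
        bind, ndx, name = fields[4], fields[6], fields[7]
        if bind == "GLOBAL" and ndx != "UND":
            names.add(name.split("@")[0])  # drop any symbol-version suffix
    return names


def symbols_used(symbols, sources):
    """Map each source name to the exported symbols that appear in its text."""
    usage = {}
    for path, text in sources.items():
        identifiers = set(re.findall(r"[A-Za-z_]\w*", text))
        found = sorted(symbols & identifiers)
        if found:
            usage[path] = found
    return usage


# Invented sample data, standing in for real readelf output and Cython sources.
SAMPLE_READELF = """\
Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND malloc@GLIBC_2.2.5
     2: 00000000000a1b20    45 FUNC    GLOBAL DEFAULT   12 PyList_New
     3: 00000000000b2c30    98 FUNC    GLOBAL DEFAULT   12 _PyObject_GetDictPtr
"""
SAMPLE_SOURCES = {"utility.c": "PyObject **d = _PyObject_GetDictPtr(obj);"}

report = symbols_used(exported_symbols(SAMPLE_READELF), SAMPLE_SOURCES)
```

In real use, the first argument would come from running `readelf -Ws` on the installed libpython, and `sources` from reading the C files in the Cython tree.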

Victor Stinner

10:42 a.m.

On Wed, Feb 2, 2022 at 9:25 AM wrote:

...

Guido van Rossum wrote:

...
My question for you is if you're willing to write up a list of things in CPython that you depend on. Or is this just something you're not willing to commit to? It would be nice to know which it is, just so the CPython team knows what we're up against.

I'm happy to prepare a list of CPython internals that Cython uses. Obviously there's no guarantee that it'll be complete (just because it involves lots of manually finding things in code so is full of human error) or that it doesn't change in future. But a list would at least let everyone know where they stand.

A subset of these is listed in these two issues:

* PyThreadState: https://bugs.python.org/issue39947
* PyFrameObject: https://bugs.python.org/issue40421

By the way, I have a pending PR to add PyThreadState_SetTrace(tstate, trace_func) and PyThreadState_SetProfile(tstate, profile_func) to avoid directly modifying PyThreadState members: (c_tracefunc and c_traceobj) and (c_profilefunc and c_profileobj). The implementation is more complicated than what you would expect: it does Py_DECREF() which can trigger a GC collection, it requires calling _PyThreadState_ResumeTracing() which is non-trivial, etc. But it's unclear to me if debuggers and profilers *can* or *would like* to use such a function, which requires holding the GIL and can execute arbitrary Python code (Py_DECREF which can trigger a GC collection).

=> PR: https://github.com/python/cpython/pull/29121

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Stefan Behnel

1 Feb

11:12 p.m.

Greg Ewing wrote on 01.02.22 at 23:33:

...

On 2/02/22 8:48 am, Guido van Rossum wrote:

...
It seems to me that a big part of the problem is that Cython feels entitled to use arbitrary CPython internals.

I think the reason for this is that Cython is trying to be two things at once: (1) an interface between Python and C, (2) a compiler that turns Python code into fast C code.

To address this there could be an option to choose between "compatible code" and "fast code", with the former restricting itself to the stable API.

There is even more than such an option. We use a relatively large set of feature flags that allow us to turn the usage of certain implementation details of the C-API on and off. We use this to adapt to different Python C-API implementations (currently CPython, PyPy, GraalPython and the Limited C-API), although with different levels of support and reliability.

Here's the complete list of feature sets for the different targets: https://github.com/cython/cython/blob/5a76c404c803601b6941525cb8ec8096ddb103...

This can also be used to enable and disable certain dependencies on CPython implementation details, e.g. PyList, PyLong or PyUnicode, but also type specs versus PyTypeObject structs. Most of these feature flags can be disabled by users. There is no hard guarantee that this always works, because it's impossible to test all combinations, and then there are bugs as well, but most of the flags are independent, which should usually allow disabling them independently.

So, one of the tools that we have up our sleeves when it comes to supporting new CPython versions is also to selectively disable the dependency on a certain C-API feature that changed, at least until we have a way to adapt to the change itself.

In the specific case of the "exc_info" changes, however, that didn't quite work, because that change was really not anticipated at that level of impact. But there is an implementation for Cython 3.0 alpha now, and we'll eventually have a legacy 0.29.x release out that will also adapt in one way or another. It just takes a bit more time.

Stefan
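To make the idea of user-disabled feature flags concrete, here is a small sketch of how a build script might switch a few of them off. The helper `feature_flag_macros` is hypothetical, and the three flag names are my reading of Cython's module setup code rather than something stated in this thread, so check them against the linked list before relying on them.

```python
def feature_flag_macros(**flags):
    """Turn keyword flags into define_macros pairs, e.g. NAME=False -> ("NAME", "0")."""
    return [(name, "1" if enabled else "0") for name, enabled in sorted(flags.items())]


# Assumed flag names; each one disables a dependency on a CPython internal layout.
macros = feature_flag_macros(
    CYTHON_USE_PYLIST_INTERNALS=False,
    CYTHON_USE_PYLONG_INTERNALS=False,
    CYTHON_USE_UNICODE_INTERNALS=False,
)
```

The resulting list has the `(name, value)` shape that extension definitions accept as `define_macros`, so each flag ends up as a `-DNAME=0` on the C compiler command line.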

Greg Ewing

10:28 p.m.

On 2/02/22 5:42 am, Victor Stinner wrote:

...

There is an on-going effort adding getter and setter functions on two structures which are causing most troubles on Python updates:

* PyThreadState: https://bugs.python.org/issue39947
* PyFrameObject: https://bugs.python.org/issue40421

In the case of PyFrameObject, as far as I know the only reason Cython needs to mess with it is to get filename/lineno information into tracebacks. If that's still true, I think it would be better to make it possible to add that information directly to traceback objects so that fake frame objects are not needed. -- Greg

Stefan Behnel

8:05 p.m.

Christian Heimes wrote on 01.02.22 at 16:42:

...

On 01/02/2022 16.08, Victor Stinner wrote:

...
I would prefer to introduce C API incompatible changes differently: first fix Cython, and *then* introduce the change.

- (1) Propose a Cython PR and get it merged
- (2) Wait until a new Cython version is released
- (3) If possible, wait until numpy is released with regenerated Cython code
- (4) Introduce the incompatible change in Python

Note: Fedora doesn't need (3) since we always regenerated Cython code in numpy.

this is a reasonable request for beta releases, but IMHO it is not feasible for alphas. During alphas we want to innovate fast and play around. Your proposal would slow down innovation and impose additional burden on core developers.

Let's at least try not to run into a catch-22.

I'm reluctant to work on adapting Cython during alphas, because it happened more than once that incompatible changes in CPython were rolled back or modified again during alpha, beta and rc phases. That means more work for me and the Cython project, and its users. Code that Cython users generate and release on their side with a release version of Cython will then be broken, and sometimes even more broken than with an older Cython release.

But Victor is right, OTOH, that the longer we wait with adapting Cython, the longer users have to wait with testing their code in upcoming CPython versions, and the higher the chance of post-beta and post-rc rollbacks and changes in CPython.

I don't have the capacity to follow all relevant changes in CPython, incompatible or not. Even a breakage of the CPython-dev job in Cython's CI doesn't always mean that there is something to do on our side, so that job is silenced to avoid breaking our own project workflows and is only looked at irregularly. Additionally, since Cython is a crucial part of the Python ecosystem, breakage of Cython by CPython sometimes stalls the build pipelines of CI images, which means that new CPython dev versions don't reach the CI servers for a while, during which the breakage will go even more unnoticed.

I think you should generally appreciate Cython (and the few other C-API abstraction tools) as an opportunity to get a large number of extensions adapted to CPython's now faster development all at once. The quicker these tools adapt, the quicker you can get user feedback on your own changes, and the more time you have to validate and refine them during the alpha and beta cycles.

You can even see the adaptation as a way to validate your own changes in the real world. It's cool to write new code, but difficult to find out whether it behaves the way you want for the intended audience. So – be part of your own audience.

Stefan

Irit Katriel

10:04 p.m.

Stefan,

There are two separate issues here. One is the timing of committing changes into cython, and the other is the process by which the cython devs learn about cpython development.

On the first issue, you wrote:

I'm reluctant to work on adapting Cython during alphas, because it

...

happened more than once that incompatible changes in CPython were rolled back or modified again during alpha, beta and rc phases. That means more work for me and the Cython project, and its users. Code that Cython users generate and release on their side with a release version of Cython will then be broken, and sometimes even more broken than with an older Cython release.

I saw in your patch that you make changes such that they impact only the new cpython version. So for old versions the generated code should not be broken. Surely you don't guarantee that cython code generated for an alpha version of cpython will work on later versions as well? Users who generate code for an alpha version should regenerate it for the next alpha and for beta, right? On the second issue: I don't have the capacity to follow all relevant changes in CPython,

...

incompatible or not.

We get that, and this is why we're asking to work with you on cython updates so that this will be easier for all of us. There are a number of cpython core devs who would like to help cython maintenance. We realise how important and thinly resourced cython is, and we want to reduce your maintenance burden. With better communication we could find ways to do that. Returning to the issue that started this thread - how do you suggest we proceed with the exc_info change? Irit

Stefan Behnel

11:32 p.m.

Hi Irit,

Irit Katriel via Python-Dev wrote on 01.02.22 at 23:04:

...

There are two separate issues here. One is the timing of committing changes into cython, and the other is the process by which the cython devs learn about cpython development.

On the first issue, you wrote:

I'm reluctant to work on adapting Cython during alphas, because it

...
happened more than once that incompatible changes in CPython were rolled back or modified again during alpha, beta and rc phases. That means more work for me and the Cython project, and its users. Code that Cython users generate and release on their side with a release version of Cython will then be broken, and sometimes even more broken than with an older Cython release.

I saw in your patch that you make changes such that they impact only the new cpython version. So for old versions the generated code should not be broken. Surely you don't guarantee that cython code generated for an alpha version of cpython will work on later versions as well? Users who generate code for an alpha version should regenerate it for the next alpha and for beta, right?

I'd just like to note that we are talking about three different projects and dependency levels here (CPython, Cython and a project that uses Cython). All three have different release cycles, and not all projects can afford to go through a new release with a new Cython version regularly or on the "emergency" event of a new CPython release. Some don't even provide wheels and require their users to do a source build on their side, often with a fixed Cython version dependency, or even with pre-generated and shipped C sources, which makes it harder for the end users to upgrade Cython as a work-around.

But at least it should be as easy for the maintainers as updating their Cython version and pushing a new release. In most cases. And things are also becoming easier these days with improvements in the packaging ecosystem. It can just take a bit until everyone has had the chance to upgrade along the food chain.

...

On the second issue:

...
I don't have the capacity to follow all relevant changes in CPython, incompatible or not.

We get that, and this is why we're asking to work with you on cython updates so that this will be easier for all of us. There are a number of cpython core devs who would like to help cython maintenance. We realise how important and thinly resourced cython is, and we want to reduce your maintenance burden. With better communication we could find ways to do that.

I'm sure we will. Thanks for your help. It is warmly appreciated.

...

Returning to the issue that started this thread - how do you suggest we proceed with the exc_info change?

I'm not done sorting out the options yet. Regarding CPython, I think it's best to keep the current changes in there. It should be easier for us to continue from where we are now than to adapt again to a revert in CPython. Stefan

Petr Viktorin

2 Feb

9:22 a.m.

On 01. 02. 22 16:42, Christian Heimes wrote:

...

On 01/02/2022 16.08, Victor Stinner wrote:

...
--

I would prefer to introduce C API incompatible changes differently: first fix Cython, and *then* introduce the change.

- (1) Propose a Cython PR and get it merged
- (2) Wait until a new Cython version is released
- (3) If possible, wait until numpy is released with regenerated Cython code
- (4) Introduce the incompatible change in Python

Note: Fedora doesn't need (3) since we always regenerated Cython code in numpy.

Hi,

this is a reasonable request for beta releases, but IMHO it is not feasible for alphas. During alphas we want to innovate fast and play around. Your proposal would slow down innovation and impose additional burden on core developers.

There are more code binding generators than just Cython. Shouldn't we work with cffi, SWIG, sip, pybind11, and PyO3 developers as well? I care for cffi and PyO3, too...

I would prefer if we can get Cython and all the other code generators and binding libraries off the unstable C-API. They should use the limited API instead. If they require any C-APIs outside the limited API, then we should investigate and figure something out.

Moving off the internal (unstable) API would be great, but I don't think Cython needs to move all the way to the limited API. There are three "levels" in the C API:

- limited API, with long-term ABI compatibility guarantees
- "normal" public API, covered by the backwards compatibility policy (users need to recompile for every minor release, and watch for deprecation warnings)
- internal API (underscore-prefixed names, `internal` headers, things documented as private)

AFAIK, only the last one is causing trouble here.

(see also: https://devguide.python.org/c-api/)
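A name-based approximation of these three levels can be sketched as below. It is only a heuristic: the function is hypothetical, the leading-underscore test covers just one of the listed signs of internal API, and `LIMITED_SAMPLE` is a tiny hand-picked excerpt rather than the real limited API list.

```python
def api_level(name, limited_api_names):
    """Classify a C API name into one of the three levels (rough heuristic)."""
    if name.startswith("_"):
        return "internal"  # underscore-prefixed names are private
    if name in limited_api_names:
        return "limited"
    return "public"


# Hypothetical excerpt; the authoritative list is maintained in CPython itself.
LIMITED_SAMPLE = {"PyList_New", "PyLong_FromLong"}

levels = {
    name: api_level(name, LIMITED_SAMPLE)
    for name in ("PyList_New", "PyCode_New", "_PyObject_GetDictPtr")
}
```

Applied to a list of the names a code generator actually uses, a classification like this would show how much of the usage falls into the last, troublesome level.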

Stefan Behnel

10:50 a.m.

Petr Viktorin wrote on 02.02.22 at 10:22:

...

Moving off the internal (unstable) API would be great, but I don't think Cython needs to move all the way to the limited API. There are three "levels" in the C API:

- limited API, with long-term ABI compatibility guarantees

That's what "-DCYTHON_LIMITED_API -DPy_LIMITED_API=..." is supposed to do, which currently fails for much if not most code.

...

- "normal" public API, covered by the backwards compatibility policy (users need to recompile for every minor release, and watch for deprecation warnings)

That's probably close to what "-DCYTHON_LIMITED_API" does by itself as it stands. I can see that being a nice feature that just deserves a more suitable name. (The name was chosen because it was meant to also internally define "Py_LIMITED_API" at some point. Not sure if it will ever do that.)

...

- internal API (underscore-prefixed names, `internal` headers, things documented as private)

AFAIK, only the last one is causing trouble here.

Yeah, and that's the current default mode on CPython. Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing. Stefan

Ronald Oussoren

3:44 p.m.

...

On 2 Feb 2022, at 11:50, Stefan Behnel wrote:

Petr Viktorin wrote on 02.02.22 at 10:22:

...
Moving off the internal (unstable) API would be great, but I don't think Cython needs to move all the way to the limited API. There are three "levels" in the C API: - limited API, with long-term ABI compatibility guarantees

That's what "-DCYTHON_LIMITED_API -DPy_LIMITED_API=..." is supposed to do, which currently fails for much if not most code.

...
- "normal" public API, covered by the backwards compatibility policy (users need to recompile for every minor release, and watch for deprecation warnings)

That's probably close to what "-DCYTHON_LIMITED_API" does by itself as it stands. I can see that being a nice feature that just deserves a more suitable name. (The name was chosen because it was meant to also internally define "Py_LIMITED_API" at some point. Not sure if it will ever do that.)

...
- internal API (underscore-prefixed names, `internal` headers, things documented as private) AFAIK, only the last one is causing trouble here.

Yeah, and that's the current default mode on CPython.

Is it possible to automatically pick a different default version when building with a too new CPython version? That way projects can at least be used and tested with pre-releases of CPython, although possibly with less performance.

Ronald

...

Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing.

Stefan

_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ESEPW36K... Code of Conduct: http://python.org/psf/codeofconduct/

— Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/

Christopher Barker

5:16 p.m.

...

Maybe we should advertise the two modes more. And make sure that both work.

That would be great — as a long time Cython user, I didn’t know they existed. To be fair, long-time means I figured out something that works years ago, and have kept doing that ever since.

It might also help to make it easy to set, e.g. a flag to cythonize or something.

-CHB

-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Stefan Behnel

5:56 p.m.

Ronald Oussoren via Python-Dev wrote on 02.02.22 at 16:44:

...

...
On 2 Feb 2022, at 11:50, Stefan Behnel wrote: Petr Viktorin wrote on 02.02.22 at 10:22:

...
- "normal" public API, covered by the backwards compatibility policy (users need to recompile for every minor release, and watch for deprecation warnings)

That's probably close to what "-DCYTHON_LIMITED_API" does by itself as it stands. I can see that being a nice feature that just deserves a more suitable name. (The name was chosen because it was meant to also internally define "Py_LIMITED_API" at some point. Not sure if it will ever do that.)

...
- internal API (underscore-prefixed names, `internal` headers, things documented as private) AFAIK, only the last one is causing trouble here.

Yeah, and that's the current default mode on CPython.

Is it possible to automatically pick a different default version when building with a too new CPython version? That way projects can at least be used and tested with pre-releases of CPython, although possibly with less performance.

As I already wrote elsewhere, that is making the assumption (or at least optimising for the case) that a new CPython version always breaks Cython. And it has the drawback that we'd get less feedback on the "normal" integration and may thus end up noticing problems only later in the CPython development cycle. I don't think this really solves a problem. In any case, before we start playing with the default settings, I'd rather let users see what *they* can make of the available options. Then we can still come back and see which use cases there are and how to support them better. Stefan
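For users who want to try this on their own side rather than wait for a changed default, a build script could make the choice itself. The following is a hypothetical sketch, not a Cython feature: `macros_for` is an invented helper, and it assumes that defining `CYTHON_LIMITED_API` alone selects the more conservative mode discussed earlier in the thread.

```python
import sys


def macros_for(releaselevel):
    """Pick define_macros pairs based on the interpreter's release level."""
    if releaselevel == "final":
        return []  # default mode: full speed, may rely on CPython internals
    # alpha, beta or candidate: prefer the more conservative mode
    return [("CYTHON_LIMITED_API", None)]


# sys.version_info.releaselevel is 'alpha', 'beta', 'candidate' or 'final'.
build_macros = macros_for(sys.version_info.releaselevel)
```

This keeps the trade-off visible to the project: pre-release builds stay testable, possibly slower, while the drawback Stefan mentions (less feedback on the default integration) applies only to projects that opt in.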

Petr Viktorin

3 Feb

12:47 p.m.

On 02. 02. 22 11:50, Stefan Behnel wrote:

...

Petr Viktorin wrote on 02.02.22 at 10:22:

...
Moving off the internal (unstable) API would be great, but I don't think Cython needs to move all the way to the limited API. There are three "levels" in the C API:

- limited API, with long-term ABI compatibility guarantees

That's what "-DCYTHON_LIMITED_API -DPy_LIMITED_API=..." is supposed to do, which currently fails for much if not most code.

...
- "normal" public API, covered by the backwards compatibility policy (users need to recompile for every minor release, and watch for deprecation warnings)

That's probably close to what "-DCYTHON_LIMITED_API" does by itself as it stands. I can see that being a nice feature that just deserves a more suitable name. (The name was chosen because it was meant to also internally define "Py_LIMITED_API" at some point. Not sure if it will ever do that.)

...
- internal API (underscore-prefixed names, `internal` headers, things documented as private)

AFAIK, only the last one is causing trouble here.

Yeah, and that's the current default mode on CPython.

Beware that there are no guarantees on this API. It can change *at any time*. Technically, it can even change in point releases (only ABI must not change). We probably won't change it in a point release, especially not in a way that would break Cython, but it would still be great to move to the public API instead.

...

Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing.

I wonder if it can be renamed? "Limited API" has a specific meaning since PEP 384, and using it for the public API is adding to the general confusion in this area :(

Stefan Behnel

4 Feb

2:23 p.m.

Petr Viktorin wrote on 03.02.22 at 13:47:

...

On 02. 02. 22 11:50, Stefan Behnel wrote:

...
Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing.

I wonder if it can be renamed? "Limited API" has a specific meaning since PEP 384, and using it for the public API is adding to the general confusion in this area :(

I was more referring to it as an *existing* compilation mode of Cython that avoids the usage of CPython implementation details. The fact that the implementation is incomplete just means that we spill over into non-limited API code when no limited API is available for a certain feature. That will usually be public API code, unless that is really not available either.

One recent example is the new error locations in tracebacks, where PEP 657 explicitly lists the new "co_positions" field in code objects as an implementation detail of CPython. If we want to implement this in Cython, then there is no other way than to copy these implementation details pretty much verbatim from CPython and to depend on them.

https://www.python.org/dev/peps/pep-0657/

In this specific case, we're lucky that this can be considered an entirely optional feature that we can separately disable when users request "public API" mode (let's call it that). Not sure if that's what users want, though.

Stefan

Ethan Furman

3:33 p.m.

On 2/4/22 6:23 AM, Stefan Behnel wrote:

...

One recent example is the new error locations in tracebacks, where PEP 657 explicitly lists the new "co_positions" field in code objects as an implementation detail of CPython. If we want to implement this in Cython, then there is no other way than to copy these implementation details pretty much verbatim from CPython and to depend on them.

https://www.python.org/dev/peps/pep-0657/

In this specific case, we're lucky that this can be considered an entirely optional feature that we can separately disable when users request "public API" mode (let's call it that). Not sure if that's what users want, though.

Speaking as a user, I would want Cython to be (in order of preference):

- reliable
- fast
- all the cool Python features

and Python to:

- make public APIs available for Python features

--
~Ethan~

Petr Viktorin

9 Feb

4:16 p.m.

On 04. 02. 22 15:23, Stefan Behnel wrote:

...

Petr Viktorin wrote on 03.02.22 at 13:47:

...
On 02. 02. 22 11:50, Stefan Behnel wrote:

...
Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing.

I wonder if it can be renamed? "Limited API" has a specific meaning since PEP 384, and using it for the public API is adding to the general confusion in this area :(

I was more referring to it as an *existing* compilation mode of Cython that avoids the usage of CPython implementation details. The fact that the implementation is incomplete just means that we spill over into non-limited API code when no limited API is available for a certain feature. That will usually be public API code, unless that is really not available either.

One recent example is the new error locations in tracebacks, where PEP 657 explicitly lists the new "co_positions" field in code objects as an implementation detail of CPython. If we want to implement this in Cython, then there is no other way than to copy these implementation details pretty much verbatim from CPython and to depend on them.

https://www.python.org/dev/peps/pep-0657/

In this specific case, we're lucky that this can be considered an entirely optional feature that we can separately disable when users request "public API" mode (let's call it that). Not sure if that's what users want, though.

Should there be a getter/setter for co_positions? I'm unfortunately not aware of what Cython needs from code objects, but it might be good to extend the API here.

Pablo Galindo Salgado

4:40 p.m.

...

Should there be a getter/setter for co_positions?

We consider the representation of co_positions private, so we don't want (for now) to add getters/setters. If you want to get the position of an instruction, you can use PyCode_Addr2Location.

On Wed, 9 Feb 2022 at 16:22, Petr Viktorin wrote:

...

On 04. 02. 22 15:23, Stefan Behnel wrote:

...
Petr Viktorin wrote on 03.02.22 at 13:47:

...
On 02. 02. 22 11:50, Stefan Behnel wrote:

...
Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing.

I wonder if it can be renamed? "Limited API" has a specific meaning since PEP 384, and using it for the public API is adding to the general confusion in this area :(

I was more referring to it as an *existing* compilation mode of Cython that avoids the usage of CPython implementation details. The fact that the implementation is incomplete just means that we spill over into non-limited API code when no limited API is available for a certain feature. That will usually be public API code, unless that is really not available either.

One recent example is the new error locations in tracebacks, where PEP 657 explicitly lists the new "co_positions" field in code objects as an implementation detail of CPython. If we want to implement this in Cython, then there is no other way than to copy these implementation details pretty much verbatim from CPython and to depend on them.

https://www.python.org/dev/peps/pep-0657/

In this specific case, we're lucky that this can be considered an entirely optional feature that we can separately disable when users request "public API" mode (let's call it that). Not sure if that's what users want, though.

Should there be a getter/setter for co_positions? I'm unfortunately not aware of what Cython needs from code objects, but it might be good to extend the API here.


Stefan Behnel

5:01 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

Pablo Galindo Salgado wrote on 09.02.22 at 17:40:

...

...
Should there be a getter/setter for co_positions?

We consider the representation of co_positions private

Yes, and that's the issue.

...

so we don't want (for now) to add getters/setters. If you want to get the position of an instruction, you can use PyCode_Addr2Location

What Cython needs is the other direction. How can we provide the current source position range for a given piece of code to an exception?

As it stands, the way to do this is to copy the implementation details of CPython into Cython in order to let it expose the specific data structures that CPython uses for its internal representation of code positions.

I would prefer using an API instead that allows exposing this mapping directly to CPython's traceback handling, rather than having to emulate byte code positions. While that would probably be quite doable, it's far from a nice interface for something that is not based on byte code.

And that's not just a Cython issue. The same applies to Domain Specific Languages or other programming languages that integrate with Python and want to show users code positions for their source code.

Stefan
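To illustrate what "emulating byte code positions" means for a tool that is not based on byte code, here is a pure-Python analogue of the fake-frame approach mentioned earlier in the thread. It is a sketch under assumptions: the helper names and the file name `model.dsl` are invented, and it only controls the file name and line number, not the column ranges that PEP 657 adds.

```python
import traceback


def raise_at(exc, filename, lineno):
    """Raise `exc` so that the traceback shows a frame at filename:lineno."""
    # Pad with newlines so the 'raise' statement sits on the wanted line of a
    # synthetic code object that claims to come from `filename`.
    source = "\n" * (lineno - 1) + "raise __exc__"
    code = compile(source, filename, "exec")
    exec(code, {"__exc__": exc})


def reported_locations(exc):
    """Return the (filename, lineno) pairs recorded in the exception's traceback."""
    return [(f.filename, f.lineno) for f in traceback.extract_tb(exc.__traceback__)]


try:
    raise_at(ValueError("bad value"), "model.dsl", 42)
except ValueError as err:
    locations = reported_locations(err)
```

The innermost entry of `locations` points at line 42 of `model.dsl`, but only because a throwaway code object was compiled to carry that position, which is the kind of detour a direct position-reporting API would avoid.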

Pablo Galindo Salgado

5:40 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

I can only say that currently, I am not confident to expose such an API, at least for co_positions, as the internal implementation is very likely to heavily change and we want to have the possibility of changing it between patch versions if required (to address bugs and other things like that).

_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VQSWX6MF... Code of Conduct: http://python.org/psf/codeofconduct/

Guido van Rossum

6:36 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

It might require a detailed API design proposal coming from outside CPython (e.g. from Cython) to get this to change. I imagine for co_positions in particular this would have to use a "builder" pattern. I am unclear on how this would work though, given that Cython generates C code, not CPython bytecode. How would the synthesized co_positions be used? Would Cython just generate a co_positions fragment at the moment an exception is raised, pointing at the .pyx file from which the code was generated?

_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/P7SMK5ZG... Code of Conduct: http://python.org/psf/codeofconduct/

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...

Stefan Behnel

7:04 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

Guido van Rossum wrote on 09.02.22 at 19:36:

...

On Wed, Feb 9, 2022 at 9:41 AM Pablo Galindo Salgado wrote:

...
On Wed, 9 Feb 2022 at 17:38, Stefan Behnel wrote:

...
Pablo Galindo Salgado wrote on 09.02.22 at 17:40:

...
...
Should there be a getter/setter for co_positions?

We consider the representation of co_positions private

Yes, and that's the issue.

I can only say that currently, I am not confident to expose such an API, at least for co_positions, as the internal implementation is very likely to heavily change and we want to have the possibility of changing it between patch versions if required (to address bugs and other things like that).

It might require a detailed API design proposal coming from outside CPython (e.g. from Cython) to get this to change. I imagine for co_positions in particular this would have to use a "builder" pattern.

I am unclear on how this would work though, given that Cython generates C code, not CPython bytecode. How would the synthesized co_positions be used? Would Cython just generate a co_positions fragment at the moment an exception is raised, pointing at the .pyx file from which the code was generated?

So, what we currently do is to update the line number (which IIRC is really the start line number of the current function) on the current frame when an exception is raised, and the byte code offset to 0. That's a hack but shows the correct code line in the traceback. Probably conflicts with pdb, but there are still other issues with that anyway.

I remember looking into the old lnotab mapping at some point and trying to implement that with fake byte code offsets but never got it finished.

The idea is pretty simple, though. Instead of byte code offsets, we'd count our syntax tree nodes and just store the code position range of each syntax node at the "byte code offset" of the node's counter number. That's probably fairly easy to do in C code, maybe even with a statically allocated data structure. Then, instead of setting the frame function's line number, we'd set the frame's byte code instruction counter to the number of the failing syntax node, and CPython would retrieve the code position from that offset.

That sounds simple enough, probably simpler than any API usage – but depends on implementation details. Especially the idea of storing all this statically in the data segment of the shared library sounds very tempting.

Stefan
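A minimal sketch of the node-counter idea described above, written in Python for readability rather than in the C code that Cython would actually generate. The names `SourceRange`, `NODE_POSITIONS` and `position_for_node` are hypothetical and belong to neither Cython nor CPython, and the table contents are made-up sample data.

```python
from typing import NamedTuple, Optional


class SourceRange(NamedTuple):
    """Source span of one syntax node (hypothetical layout)."""
    start_line: int
    start_col: int
    end_line: int
    end_col: int


# Hypothetical static table: the index is the syntax node's counter number,
# playing the role that a byte code offset plays for CPython code objects.
NODE_POSITIONS = (
    SourceRange(1, 0, 1, 18),   # node 0
    SourceRange(2, 4, 2, 21),   # node 1
    SourceRange(3, 4, 3, 16),   # node 2
)


def position_for_node(node_counter: int) -> Optional[SourceRange]:
    """Look up the source span recorded for the failing syntax node."""
    if 0 <= node_counter < len(NODE_POSITIONS):
        return NODE_POSITIONS[node_counter]
    return None  # unknown node: no position information available


# On an exception, the generated code would store the failing node's number
# where CPython expects an instruction offset; the lookup is then one index.
failing_node = 1
print(position_for_node(failing_node))
```

Because the table is immutable and indexed by a plain integer, it is the kind of structure that could live in static storage, which is the property the message finds tempting. How CPython would be told to consult such a table is exactly the open question in this thread.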

Petr Viktorin

10 Feb

10:22 a.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

On 09. 02. 22 20:04, Stefan Behnel wrote:

...

Guido van Rossum wrote on 09.02.22 at 19:36:

...
On Wed, Feb 9, 2022 at 9:41 AM Pablo Galindo Salgado wrote:

...
On Wed, 9 Feb 2022 at 17:38, Stefan Behnel wrote:

...
Pablo Galindo Salgado wrote on 09.02.22 at 17:40:

...
...
Should there be a getter/setter for co_positions?

We consider the representation of co_positions private

Yes, and that's the issue.

I can only say that currently, I am not confident to expose such an API, at least for co_positions, as the internal implementation is very likely to heavily change and we want to have the possibility of changing it between patch versions if required (to address bugs and other things like that).

It might require a detailed API design proposal coming from outside CPython (e.g. from Cython) to get this to change. I imagine for co_positions in particular this would have to use a "builder" pattern.

I am unclear on how this would work though, given that Cython generates C code, not CPython bytecode. How would the synthesized co_positions be used? Would Cython just generate a co_positions fragment at the moment an exception is raised, pointing at the .pyx file from which the code was generated?

So, what we currently do is to update the line number (which IIRC is really the start line number of the current function) on the current frame when an exception is raised, and the byte code offset to 0. That's a hack but shows the correct code line in the traceback. Probably conflicts with pdb, but there are still other issues with that anyway.

So, should there be a mechanism to set source/lineno/position on tracebacks/exceptions, rather than always requiring a frame for it?

Stefan Behnel

2:20 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

Petr Viktorin wrote on 10.02.22 at 11:22:

...

So, should there be a mechanism to set source/lineno/position on tracebacks/exceptions, rather than always requiring a frame for it?

There's "_PyTraceback_Add()" currently, but it's incomplete in terms of what Cython would need. As it stands, Cython could make use of a function that accepted

- string object arguments for filename and function name
- (optionally) a 'globals' dict (or a reference to the current module)
- (optionally) a 'locals' mapping
- (optionally) a code object
- a C integer source line
- a C integer position, probably start and end lines and columns

to add a traceback level to the current exception.

I'm not sure about the code object since that's a rather heavy thing, but given that Cython needs to create code objects in order for its functions to be introspectible, that seems like a worthwhile option to have.

However, with the recent frame stack refactoring and frame objects now being lazily created, according to

https://bugs.python.org/issue44032
https://bugs.python.org/issue44590

I guess Cython should rather integrate with the new stack frame infrastructure in general. That shifts the requirements a bit.

An API function like the above would then still be helpful for the reduced API compile mode, I guess. But as soon as Cython uses InterpreterFrame structs internally, it would no longer be helpful for the fast mode. InterpreterFrame objects are based on byte code instructions again, which brings us back to co_positions.

Stefan
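To make the wished-for argument list above easier to picture, here is a rough Python model of the data such a function would receive. This is only an illustration under stated assumptions: `TracebackLevel`, its field names and `header()` are hypothetical, and no such API exists in CPython.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TracebackLevel:
    """Hypothetical record of the arguments listed in the message."""
    filename: str                     # string object argument
    funcname: str                     # string object argument
    lineno: int                       # C integer source line
    end_lineno: Optional[int] = None  # optional position range
    col_offset: Optional[int] = None
    end_col_offset: Optional[int] = None
    module_globals: Optional[Mapping[str, Any]] = None  # optional globals
    frame_locals: Optional[Mapping[str, Any]] = None    # optional locals
    code: Optional[Any] = None        # optional code object

    def header(self) -> str:
        """Render the entry like the first line of a traceback frame."""
        return f'  File "{self.filename}", line {self.lineno}, in {self.funcname}'


level = TracebackLevel("module.pyx", "func", 12, col_offset=4, end_col_offset=9)
print(level.header())
```

Only the file name, function name and line are required here; everything else defaults to `None`, mirroring the "(optionally)" markers in the list.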

Andrew Svetlov

9 Feb

6:40 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

Stefan, do you really need to emulate call stack with positions? Could the __note__ string with generated Cython part of exception traceback solve your needs (https://www.python.org/dev/peps/pep-0678/) ?

-- Thanks, Andrew Svetlov

Stefan Behnel

8:20 p.m.

New subject: PEP-657 and co_positions (was: Please update Cython *before* introcuding C API incompatible changes in Python)

Andrew Svetlov wrote on 09.02.22 at 19:40:

...

Stefan, do you really need to emulate call stack with positions? Could the __note__ string with generated Cython part of exception traceback solve your needs (https://www.python.org/dev/peps/pep-0678/) ?

Thanks for the link, but I think it would be surprising for users if a traceback displayed some code positions differently than others, when all code lines refer to Python code. Stefan

Ethan Furman

5:34 p.m.

On 2/9/22 8:40 AM, Pablo Galindo Salgado wrote:

...

Petr Viktorin wrote:

...

...
Should there be a getter/setter for co_positions?

We consider the representation of co_positions private, so we don't want (for now) to add getters/setters.

Isn't the whole point of getters/setters to allow public access to information, while allowing the internal representation of that information to change? However the exception information is stored by CPython, it's going to be needed by other frameworks. -- ~Ethan~

Victor Stinner

6:55 p.m.

On Wed, Feb 9, 2022 at 5:48 PM Pablo Galindo Salgado wrote:

...

We consider the representation of co_positions private, so we don't want (for now) to add getters/setters. If you want to get the position of an instruction, you can use PyCode_Addr2Location

The code.co_positions() method is accessible in Python: it's not documented, but its name doesn't say that it's private. Was it done on purpose? Should it be renamed to _co_positions() or even be removed? Victor -- Night gathers, and now my watch begins. It shall not end until my death.

Pablo Galindo Salgado

7:01 p.m.

That is on purpose and is the public API for Python. In Python it returns an iterable of tuples, which is processed from the actual internal form.
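For reference, a small sketch of reading that iterable from Python. `co_positions()` is available on code objects from Python 3.11 (PEP 657) and yields one `(start_line, end_line, start_column, end_column)` tuple per instruction, where any field may be `None`. The helper `positions_of` and the function `sample` are hypothetical names used only for this illustration.

```python
def sample(a, b):
    return a + b


def positions_of(func):
    """Return the position tuples for func, or [] if unsupported."""
    getter = getattr(func.__code__, "co_positions", None)
    if getter is None:  # interpreter older than Python 3.11
        return []
    return list(getter())


# Each tuple describes the source span of one bytecode instruction.
for start_line, end_line, start_col, end_col in positions_of(sample):
    print(start_line, end_line, start_col, end_col)
```

This only covers the reading direction. It does not help a tool that needs to supply positions for code that was never compiled to bytecode, which is the direction asked about elsewhere in the thread.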

Victor Stinner

6:54 p.m.

Hi,

It's already possible to call PyObject_CallMethod(code, "co_positions", NULL) and then use the iterator in C. Is there a use case where performance of reading co_positions is critical? If not, there is no need to add a specialized function.

Victor

On Wed, Feb 9, 2022 at 5:23 PM Petr Viktorin wrote:

...

On 04. 02. 22 15:23, Stefan Behnel wrote:

...
Petr Viktorin wrote on 03.02.22 at 13:47:

...
On 02. 02. 22 11:50, Stefan Behnel wrote:

...
Maybe we should advertise the two modes more. And make sure that both work. There are certainly issues with the current state of the "limited API" implementation, but that just needs work and testing.

I wonder if it can be renamed? "Limited API" has a specific meaning since PEP 384, and using it for the public API is adding to the general confusion in this area :(

I was more referring to it as an *existing* compilation mode of Cython that avoids the usage of CPython implementation details. The fact that the implementation is incomplete just means that we spill over into non-limited API code when no limited API is available for a certain feature. That will usually be public API code, unless that is really not available either.

One recent example is the new error locations in tracebacks, where PEP 657 explicitly lists the new "co_positions" field in code objects as an implementation detail of CPython. If we want to implement this in Cython, then there is no other way than to copy these implementation details pretty much verbatim from CPython and to depend on them.

https://www.python.org/dev/peps/pep-0657/

In this specific case, we're lucky that this can be considered an entirely optional feature that we can separately disable when users request "public API" mode (let's call it that). Not sure if that's what users want, though.

Should there be a getter/setter for co_positions? I'm unfortunately not aware of what Cython needs from code objects, but it might be good to extend the API here.


-- Night gathers, and now my watch begins. It shall not end until my death.

49 comments

14 participants

participants (14)

Andrew Svetlov
Christian Heimes
Christopher Barker
dw-git＠d-woods.co.uk
Ethan Furman
Greg Ewing
Guido van Rossum
Irit Katriel
Miro Hrončok
Pablo Galindo Salgado
Petr Viktorin
Ronald Oussoren
Stefan Behnel
Victor Stinner
