PyPy3: is bytecode really incompatible between releases?

Hello, I've noticed that the compiled module suffix keeps changing between PyPy3 releases: it was .pypy3-71-*.so for 7.1, and now it's .pypy3-72-*.so (the same applies to .pyc). However, this is a bit surprising to me given that for PyPy2 it's still at .pypy-41.so. Is the bytecode generated by successive PyPy3 releases really incompatible between them? Or are the suffix changes only incidental? They cause quite some trouble for us, since they make it necessary to recompile installed modules on Gentoo, and PyPy's overzealous compiling causes access violations for our users. TIA for any help. -- Best regards, Michał Górny

On 19/10/19 11:55 pm, Michał Górny wrote: the pypy3.6 specific ones. The fact that the PyPy2 ABI tag did **not** change is most likely a bug in the release process. I think there will be cases where pypy2-v7.1.1 and pypy2-v7.2.0 c-extension modules will be slightly incompatible with each other, although a cursory test of mixing NumPy across versions (building on 7.2, copying into 7.1) seems to pass "np.test()". Perhaps a better test would be to mix the pygolang c-extension modules, which seem to stress-test the C-API more extensively, given the number of issues they expose. The pyc files should always be rebuilt in each python environment, so I am not sure what problems could be caused by bumping the ABI tag. Does Gentoo somehow mix the byte-compiled pyc files across versions? I am not sure what you mean by "compiling causes access violations for our users"; could you point to a discussion of the problem? Matti

On Sun, 2019-10-20 at 07:32 +0300, Matti Picus wrote:
Thanks for the explanation. This answers my question. Do you need me to file a bug for the ABI tag not changing in PyPy2, or will you take it from here?
It's related to circular dependencies between packages. For example, when we rebuild setuptools for the new version, it tries to load its plugins and byte-compile them as well. Since they don't belong to the setuptools package, our PM catches that as illegal access. It's a generic problem with Python, not something you need to worry about. It has just been pointed out to me that we already have a hack for it; I just need to add PyPy3 to it. -- Best regards, Michał Górny

On 20/10/19 10:01 am, Michał Górny wrote:
I would like to confirm that there is in fact an issue: that the c-extension shared objects are incompatible. I am not completely convinced this is the case; at least my experimentation with NumPy showed it is *not* the case for PyPy2. I am open to hearing opinions from others. Is there a consensus around whether we do need to change the ABI designation? I think this would also require recompilation of any CFFI shared objects on PyPy2. Matti

Hi Matti, On Sun, 20 Oct 2019 at 09:47, Matti Picus <matti.picus@gmail.com> wrote:
In PyPy2 there are two different numbers: the version in the ".pypy-XY.so" extension, and the internal version in the ".pyc" files. In PyPy3 the ".pyc" files have grown to ".pypy-XY.pyc". (This is confusing because if you translate PyPy3.6 and the in-progress PyPy3.7 then they'll try to use the same ".pypy-XY.pyc" extension, even though the internal bytecode version in that file is different.) If we want a single number that changes mostly every release, then we're doing the right thing. If instead we prefer to keep several more precise numbers, we should use different numbers for (a) the C extensions; (b) the .pyc files; and even (c) the cffi modules. As far as I understand, the problem with doing that is that people used to (and code used on) CPython are not really ready to handle this situation. As for the precise question you're asking, "do we need to change the ABI designation in PyPy2", the answer is yes, imho: we should change it as soon as we break the ABI, even if only in a corner case that doesn't concern most C extensions... A bientôt, Armin.
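The separate tags described above can be inspected from Python 3 itself. The following is a minimal sketch using only standard-library attributes; the module name "example.py" is a placeholder, and the printed values depend on the interpreter and release that runs it.

```python
import importlib.machinery
import importlib.util
import sys

# Tag used in __pycache__ file names, e.g. "cpython-36" on CPython 3.6.
cache_tag = sys.implementation.cache_tag

# File name suffixes the import system accepts for C extension modules.
extension_suffixes = list(importlib.machinery.EXTENSION_SUFFIXES)

# Where the bytecode of a placeholder module "example.py" would be cached.
pyc_path = importlib.util.cache_from_source("example.py")

print("pyc tag:           ", cache_tag)
print("extension suffixes:", extension_suffixes)
print("pyc path:          ", pyc_path)
```

Running this under two releases and comparing the output shows which of the tags actually changed between them.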

On 20/10/19 11:21 pm, Armin Rigo wrote:
I see CPython uses the python {major}{minor} version: "example.cpython-36.pyc". We should probably change our convention to do the same. Any idea where that happens?
The code in question is in pypy/module/imp/importing.py, and has a comment

    # this used to change for every minor version, but no longer does: there
    # is little point any more, as the so's tend to be cross-version-
    # compatible, more so than between various versions of CPython.  Be
    # careful if we need to update it again: it is now used for both cpyext
    # and cffi so's.  If we do have to update it, we'd likely need a way to
    # split the two usages again.
    #DEFAULT_SOABI = 'pypy-%d%d' % PYPY_VERSION[:2]
    DEFAULT_SOABI = 'pypy-41'

So do we update it across the board for each change in the cpyext ABI?

Matti
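To make the difference between the frozen tag and the commented-out, version-tracking form concrete, here is a small standalone sketch. The version tuple is a made-up stand-in for PYPY_VERSION, and the helper name soabi_from_version is hypothetical; neither is taken from the PyPy source.

```python
# Hypothetical stand-in for PYPY_VERSION as used in the quoted code.
PYPY_VERSION = (7, 2, 0, "final", 0)

# The frozen value quoted from importing.py.
FROZEN_SOABI = 'pypy-41'


def soabi_from_version(version):
    """Build a tag from the first two version fields (hypothetical helper)."""
    return 'pypy-%d%d' % version[:2]


tracked_soabi = soabi_from_version(PYPY_VERSION)
print(FROZEN_SOABI, tracked_soabi)
```

With the frozen value, extension modules built for different releases share one suffix; with the tracking form, every major.minor bump produces a new suffix and forces a rebuild.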

Hi Matti, On Tue, 22 Oct 2019 at 15:34, Matti Picus <matti.picus@gmail.com> wrote:
No, my point was that if we want to do that we should first split the usages, and only update the version used for C extension modules. In other words we should not update it for cffi modules (which is unlikely to ever change, and I can check but I think the same .so works for pypy2 and pypy3, so maybe a version number is not needed at all); and also not for .pyc files (which should just be "pypy-36" for pypy3 implementing python 3.6, and if at some point we really want to add a new bytecode, then well, we'll think again, I suppose). A bientôt, Armin.

Hi again, On Wed, 23 Oct 2019 at 16:32, Armin Rigo <armin.rigo@gmail.com> wrote:
Yes, I think that's the case. The .so for cffi should be almost entirely the same on pypy2 and pypy3, with one minor difference that turns out not to matter. (The module exports a function _cffi_pypyinit__foo() that is declared to return void on pypy2, but "PyObject *" on pypy3---where it returns NULL and the actual return value is never checked. We do it that way because we're reusing the convenient macro PyMODINIT_FUNC from Python.h.) A bientôt, Armin.

On 19/10/19 11:55 pm, Michał Górny wrote:
I committed changes that:

- on py3.6 (for python 3.6, 3.7 and up) change the *.pyc file name to follow the cpython spec: filename.pypy-36.pyc
- on default (for python2) change the DEFAULT_SOABI to track the pypy_version; so's will now be named .pypy-72.so (on py3.6 they are named filename.pypy3-72-x86_64-linux-gnu.so so they will not clash)

For CFFI, we discussed a pypy-specific "stable-api" extension to mirror the cpython3 "abi3" tag; the idea still needs to be fleshed out and implemented.

The reasoning behind the changes to the pyc and so filenames is explained earlier in this mail thread. I will not repeat it here, but if I was mistaken please help me get it right. The changes, if not reverted, will be part of the next release cycle.

Matti
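For packagers who need to tell these files apart, the naming scheme can be matched with a couple of regular expressions. This is only a sketch: the patterns are assumptions derived from the three example names quoted in this thread, not from PyPy's import code, and classify is a hypothetical helper.

```python
import re

# Patterns inferred from the file names quoted in this thread (assumption).
PATTERNS = [
    ("pyc", re.compile(r"^(?P<module>[^.]+)\.(?P<tag>pypy3?-\d+)\.pyc$")),
    ("extension", re.compile(
        r"^(?P<module>[^.]+)\.(?P<tag>pypy3?-\d+)"
        r"(?:-(?P<platform>[\w-]+))?\.so$")),
]


def classify(filename):
    """Return (kind, fields) for a recognised name, or None otherwise."""
    for kind, pattern in PATTERNS:
        match = pattern.match(filename)
        if match:
            return kind, match.groupdict()
    return None


for name in ("filename.pypy-36.pyc",
             "example.pypy-72.so",
             "filename.pypy3-72-x86_64-linux-gnu.so"):
    print(name, "->", classify(name))
```

A package manager could use such a check to decide which installed files need rebuilding after an interpreter upgrade.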

On 25/10/19 10:49 am, Matti Picus wrote:
Needs more thought. The changes in the C-API are reflected in the platform tag: 71 is incompatible (perhaps only slightly) with 72. What breaking changes are there from the perspective of a C-API module between python 3.6 to 3.7, 3.8, 3.9? Matti

Hi Matti, On Fri, 25 Oct 2019 at 10:21, Matti Picus <matti.picus@gmail.com> wrote:
The C module itself may contain "#if PY_VERSION_HEX >= 0x03070000" or similar, in order to compile some feature (or work around some issue) that is only available on CPython 3.{N} but not on 3.{N-1}. So I think it's a good idea to include both the CPython and the PyPy version in the name. A bientôt, Armin.
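The same comparison can be tried from Python, where sys.hexversion carries the value that PY_VERSION_HEX has in C. The sketch below packs a version into that layout; version_hex is a hypothetical helper written for illustration.

```python
import sys


def version_hex(major, minor, micro=0, level=0xF, serial=0):
    """Pack a version in the PY_VERSION_HEX layout (0xF marks a final release)."""
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial


# The threshold used in the quoted preprocessor check.
PY37 = 0x03070000

# True when the running interpreter implements Python 3.7 or later.
has_37_features = sys.hexversion >= PY37

print(hex(sys.hexversion), has_37_features)
```

An extension compiled against 3.6 headers takes the other branch of such a check, which is why the Python version matters for the binary even when the PyPy release is the same.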

On 25/10/19 3:09 pm, Armin Rigo wrote:
Would this be considered a major API breaking change or only a revision change? Would we need to change to pypy 8.0 (i.e. pypy36-pp80-x86_64-linux-gnu.so), or can we stay with pypy 7.3 (i.e., pypy36-pp73-x86_64-linux-gnu.so)? In any case, wheels made for pypy before this change would not be compatible with ones after it. Matti

Hi Matti, On Sun, 27 Oct 2019 at 10:33, Matti Picus <matti.picus@gmail.com> wrote:
I like to think that major versions of PyPy should also indicate that we did some major work in other areas, like the JIT compiler, the json decoder, etc. The question of whether the next version should be called "7.3" or "8.0" should hinge on that, IMHO. It should not depend *only* on whether we broke the API inside cpyext. That means cpyext needs to have its own way to tell that the API broke; for example, it could use file names "pypy36-pp#-x86_64-linux-gnu.so" with the "#" being the API version number. Something like that. Maybe just an increasing number starting at 42 (as the number following the "pypy41" we use so far; unrelated to the meaning of life!) A bientôt, Armin.
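A sketch of what such a name could look like, assuming the "pypy36-pp#-..." layout suggested above. The function name and the idea of passing the counter as a plain integer are illustrative assumptions, not an implemented PyPy scheme.

```python
def extension_file_tag(python_version, api_version, platform):
    """Compose a tag such as 'pypy36-pp42-x86_64-linux-gnu.so' (hypothetical)."""
    major, minor = python_version[:2]
    return "pypy%d%d-pp%d-%s.so" % (major, minor, api_version, platform)


# The counter starts at 42 and is bumped only when the cpyext API breaks,
# independently of the PyPy release number.
first = extension_file_tag((3, 6), 42, "x86_64-linux-gnu")
after_break = extension_file_tag((3, 6), 43, "x86_64-linux-gnu")
print(first)
print(after_break)
```

Decoupling the counter from the release number means that a release which leaves the C API untouched would not invalidate existing extension modules.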

Participants (3): Armin Rigo, Matti Picus, Michał Górny