versioned .so files for Python 3.2
This is a follow-up to PEP 3147. That PEP, already implemented in Python 3.2, allows Python source files from different Python versions to live together in the same directory. It does this by putting a magic tag in the .pyc file name and placing the .pyc file in a __pycache__ directory. Distros such as Debian and Ubuntu will use this to greatly simplify deploying Python, and Python applications and libraries. Debian and Ubuntu usually ship more than one version of Python, and currently have to play complex games with symlinks to make this work. PEP 3147 will go a long way toward eliminating the need for extra directories and symlinks.

One more thing I've found we need, though, is a way to handle shared libraries for extension modules. Just as we can get name collisions on foo.pyc, we can get collisions on foo.so. We obviously cannot install foo.so built for Python 3.2 and foo.so built for Python 3.3 in the same location. So symlink nightmare's mini-me is back.

I have a fairly simple fix for this. I'd actually be surprised if this hasn't been discussed before, but the Googles haven't turned up anything. The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file (for backward compatibility).

(On file naming: the original patch used foo.so.3.2, and that works just as well, but I thought there might be tools that expect exactly a '.so' suffix, so I changed it to put the Major.Minor version number to the left of the extension. The exact naming scheme is of course open to debate.)

This is a much simpler patch than PEP 3147, though I'm not 100% sure it's the right approach. The way this works is by modifying configure and Makefile.pre.in to put the version number in the $SO make variable.
Python parses its (generated) Makefile to find $SO, and uses it deep in the bowels of distutils to decide what suffix to use when writing shared libraries built by 'python setup.py build_ext'. This means the patched Python only writes versioned .so files by default. I personally don't see that as a problem, and it does not affect the test suite, with the exception of one easily tweaked test. I don't know if third-party tools will care. The fact that traditional foo.so shared libraries will still satisfy the import should be enough, I think. The patch is currently Linux-only, since I need this for Debian and Ubuntu and wanted to keep the change narrow.

Other possible approaches:

* Extend the distutils API so that the .so file extension can be passed in, instead of being essentially hardcoded to what Python's Makefile contains.
* Keep the dynload_shlib.c change, but modify the Debian/Ubuntu build environment to pass in $SO to make (though the configure.in warning and sleep is a little annoying).
* Add a ./configure option to enable this, which Debuntu's build would use.

The patch is available here: http://pastebin.ubuntu.com/454512/ and my working branch is here: https://code.edge.launchpad.net/~barry/python/sovers

Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.

Cheers, -Barry
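To make the proposed lookup concrete, here is a rough sketch in Python of the search order the patched dynload would use (the real logic lives in C in dynload_shlib.c; the function name and version-probing order here are illustrative assumptions, not the actual patch):

```python
# Hypothetical sketch (not the real dynload_shlib.c code): given a
# module name and the running interpreter's version, probe the
# versioned file name first, then fall back to the traditional names.

def so_candidates(modname, version=(3, 2)):
    """Return shared-library file names to probe, in order."""
    tag = "%d.%d" % version
    return [
        "%s.%s.so" % (modname, tag),   # versioned: foo.3.2.so
        "%s.so" % modname,             # traditional fallback
        "%smodule.so" % modname,       # legacy <foo>module.so form
    ]

print(so_candidates("foo"))
# ['foo.3.2.so', 'foo.so', 'foomodule.so']
```

The key property is that the fallback names stay in the list, so an unversioned foo.so built the old way still imports.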
2010/6/24 Barry Warsaw <barry@python.org>:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
How will this interact with PEP 384 if that is implemented? -- Regards, Benjamin
On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw <barry@python.org>:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
How will this interact with PEP 384 if that is implemented?
Good question, I'd forgotten to mention that PEP. I think the PEP is a good idea, and worth working on, but it is a longer term solution to the problem of extension source code compatibility. It's longer term because extensions will have to be rewritten to use the new API defined in PEP 384. It will take a long time to get this into practice, and supporting it will be a case-by-case basis. I'm trying to come up with something that will work immediately while PEP 384 is being adopted. -Barry
2010/6/24 Barry Warsaw <barry@python.org>:
On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw <barry@python.org>:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
How will this interact with PEP 384 if that is implemented? I'm trying to come up with something that will work immediately while PEP 384 is being adopted.
But how will modules specify that they support multiple ABIs then? -- Regards, Benjamin
On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw <barry@python.org>:
On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw <barry@python.org>:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
How will this interact with PEP 384 if that is implemented? I'm trying to come up with something that will work immediately while PEP 384 is being adopted.
But how will modules specify that they support multiple ABIs then?
I didn't understand, so asked Benjamin for clarification in IRC:

<gutworth> barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports the stable abi, will it load it? [14:25]
<barry> gutworth: thanks, now i get it :) [14:26]
<barry> gutworth: i think it should, but it wouldn't under my scheme. let me think about it

-Barry
On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote:
On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw <barry@python.org>:
On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw <barry@python.org>:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
How will this interact with PEP 384 if that is implemented? I'm trying to come up with something that will work immediately while PEP 384 is being adopted.
But how will modules specify that they support multiple ABIs then?
I didn't understand, so asked Benjamin for clarification in IRC.
<gutworth> barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports the stable abi, will it load it? [14:25]
<barry> gutworth: thanks, now i get it :) [14:26]
<barry> gutworth: i think it should, but it wouldn't under my scheme. let me think about it
So, we could say that PEP 384 compliant extension modules would get written without a version specifier. IOW, we'd treat foo.so as using the stable ABI. It would then be up to the Python runtime to raise ImportError if in fact we were loading a legacy, non-PEP 384 compliant extension. -Barry
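The rule Barry sketches above can be expressed compactly. This is a hedged illustration only (the file-name parsing and the function name are made up for the example), showing that a version-tagged name must match the running interpreter exactly, while an untagged foo.so is presumed stable-ABI and accepted:

```python
# Sketch of the proposed loading rule: untagged foo.so is assumed to
# target the PEP 384 stable ABI; a version-tagged name (x.3.2.so) is
# only loadable by that exact interpreter version.  Illustrative names.

def acceptable(filename, running=(3, 3)):
    """Would a Python `running` interpreter be willing to dynload this?"""
    parts = filename.split(".")
    if parts[-1] != "so":
        return False
    if len(parts) >= 4 and parts[-3].isdigit() and parts[-2].isdigit():
        # Version-tagged, e.g. x.3.2.so: require an exact match.
        return (int(parts[-3]), int(parts[-2])) == running
    # Untagged foo.so: presumed stable-ABI; the runtime would raise
    # ImportError later if it is in fact a legacy extension.
    return True

print(acceptable("x.3.3.so"))  # True: exact version match
print(acceptable("x.3.2.so"))  # False: tagged for another version
print(acceptable("x.so"))      # True: presumed PEP 384 / stable ABI
```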
On 24.06.2010 22:46, Barry Warsaw wrote:
On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote:
On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw<barry@python.org>:
On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
2010/6/24 Barry Warsaw<barry@python.org>:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections<wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
How will this interact with PEP 384 if that is implemented? I'm trying to come up with something that will work immediately while PEP 384 is being adopted.
But how will modules specify that they support multiple ABIs then?
I didn't understand, so asked Benjamin for clarification in IRC.
<gutworth> barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports the stable abi, will it load it? [14:25]
<barry> gutworth: thanks, now i get it :) [14:26]
<barry> gutworth: i think it should, but it wouldn't under my scheme. let me think about it
So, we could say that PEP 384 compliant extension modules would get written without a version specifier. IOW, we'd treat foo.so as using the ABI. It would then be up to the Python runtime to throw ImportErrors if in fact we were loading a legacy, non-PEP 384 compliant extension.
Is it realistic to never break the ABI? I would think of having the ABI encoded in the file name as well, and only bump the ABI if it does change. With the "versioned .so files" proposal an ABI bump is necessary with every python version, with PEP 384 the ABI bump will be decoupled from the python version. Matthias
On Jun 26, 2010, at 10:22 PM, Matthias Klose wrote:
On 24.06.2010 22:46, Barry Warsaw wrote:
So, we could say that PEP 384 compliant extension modules would get written without a version specifier. IOW, we'd treat foo.so as using the ABI. It would then be up to the Python runtime to throw ImportErrors if in fact we were loading a legacy, non-PEP 384 compliant extension.
Is it realistic to never break the ABI? I would think of having the ABI encoded in the file name as well, and only bump the ABI if it does change. With the "versioned .so files" proposal an ABI bump is necessary with every python version, with PEP 384 the ABI bump will be decoupled from the python version.
You're right that the ABI will break, requiring a bump, and I think you're right that this means that PEP 384 compliant shared libraries would have to have a version number in their file name (assuming the versioned .so proposal is accepted). The problem is that we would need two version numbers: one for extension modules that are not PEP 384 compliant (and thus get bumped for every new Python version), and one for modules that are PEP 384 compliant (and thus only get bumped once in a while). The reason is that I think it will always be the case that we have both PEP 384 compliant and non-compliant extension modules.

Perhaps identifying the underlying problems will lead to a more acceptable patch for Python. My patch takes a simple (perhaps too simplistic) approach, and I'm not married to it, but I think the general idea of versioned .so files is the right one.

1. The file name extensions that Python searches for are hardcoded and compiled in. dynload_shlib.c hard-codes the file name pattern that extension modules must have in order for Python to load them. They must be <foo>.so or <foo>module.so. This gets compiled into Python at build time, and there's no way for a distro (or anyone else who builds Python from source) to extend the file name patterns without modifying the source code.

2. The extension that distutils writes for shared libraries is dictated by build-time options and cannot be overridden. When you ./configure Python, autoconf figures out what shared library extension your platform uses. It substitutes this into a Makefile variable. That Makefile gets installed into your system with the base Python package, and distutils parses it looking for this variable. When distutils calls your platform compiler, it uses this Makefile variable as the file name extension for your shared library. You cannot change or override this to get distutils to write some other file name extension.
Of these two problems, #1 is more serious, because we have to modify the Python source code to hack in additional shared library search suffixes. #2 can be worked around by renaming the .so file after the build. The disadvantage of this, though, is that if you're a local packager, you'll have to remember to do the same thing if you want multiple Python version support, because distutils won't take care of it for you. Maybe that's okay, in which case it would still be good to address #1. -Barry
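The post-build rename workaround for problem #2 is simple enough to sketch. This is a minimal illustration, assuming a hypothetical helper and a hard-coded tag value (distutils provides neither):

```python
# Workaround sketch for problem #2: let distutils write the
# traditional foo.so, then tag it by hand after the build.  The
# function name and the "3.2" tag are assumptions for illustration.

import os
import tempfile

def version_tag_rename(build_dir, tag="3.2"):
    """Rename every plain foo.so in build_dir to foo.<tag>.so."""
    for name in os.listdir(build_dir):
        # Only touch untagged names like "foo.so" (exactly one dot).
        if name.endswith(".so") and name.count(".") == 1:
            base = name[:-len(".so")]
            os.rename(os.path.join(build_dir, name),
                      os.path.join(build_dir, "%s.%s.so" % (base, tag)))

# Demonstration against a throwaway directory:
d = tempfile.mkdtemp()
open(os.path.join(d, "foo.so"), "w").close()
version_tag_rename(d)
print(sorted(os.listdir(d)))  # ['foo.3.2.so']
```

As the message notes, this works but pushes the burden onto every local packager, which is why fixing #1 (the compiled-in suffix list) matters more.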
On Thu, Jun 24, 2010 at 10:50 AM, Barry Warsaw <barry@python.org> wrote:
The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility).
What use case does this address? PEP 3147 addresses the fact that the user may have different versions of Python installed and each wants to write a .pyc file when loading a module. .so files are not generated simply by running the Python interpreter, ergo .so files are not an issue for that use case. If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
On Jun 24, 2010, at 11:05 AM, Daniel Stutzbach wrote:
On Thu, Jun 24, 2010 at 10:50 AM, Barry Warsaw <barry@python.org> wrote:
The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility).
What use case does this address?
Specifically, it's the use case where we (Debian/Ubuntu) plan on installing all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, we can do that without collisions on the pyc files, but would still have to symlink for extension module .so files, because they are always named foo.so and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's foo.so. So using the same trick as in PEP 3147, if we can name Python 3.2's foo extension differently than the incompatible Python 3.3's foo extension, we can have them live in the same directory without symlink tricks.
PEP 3147 addresses the fact that the user may have different versions of Python installed and each wants to write a .pyc file when loading a module. .so files are not generated simply by running the Python interpreter, ergo .so files are not an issue for that use case.
See above. It doesn't matter whether the pyc or so is created at run time by the user or by the distro build system. If the files for different Python versions end up in the same directory, they must be named differently too.
If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options.
This is a good point, but more easily addressed. Let's say a distro makes four Python 3.2 variants available: a "normal" build, a debug build, and UCS2 and UCS4 versions of each. All we need to do is choose a different .so ABI tag (see previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure, e.g.:

Normal build UCSX: SOABI=cpython-32 ./configure
Debug build UCSX: SOABI=cpython-32-d ./configure
Normal build UCSY: SOABI=cpython-32-w ./configure
Debug build UCSY: SOABI=cpython-32-dw ./configure

Mix and match for any other build options you care about. Because the distro controls how Python is configured, this should be fairly easy to achieve. -Barry
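The tag composition is mechanical, which is what makes the scheme workable for a distro. A sketch, assuming a hypothetical helper (the tag strings match the configure examples in the message; the function itself is not part of the patch):

```python
# Sketch of composing an SOABI tag from build variants.  The resulting
# strings mirror the SOABI=... configure examples above; the function
# and its flag letters are illustrative assumptions.

def soabi_tag(version=(3, 2), debug=False, wide_unicode=False):
    """Build a tag like cpython-32, cpython-32-d, cpython-32-dw."""
    tag = "cpython-%d%d" % version
    flags = ("d" if debug else "") + ("w" if wide_unicode else "")
    return tag + ("-" + flags if flags else "")

print(soabi_tag())                               # cpython-32
print(soabi_tag(debug=True))                     # cpython-32-d
print(soabi_tag(wide_unicode=True))              # cpython-32-w
print(soabi_tag(debug=True, wide_unicode=True))  # cpython-32-dw
```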
On 6/24/2010 5:09 PM, Barry Warsaw wrote:
What use case does this address?
Specifically, it's the use case where we (Debian/Ubuntu) plan on installing all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, we can do that without collisions on the pyc files, but would still have to symlink for extension module .so files, because they are always named foo.so and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's foo.so.
If the package has .so files that aren't compatible with other versions of Python, then what is the motivation for placing them in a shared location (since they can't actually be shared)?
So using the same trick as in PEP 3147, if we can name Python 3.2's foo extension differently than the incompatible Python 3.3's foo extension, we can have them live in the same directory without symlink tricks.
Why would a symlink trick even be necessary if there is a version-unspecific directory and a version-specific directory on the search path?
PEP 3147 addresses the fact that the user may have different versions of Python installed and each wants to write a .pyc file when loading a module. .so files are not generated simply by running the Python interpreter, ergo .so files are not an issue for that use case.
See above. It doesn't matter whether the pyc or so is created at run time by the user or by the distro build system. If the files for different Python versions end up in the same directory, they must be named differently too.
But the only motivation for doing this with .pyc files is that the .py files are able to be shared, since the .pyc is an on-demand-generated, version-specific artifact (and not the source). The .so file is created offline by another toolchain, is version-specific, and presumably you are not suggesting that Python generate it on-demand.
If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options.
This is a good point, but more easily addressed. Let's say a distro makes four Python 3.2 variants available: a "normal" build, a debug build, and UCS2 and UCS4 versions of each. All we need to do is choose a different .so ABI tag (see previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure. So e.g.
Why is this use case not already addressed by having independent directories? And why is there an incentive to co-mingle these version-punned files with version-agnostic ones?
Mix and match for any other build options you care about. Because the distro controls how Python is configured, this should be fairly easy to achieve.
For packages that have .so files, won't the distro already have to build multiple copies of that package for all version of Python? So, why can't it place them in separate directories that are version-specific at that time? This is not the same as placing .py files that are version-agnostic into a version-agnostic location. -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
Scott Dial wrote:
On 6/24/2010 5:09 PM, Barry Warsaw wrote:
What use case does this address?
If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options.
This is a good point, but more easily addressed. Let's say a distro makes four Python 3.2 variants available: a "normal" build, a debug build, and UCS2 and UCS4 versions of each. All we need to do is choose a different .so ABI tag (see previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure. So e.g.
Why is this use case not already addressed by having independent directories? And why is there an incentive to co-mingle these version-punned files with version-agnostic ones?
I don't think this is a good idea. After a while your Python lib directories would need some serious dusting off to make them maintainable again. Disk space is cheap, so setting up dedicated directories for each variant will result in a much easier-to-manage installation. If you want a really clever setup, use hard links between those directories (you can also use symlinks if you like). Then a change in one Python file will automatically propagate to all other variant dirs without any maintenance effort. Together with PYTHONHOME this makes a really nice virtualenv-like environment. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2010)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2010-07-19: EuroPython 2010, Birmingham, UK 23 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On Jun 25, 2010, at 12:35 AM, M.-A. Lemburg wrote:
Scott Dial wrote:
On 6/24/2010 5:09 PM, Barry Warsaw wrote:
What use case does this address?
If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options.
This is a good point, but more easily addressed. Let's say a distro makes four Python 3.2 variants available: a "normal" build, a debug build, and UCS2 and UCS4 versions of each. All we need to do is choose a different .so ABI tag (see previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure. So e.g.
Why is this use case not already addressed by having independent directories? And why is there an incentive to co-mingle these version-punned files with version-agnostic ones?
I don't think this is a good idea. After a while your Python lib directories would need some serious dusting off to make them maintainable again.
Disk space is cheap so setting up dedicated directories for each variant will result in a much easier to manage installation.
If you want a really clever setup, use hard links between those directories (you can also use symlinks if you like). Then a change in one Python file will automatically propagate to all other variant dirs without any maintenance effort. Together with PYTHONHOME this makes a really nice virtualenv-like environment.
Note that I do believe there is a difference between what users maintaining their own Python installations might want, and what a distro needs to maintain its entire Python stack. So while dedicated directories might make more sense if you're maintaining your own Python built from source, it doesn't make as much sense for a distro, as described in previous responses by Matthias. -Barry
Barry Warsaw wrote:
On Jun 25, 2010, at 12:35 AM, M.-A. Lemburg wrote:
Scott Dial wrote:
On 6/24/2010 5:09 PM, Barry Warsaw wrote:
What use case does this address?
If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options.
This is a good point, but more easily addressed. Let's say a distro makes four Python 3.2 variants available: a "normal" build, a debug build, and UCS2 and UCS4 versions of each. All we need to do is choose a different .so ABI tag (see previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure. So e.g.
Why is this use case not already addressed by having independent directories? And why is there an incentive to co-mingle these version-punned files with version-agnostic ones?
I don't think this is a good idea. After a while your Python lib directories would need some serious dusting off to make them maintainable again.
Disk space is cheap so setting up dedicated directories for each variant will result in a much easier to manage installation.
If you want a really clever setup, use hard links between those directories (you can also use symlinks if you like). Then a change in one Python file will automatically propagate to all other variant dirs without any maintenance effort. Together with PYTHONHOME this makes a really nice virtualenv-like environment.
Note that I do believe there is a difference between what users maintaining their own Python installations might want, and what a distro needs to maintain its entire Python stack. So while dedicated directories might make more sense if you're maintaining your own Python built from source, it doesn't make as much sense for a distro, as described in previous responses by Matthias.
Fair enough. I haven't followed the thread closely, so Matthias will probably already have answered this: the Python default installation dir for libs (including site-packages) is $prefix/lib/pythonX.X, so you already have separate and properly versioned directory paths. What difference would the extra version on the .so file make in such a setup? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 30 2010)
On Jun 30, 2010, at 10:35 PM, M.-A. Lemburg wrote:
The Python default installation dir for libs (including site-packages) is $prefix/lib/pythonX.X, so you already have separate and properly versioned directory paths.
What difference would the extra version on the .so file make in such a setup ?
So on Debian (inherited by Ubuntu), that would be /usr/lib/pythonX.Y, and in fact if you look there, you'll see the normal standard library layout you'd expect. There is no site-packages under there, though, because we install add-ons to dist-packages[*], but functionally it's what you'd expect. However, if you look inside dist-packages, you'll see something a little weird. All the .py files and .egg-info files will actually be symlinks into a parallel hierarchy under /usr/share/pyshared. Also, under dist-packages you'll see a directory layout that looks like it normally would under a standard site-packages layout, except that again, all the .py files are symlinked to the pyshared location. The .so and .pyc files in dist-packages will be the actual .so and .pyc files. Why is that? Well, it's because the .py files can be shared but the .pyc and .so files can't. Getting rid of these crufty symlink farms was the initial motivation for PEP 3147.

In a PEP 3147 world, pure-Python packages don't need the directory /usr/lib/pythonX.Y/dist-packages. We can just install packages to /usr/share/pyshared and none of the .pyc files will collide between Python versions. It makes distro packaging simpler (no symlinks to keep track of, no post-install linking or post-removal clean up to do) and smaller (only one .py file for all supported Python versions). All we have to do is a post-install byte-compilation and we're done. When we remove such a package we only *have* to remove the .py source file, because PEP 3147 will not load a __pycache__ pyc if the source is missing (though it's easy enough to clean up the pyc files too).

Throw extension modules into the mix and we have the file collision problem again. We can't install _foo.so into /usr/share/pyshared because that's going to be Python ABI-specific. If we can ABI-tag the .so and ensure it will get imported, then we're back to the simpler layout where /usr/share/pyshared is the destination for all installed Python packages.
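The reason the pyshared layout works for pure Python is that each interpreter writes its pyc under a tag-specific name, so the files never collide. A simplified stand-in for importlib's cache-path logic (the helper and the example tags here are illustrative, not the stdlib API):

```python
# Simplified sketch of PEP 3147 cache naming: two interpreters can
# share one .py file because each writes a differently-tagged pyc
# into __pycache__.  Hypothetical helper; tags are examples.

import os

def cache_path(source, tag):
    """Compose the __pycache__ path for `source` under ABI `tag`."""
    head, base = os.path.split(source)
    name, _ext = os.path.splitext(base)
    return os.path.join(head, "__pycache__", "%s.%s.pyc" % (name, tag))

for tag in ("cpython-32", "cpython-33"):
    print(cache_path("/usr/share/pyshared/foo.py", tag))
# /usr/share/pyshared/__pycache__/foo.cpython-32.pyc
# /usr/share/pyshared/__pycache__/foo.cpython-33.pyc
```

ABI-tagging the .so files would extend exactly this collision-avoidance property to extension modules.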
-Barry

[*] That's done as a compromise between Debian's interpretation of the FHS and that interpretation's conflict with from-source installs of Python. Specifically, Debian puts /usr/local/lib/pythonX.Y/dist-packages on the *system* Python's path because that's where it expects system administrators to put their own add-ons to the system Python. That used to be /usr/local/lib/pythonX.Y/site-packages, but then if you did a from-source altinstall of, for example, the same Python version, it would be possible for a "/usr/local/bin/pythonX.Y setup.py install" of a third-party package to break your system Python. Not good! (And yes, it happened to me :).
On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
On 6/24/2010 5:09 PM, Barry Warsaw wrote:
What use case does this address?
Specifically, it's the use case where we (Debian/Ubuntu) plan on installing all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, we can do that without collisions on the pyc files, but would still have to symlink for extension module .so files, because they are always named foo.so and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's foo.so.
If the package has .so files that aren't compatible with other versions of Python, then what is the motivation for placing them in a shared location (since they can't actually be shared)?
Because Python looks for .so files in the same place it looks for the .py files of the same package. E.g., a package like lxml contains the following files (among others):

lxml/
lxml/__init__.py
lxml/__init__.pyc
lxml/builder.py
lxml/builder.pyc
lxml/etree.so

And you can only put it in one place. Really, Python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But Python doesn't work like that. James
James Y Knight <foom@fuhm.net> writes:
Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
+1 So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-) -- \ “Having sex with Rachel is like going to a concert. She yells a | `\ lot, and throws frisbees around the room; and when she wants | _o__) more, she lights a match.” —Steven Wright | Ben Finney
On 25.06.2010 02:54, Ben Finney wrote:
James Y Knight<foom@fuhm.net> writes:
Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
+1
So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)
This has nothing to do with the FHS. The FHS talks about data, not code.
On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
On 25.06.2010 02:54, Ben Finney wrote:
James Y Knight<foom@fuhm.net> writes:
Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
+1
So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)
This has nothing to do with the FHS. The FHS talks about data, not code.
Really? It has some guidelines here for object files, etc., at least as of 2004. http://www.pathname.com/fhs/pub/fhs-2.3.html A quick scan suggests /usr/lib is the right place to look: http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGAN... cheers, --titus -- C. Titus Brown, ctb@msu.edu
On 26.06.2010 22:30, C. Titus Brown wrote:
On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
On 25.06.2010 02:54, Ben Finney wrote:
James Y Knight<foom@fuhm.net> writes:
Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
+1
So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)
This has nothing to do with the FHS. The FHS talks about data, not code.
Really? It has some guidelines here for object files, etc., at least as of 2004.
http://www.pathname.com/fhs/pub/fhs-2.3.html
A quick scan suggests /usr/lib is the right place to look:
http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGAN...
agreed for object files, but http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENT... explicitly states "The /usr/share hierarchy is for all read-only architecture independent *data* files".
On Jun 26, 2010, at 4:35 PM, Matthias Klose wrote:
On 26.06.2010 22:30, C. Titus Brown wrote:
On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
On 25.06.2010 02:54, Ben Finney wrote:
James Y Knight<foom@fuhm.net> writes:
Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
+1
So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)
This has nothing to do with the FHS. The FHS talks about data, not code.
Really? It has some guidelines here for object files, etc., at least as of 2004.
http://www.pathname.com/fhs/pub/fhs-2.3.html
A quick scan suggests /usr/lib is the right place to look:
http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGAN...
agreed for object files, but http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENT... explicitly states "The /usr/share hierarchy is for all read-only architecture independent *data* files".
I always figured the "read-only architecture independent" bit was the important part there, and "code is data". Emacs's el files go into /usr/share/emacs, for instance. James
On 6/24/2010 8:23 PM, James Y Knight wrote:
On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
If the package has .so files that aren't compatible with other versions of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)
Because python looks for .so files in the same place it looks for the .py files of the same package.
My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation of python packages for distros; it was not to reduce the number of .py files on the disk.

Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme).

Either the motivation for this PEP is inaccurate or I am failing to understand how this is *simpler*. In the case of pure-python, this PEP is clearly a win, but I have not seen an argument that it is a win for .so files. Moreover, the PEP itself is titled "PYC Repository Directories" (not "shared site-packages") and makes no mention of .so files at all.

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
On Fri, Jun 25, 2010 at 01:53, Scott Dial <scott+python-dev@scottdial.com> wrote:
On 6/24/2010 8:23 PM, James Y Knight wrote:
On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
If the package has .so files that aren't compatible with other versions of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)
Because python looks for .so files in the same place it looks for the .py files of the same package.
My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation of python packages for distros; it was not to reduce the number of .py files on the disk.
I assume you are talking about PEP 3147. You're right that the PEP was for pyc files and that's it. No one is talking about rewriting the PEP. The motivation Barry is using is an overarching one of distros wanting to use a single directory install location for all installed Python versions. That led to PEP 3147 and now this work.
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme).
This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person. You and I are not going to be using this for anything. This is purely meant for Linux OS vendors (maybe OS X) to manage their installs through their package software. I honestly do not expect human beings to be mucking around with these installs (and I suspect Barry doesn't either).
Either the motivation for this PEP is inaccurate or I am failing to understand how this is *simpler*. In the case of pure-python, this PEP is clearly a win, but I have not seen an argument that it is a win for .so files. Moreover, the PEP itself is titled "PYC Repository Directories" (not "shared site-packages") and makes no mention of .so files at all.
You're conflating what is being discussed with PEP 3147. That PEP is independent of this. PEP 3147 just empowered this work to be relevant. -Brett
--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
On 6/25/2010 2:58 PM, Brett Cannon wrote:
I assume you are talking about PEP 3147. You're right that the PEP was for pyc files and that's it. No one is talking about rewriting the PEP.
Yes, I am making reference to PEP 3147. I make reference to that PEP because this change is of the same order of magnitude as the .pyc change, and we asked for a PEP for that, and if this .so stuff is an extension of that thought process, then it should either be reflected by that PEP or a new PEP.
The motivation Barry is using is an overarching one of distros wanting to use a single directory install location for all installed Python versions. That led to PEP 3147 and now this work.
It's unclear to me that that is the correct motivation, which you are divining. As I understand it, the motivation is to *simplify installation* for distros, which may or may not be achieved by using a single directory. In the case of pure-python packages, a single directory is an obvious win. In the case of mixed-python packages, I remain unpersuaded that there is any improvement.
This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person.
Then why is the status quo unacceptable? I have already explained how this will still require programmatic steps of at least the same difficulty as the status quo requires, so why should we change anything? I am skeptical that this is a simple programmatic problem either: take any random package on PyPI and tell me whether or not it has a .so file that must be compiled. If such a .so file exists, then this package must be special-cased and compiled for each version of Python on the system (or will ever be on the system?). Such a package yields an arbitrary number of .so files due to the number of version of Python on the machine, and I can't imagine how it is simpler to manage all of those files than it is to manage multiple site-packages.
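The detection step Scott alludes to (does a given package ship compiled extensions?) is at least mechanical. A hedged sketch of what a packaging tool might do (the helper name is hypothetical, not from any actual distro tooling):

```python
# Hypothetical helper for a packaging tool: walk an installed package
# directory and report whether it contains any compiled extension
# modules (.so files), i.e. whether the package needs to be built
# separately for each Python version on the system.
import os

def has_extension_modules(package_dir):
    for dirpath, dirnames, filenames in os.walk(package_dir):
        if any(name.endswith('.so') for name in filenames):
            return True
    return False
```

This answers "is the package pure-python?" for one install tree; as Scott notes, it does not by itself decide how the resulting files should be laid out.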
You're conflating what is being discussed with PEP 3147. That PEP is independent of this. PEP 3147 just empowered this work to be relevant.
Without a PEP (be it PEP 3147 or some other), what is the justification for doing this? The burden should be on "you" to explain why this is a good idea and not just a clever idea. -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
On Jun 25, 2010, at 03:42 PM, Scott Dial wrote:
On 6/25/2010 2:58 PM, Brett Cannon wrote:
I assume you are talking about PEP 3147. You're right that the PEP was for pyc files and that's it. No one is talking about rewriting the PEP.
Yes, I am making reference to PEP 3147. I make reference to that PEP because this change is of the same order of magnitude as the .pyc change, and we asked for a PEP for that, and if this .so stuff is an extension of that thought process, then it should either be reflected by that PEP or a new PEP.
I think it's not nearly of the same order of magnitude as PEP 3147. One way to measure that is the size of the patch required to implement the feature and ensure the test suite still works. My versioned .so patch is *way* smaller. I actually think because this is almost exclusively an extension to a build-time configuration option, and doesn't really change the language, a PEP shouldn't be necessary. But by the same token, I'm willing to write a new one (and *not* touch PEP 3147) just so that we have a point of reference to record the discussion and decision. So I'll do that. -Barry
On 25.06.2010 20:58, Brett Cannon wrote:
On Fri, Jun 25, 2010 at 01:53, Scott Dial
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme).
This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person. You and I are not going to be using this for anything. This is purely meant for Linux OS vendors (maybe OS X) to manage their installs through their package software. I honestly do not expect human beings to be mucking around with these installs (and I suspect Barry doesn't either).
Placing files for a distribution in a version-independent path does help distributions handle file conflicts, detect duplicates, and move files between different (distribution) packages. Having non-conflicting extension names is a scheme which is already used on some platforms (debug builds on Windows). The question for me is whether just a renaming of the .so files is acceptable for upstream, or if distributors should implement this on their own, as something like:

    if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'):
        load_ext('foo.2.6.so')
    else:
        load_ext('foo.so')

I fear this will cause issues when e.g. virtualenv environments start copying parts from the system installation instead of symlinking it.

Matthias
On Jun 26, 2010, at 10:45 PM, Matthias Klose wrote:
Having non-conflicting extension names is a scheme which is already used on some platforms (debug builds on Windows). The question for me is whether just a renaming of the .so files is acceptable for upstream, or if distributors should implement this on their own, as something like:
    if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'):
        load_ext('foo.2.6.so')
    else:
        load_ext('foo.so')
I fear this will cause issues when e.g. virtualenv environments start copying parts from the system installation instead of symlinking it.
I concur. I think my patch will have much less impact on virtualenv and similar tools because there's nothing much magical about it. It just says "oh there's another file suffix you should consider when looking for a shared library", which as you point out is already done on Windows. -Barry
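The lookup order Barry describes -- try the versioned name first, then fall back to the traditional .so -- might look roughly like this. This is an illustrative sketch, not the patch's actual code (which lives in dynload_shlib.c); the function name and the '3.2' tag are mine:

```python
# Sketch of the two-step extension lookup: a versioned name such as
# foo.3.2.so is preferred, and the traditional foo.so is kept as a
# backward-compatible fallback.
import os

def find_extension(module, directory, version='3.2'):
    candidates = ['%s.%s.so' % (module, version), '%s.so' % module]
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

The only change relative to today's behavior is the extra candidate at the front of the list, which is why the impact on tools like virtualenv should be small.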
On Jun 25, 2010, at 11:58 AM, Brett Cannon wrote:
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme).
This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person. You and I are not going to be using this for anything. This is purely meant for Linux OS vendors (maybe OS X) to manage their installs through their package software. I honestly do not expect human beings to be mucking around with these installs (and I suspect Barry doesn't either).
Spot on. -Barry
On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
On 6/24/2010 8:23 PM, James Y Knight wrote:
On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
If the package has .so files that aren't compatible with other versions of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)
Because python looks for .so files in the same place it looks for the .py files of the same package.
My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation python packages for distros; it was not to reduce the number of .py files on the disk.
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way.
This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.

However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).

In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.

The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.

James
On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight <foom@fuhm.net> wrote:
However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).
In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.
The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.
I think there are two major ways to tackle this:

- allow multiple versions of a .so file within a single directory (i.e. Barry's current suggestion)
- enhanced namespace packages, allowing a single package to be spread across multiple directories, some of which may be Python version specific (i.e. modifications to PEP 382 to support references to version-specific directories)

I think a new PEP is definitely in order, especially to explain why enhancing PEP 382 to support saying "look over here for the .so files for this version" isn't a preferable approach.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 26.06.2010 02:19, Nick Coghlan wrote:
On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight<foom@fuhm.net> wrote:
However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).
In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.
The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.
I think there are two major ways to tackle this:

- allow multiple versions of a .so file within a single directory (i.e. Barry's current suggestion)
we already do this, see the naming of the extensions of a python debug build on Windows. Several distributions (Debian, Fedora, Ubuntu) do use this as well to provide extensions for python debug builds.
- enhanced namespace packages, allowing a single package to be spread across multiple directories, some of which may be Python version specific (i.e. modifications to PEP 382 to support references to version-specific directories)
This is not what I want to use in a distribution. Package management systems like rpm and dpkg do handle conflicts and replacements of files pretty well; having the same file in potentially different locations in the file system doesn't help detecting conflicts and duplicate packages. Matthias
On 25.06.2010 22:12, James Y Knight wrote:
On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
On 6/24/2010 8:23 PM, James Y Knight wrote:
On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
If the package has .so files that aren't compatible with other versions of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)
Because python looks for .so files in the same place it looks for the .py files of the same package.
My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation python packages for distros; it was not to reduce the number of .py files on the disk.
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way.
This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.
This is what is currently done. This will increase the size of packages by duplicating the .py files, or you have to install the .py in a common location (irrelevant to sys.path), and provide (sym)links to the expected location. A "different per-python-version directory" also has the disadvantage that file conflicts between (distribution) packages cannot be detected.
However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).
In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.
The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.
I don't think that installation into different locations based on the presence of extensions will work. Should a location really change if an extension is added as an optimization? Splitting a (python) package into different installation locations should be avoided. Matthias
On 6/26/2010 4:06 PM, Matthias Klose wrote:
On 25.06.2010 22:12, James Y Knight wrote:
On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way.
This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.
This is what is currently done. This will increase the size of packages by duplicating the .py files, or you have to install the .py in a common location (irrelevant to sys.path), and provide (sym)links to the expected location.
"This is what is currently done" and "provide (sym)links to the expected location" are conflicting statements. If you are symlinking .py files from a shared location, then that is not the same as "just install the package into a version-specific location". What motivation is there for preferring symlinks? Who cares if a distro package install yields duplicate .py files? Nor am I motivated by having to carry duplicate .py files in a distribution package (I imagine the compression of duplicate .py files is amazing).
A "different per-python-version directory" also has the disadvantage that file conflicts between (distribution) packages cannot be detected.
Why? That sounds like a broken tool; maybe I am naive, please explain. If two packages install /usr/lib/python2.6/foo.so, that should be just as detectable as two packages installing /usr/lib/python-shared/foo.cpython-26.so.

If you *must* compile .so files for every supported version of python at packaging time, then you are already saying the set of python versions is known. I fail to see the difference between a package that installs .py and .so files into many directories and one with many .so files in a single directory; except that the many-directories approach *already* works. The only gain I can see is that you save duplicate .py files in the package and on the filesystem, and I don't feel that gain alone warrants this fundamental change.

I would appreciate a proper explanation of why/how a single directory is better for your distribution. Also, I haven't heard anyone that wasn't using debian tools chime in with support for any of this, so I would like to know how this can help RPMs and ebuilds and the like.
I don't think that installation into different locations based on the presence of extension will work. Should a location really change if an extension is added as an optimization? Splitting a (python) package into different installation locations should be avoided.
I'm not sure why changing paths would matter; any package that writes data in its install location would be considered broken by your distro already, so what harm is there in having the packaging tool move it later? Your tool will remove the old path and place it in a new path.

All of these shenanigans seem to manifest from your distro's python-support/-central design, which seems to be entirely motivated by reducing duplicate files and *not* simplifying the packaging. While this plan works rather well with .py files, the devil is in the details. I don't think Python should be getting involved in what I believe is a flawed design.

What happens to the distro packaging if a python package splits the codebase between 2.x and 3.x (meaning they have distinct .py files)? As someone else mentioned, how is virtualenv going to interact with packages that install like this?

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
On Jun 26, 2010, at 06:50 PM, Scott Dial wrote:
On 6/26/2010 4:06 PM, Matthias Klose wrote:
On 25.06.2010 22:12, James Y Knight wrote:
On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way.
This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.
This is what is currently done. This will increase the size of packages by duplicating the .py files, or you have to install the .py in a common location (irrelevant to sys.path), and provide (sym)links to the expected location.
"This is what is currently done" and "provide (sym)links to the expected location" are conflicting statements.
I think Matthias was referring to "what is currently done" to your statement "debian needs to build it once per version". Providing symlinks is how we are able to make it appear that there are version-specific py files without actually doing so.
If you are symlinking .py files from a shared location, then that is not the same as "just install the package into a version-specific location". What motivation is there for preferring symlinks?
This reduces .py file duplications in distro packages.
Who cares if a ditro package install yields duplicate .py files? Nor am I motivated by having to carry duplicate .py files in a distribution package (I imagine the compression of duplicate .py files is amazing).
It might be amazing, but it's still a significant overhead. As I've described, multiply that by all the py files in all the distro packages containing Python source code, and then still try to fit it on a CDROM.
What happens to the distro packaging if a python package splits the codebase between 2.x and 3.x (meaning they have distinct .py files)?
The Debian/Ubuntu approach to Python 2/3 support is to provide them in separate distro packages. E.g. for Python package foo, you would have Debuntu package python-foo (for the Python 2.x version) and python3-foo. We do not share source between Python 2 and 3 versions, at least not yet <wink>. This doesn't hurt us much because the number of Python packages that are source compatible between the two is pretty low (Benjamin's 'six' package might change that :), and not much depends on Python 3 yet.
As someone else mentioned, how is virtualenv going to interact with packages that install like this?
This is a good question, but I *think* it won't affect it much at all. To test for sure I'd either need a Python 3 compatible virtualenv or backport my patch to Python 2.6 and 2.7. But still, I'm not sure it would matter since the same shared library import suffix is used in either case. I actually think version-specific search paths would have a greater impact on virtualenv. -Barry
On 6/30/2010 2:53 PM, Barry Warsaw wrote:
It might be amazing, but it's still a significant overhead. As I've described, multiply that by all the py files in all the distro packages containing Python source code, and then still try to fit it on a CDROM.
I decided to prove to myself that it was not a significant issue to have parallel directory structures in a .tar.bz2, and I was surprised to find it much worse than I had imagined. For example,

    # cd /usr/lib/python2.6/site-packages
    # tar --exclude="*.pyc" --exclude="*.pyo" \
        -cjf mercurial.tar.bz2 mercurial
    # du -h mercurial.tar.bz2
    640K    mercurial.tar.bz2
    # cp -a mercurial mercurial2
    # tar --exclude="*.pyc" --exclude="*.pyo" \
        -cjf mercurial2.tar.bz2 mercurial mercurial2
    # du -h mercurial2.tar.bz2
    1.3M    mercurial2.tar.bz2

So, I was definitely wrong in saying that you do better than doubling.
What happens to the distro packaging if a python package splits the codebase between 2.x and 3.x (meaning they have distinct .py files)?
The Debian/Ubuntu approach to Python 2/3 support is to provide them in separate distro packages. E.g. for Python package foo, you would have Debuntu package python-foo (for the Python 2.x version) and python3-foo. We do not share source between Python 2 and 3 versions, at least not yet <wink>.
I had asked this question to point out that you will still need to accommodate some form of version-specific packages (I am not a debuntu expert by any means). And, I think your response is an acknowledgment of that fact, however it's certainly true that there are few examples, as you said.

I appreciate all your replies. I am not sure a PEP is really needed here, but having had all of this discussed and explained on the mailing list is certainly useful. I trust that you and the debuntu python group will end up chasing down and taking care of any quirks that this change might cause, so I am not worried about it. :D

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
On Jul 01, 2010, at 07:02 AM, Scott Dial wrote:
I decided to prove to myself that it was not a significant issue to have parallel directory structures in a .tar.bz2, and I was surprised to find it much worse than I had imagined. For example,
    # cd /usr/lib/python2.6/site-packages
    # tar --exclude="*.pyc" --exclude="*.pyo" \
        -cjf mercurial.tar.bz2 mercurial
    # du -h mercurial.tar.bz2
    640K    mercurial.tar.bz2
    # cp -a mercurial mercurial2
    # tar --exclude="*.pyc" --exclude="*.pyo" \
        -cjf mercurial2.tar.bz2 mercurial mercurial2
    # du -h mercurial2.tar.bz2
    1.3M    mercurial2.tar.bz2
So, I was definitely wrong in saying that you do better than doubling. [...] I appreciate all your replies. I am not sure a PEP is really needed here, but having had all of this discussed and explained on the mailing list is certainly useful. I trust that you and the debuntu python group will end up chasing down and taking care of any quirks that this change might cause, so I am not worried about it. :D
Getting back to this after the US holiday. Thanks for running these numbers Scott. I've opened a bug in the Python tracker and attached my latest patch:

http://bugs.python.org/issue9193

The one difference from previous versions of the patch is that the .so tag is now settable via "./configure --with-so-abi-tag=foo". This would generate shared libs like _multiprocessing.foo.so.

I'd like to get consensus as to whether folks feel that a PEP is needed. My own thought is that I'd rather not do a PEP specific to this change, but I would update PEP 384 with the implications on .so versioning. Please also feel free to review the patch in that issue.

Thanks,
-Barry
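The naming rule the new configure option implies can be sketched as follows. This is illustrative only -- the real logic lives in the generated Makefile's $SO variable and in distutils, and the function name here is hypothetical:

```python
# Illustrative: how an ABI tag supplied via
# ./configure --with-so-abi-tag=TAG would be folded into an extension
# module's filename. With no tag, the traditional name is produced.
def tagged_so_name(module, tag=None):
    if tag:
        return '%s.%s.so' % (module, tag)
    return '%s.so' % module
```

So a tag of 'foo' yields the `_multiprocessing.foo.so` mentioned above, while an untagged build keeps producing `_multiprocessing.so`.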
Am 07.07.2010 20:40, schrieb Barry Warsaw:
Getting back to this after the US holiday. Thanks for running these numbers Scott. I've opened a bug in the Python tracker and attached my latest patch:
http://bugs.python.org/issue9193
The one difference from previous versions of the patch is that the .so tag is now settable via "./configure --with-so-abi-tag=foo". This would generate shared libs like _multiprocessing.foo.so.
I'd like to get consensus as to whether folks feel that a PEP is needed. My own thought is that I'd rather not do a PEP specific to this change, but I would update PEP 384 with the implications on .so versioning. Please also feel free to review the patch in that issue.
I can see where this is going... writing it into PEP 384 would automatically get the change accepted?
Am 07.07.2010 23:04, schrieb Georg Brandl:
Am 07.07.2010 20:40, schrieb Barry Warsaw:
Getting back to this after the US holiday. Thanks for running these numbers Scott. I've opened a bug in the Python tracker and attached my latest patch:
http://bugs.python.org/issue9193
The one difference from previous versions of the patch is that the .so tag is now settable via "./configure --with-so-abi-tag=foo". This would generate shared libs like _multiprocessing.foo.so.
I'd like to get consensus as to whether folks feel that a PEP is needed. My own thought is that I'd rather not do a PEP specific to this change, but I would update PEP 384 with the implications on .so versioning. Please also feel free to review the patch in that issue.
I can see where this is going... writing it into PEP 384 would automatically get the change accepted?
I hit "Send" prematurely. I wanted to add that I'd be okay with this change, be it in a new PEP or an old one. Georg
On Jul 08, 2010, at 09:14 AM, Georg Brandl wrote:
Am 07.07.2010 23:04, schrieb Georg Brandl:
I can see where this is going... writing it into PEP 384 would automatically get the change accepted?
I'm definitely not trying to get it in subversively. :)
I hit "Send" prematurely. I wanted to add that I'd be okay with this change, be it in a new PEP or an old one.
Cool. I'll take Nick up on the suggestion to summarize the thread via a new PEP. -Barry
On Thu, Jul 8, 2010 at 4:40 AM, Barry Warsaw <barry@python.org> wrote:
I'd like to get consensus as to whether folks feel that a PEP is needed. My own thought is that I'd rather not do a PEP specific to this change, but I would update PEP 384 with the implications on .so versioning. Please also feel free to review the patch in that issue.
I suspect you could write a new PEP faster than you could convince those suggesting the change needs a PEP (including me) that one isn't necessary. Presumably you were going to do a summary email for the mailing list anyway - just tidy up the formatting a bit and check it in as a PEP instead :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 07.07.2010 20:40, Barry Warsaw wrote:
Getting back to this after the US holiday. Thanks for running these numbers Scott. I've opened a bug in the Python tracker and attached my latest patch:
http://bugs.python.org/issue9193
The one difference from previous versions of the patch is that the .so tag is now settable via "./configure --with-so-abi-tag=foo". This would generate shared libs like _multiprocessing.foo.so.
- imo, it's wrong to look up _multiprocessing.so first, before looking up _multiprocessing.foo.so (at least for the use case of putting the extensions for multiple python versions into one directory).
- why is the flexibility of specifying the "foo" needed? The naming for the __pycache__ files is fixed, why have it configurable for extensions?

Matthias
On Jul 08, 2010, at 01:47 AM, Matthias Klose wrote:
On 07.07.2010 20:40, Barry Warsaw wrote:
Getting back to this after the US holiday. Thanks for running these numbers Scott. I've opened a bug in the Python tracker and attached my latest patch:
http://bugs.python.org/issue9193
The one difference from previous versions of the patch is that the .so tag is now settable via "./configure --with-so-abi-tag=foo". This would generate shared libs like _multiprocessing.foo.so.
- imo, it's wrong to look up _multiprocessing.so first, before looking up _multiprocessing.foo.so (at least for the use case of putting the extensions for multiple python versions into one directory).
Good point.
- why is the flexibility of specifying the "foo" needed? The naming for the __pycache__ files is fixed, why have it configurable for extensions?
The 'foo' part in the shared library name is equivalent to the <tag> part in __pycache__/baz.<tag>.pyc, not specifically the __pycache__ part. Specifying the <tag> is necessary because extension modules built for Python 3.2 will not be compatible with Python 3.3 (in the absence of PEP 384), yet they will live in the same directory. -Barry
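The parallel between the pyc tag and the proposed .so tag can be sketched in a few lines of Python. This is purely illustrative: the helper name and tag values are this sketch's assumptions, and the real lookup lives in dynload_shlib.c, not in Python code.

```python
import sys

def so_candidates(modname, tag=None):
    """Return the shared-library file names an import might probe:
    the tagged name first, with the bare name kept as a fallback.
    (Hypothetical helper, not part of the actual patch.)"""
    if tag is None:
        tag = "cpython-%d%d" % sys.version_info[:2]
    return ["%s.%s.so" % (modname, tag), "%s.so" % modname]

# A 3.2 build and a 3.3 build probe different tagged names,
# so both extension modules can share one directory.
print(so_candidates("_multiprocessing", "cpython-32"))
print(so_candidates("_multiprocessing", "cpython-33"))
```

The key point is that, just as with __pycache__ tags, the tag disambiguates builds per interpreter version while the untagged name remains available for backward compatibility.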
Scott Dial wrote:
On 6/30/2010 2:53 PM, Barry Warsaw wrote:
It might be amazing, but it's still a significant overhead. As I've described, multiply that by all the py files in all the distro packages containing Python source code, and then still try to fit it on a CDROM.
I decided to prove to myself that it was not a significant issue to have parallel directory structures in a .tar.bz2, and I was surprised to find it much worse at that than I had imagined. For example,
# cd /usr/lib/python2.6/site-packages
# tar --exclude="*.pyc" --exclude="*.pyo" \
    -cjf mercurial.tar.bz2 mercurial
# du -h mercurial.tar.bz2
640K    mercurial.tar.bz2
# cp -a mercurial mercurial2
# tar --exclude="*.pyc" --exclude="*.pyo" \
    -cjf mercurial2.tar.bz2 mercurial mercurial2
# du -h mercurial2.tar.bz2
1.3M    mercurial2.tar.bz2
I believe the standard (and largest) block size for .bz2 is 900kB, and I *think* that is uncompressed. Though I know that bz2 can chain, since it can compress all NULL bytes extremely well (multiple GB down to kB, IIRC). There was a question as to whether LZMA would do better here. I'm using 7zip, but .xz should perform similarly.

$ du -sh mercurial*
2.6M    mercurial
2.6M    mercurial2
366K    mercurial.tar.bz2
734K    mercurial2.tar.bz2
303K    mercurial.7z
310K    mercurial2.7z

So LZMA with the 'normal' compression has a big enough window to find almost all of the redundancy, and 310kB is certainly a very small increase over the 303kB. And clearly bz2 does not, since 734kB is actually slightly more than 2x 366kB. John =:->
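John's observation can be reproduced synthetically: duplicate a blob larger than bzip2's 900 kB block size and compare against LZMA (via Python's stdlib lzma module), whose default dictionary is large enough to see the repeat. The 1 MB size and the ratios in the comments come from this sketch's assumptions, not from the mercurial tree.

```python
import bz2
import lzma
import os

data = os.urandom(1_000_000)  # ~1 MB of incompressible bytes

single_bz2 = len(bz2.compress(data))
double_bz2 = len(bz2.compress(data * 2))
single_xz = len(lzma.compress(data))
double_xz = len(lzma.compress(data * 2))

# bzip2 compresses independent blocks of at most 900 kB, so the second
# copy lands in fresh blocks and the archive roughly doubles in size.
print("bz2 ratio: %.2f" % (double_bz2 / single_bz2))
# LZMA's default dictionary (8 MiB) spans both copies, so the duplicate
# is encoded as back-references and adds very little.
print("xz ratio: %.2f" % (double_xz / single_xz))
```

This mirrors the thread's numbers: bz2 slightly more than doubles, while the LZMA-family archive barely grows.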
On Jun 25, 2010, at 04:53 AM, Scott Dial wrote:
My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path).
Matthias outlined some of the pitfalls with this approach.
The motivation for this PEP was to simplify the installation of Python packages for distros; it was not to reduce the number of .py files on the disk.
As others have pointed out, versioned .so files are not part of PEP 3147. That PEP does reduce the number of .py files on disk, which, as I explained in a previous follow-up, is an important consideration.
Placing .so files together does not simplify that install process in any way.
I disagree of course. :)
You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet,
No, distutils cannot take care of this. There is no way currently to tell distutils to generate a .so file with anything but the platform-specific way of spelling "shared library".
you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme).
That's not a use case we care about. If you have Python 3.2 and 3.3 installed on your system, why would you want lxml installed for one but not the other? And even if for some reason you did, the only way to do that would be in a way similar to handling the PEP 3147 pyc files. -Barry
Scott Dial wrote:
But the only motivation for doing this with .pyc files is that the .py files are able to be shared,
In an application made up of a mixture of pure Python and extension modules, the .py files are able to be shared too. Seems to me that a similar motivation exists here as well. Not exactly the same, but closely related. -- Greg
On 6/24/2010 9:18 PM, Greg Ewing wrote:
Scott Dial wrote:
But the only motivation for doing this with .pyc files is that the .py files are able to be shared,
In an application made up of a mixture of pure Python and extension modules, the .py files are able to be shared too. Seems to me that a similar motivation exists here as well. Not exactly the same, but closely related.
If I recall Barry's motivation correctly, the PEP was intended to simplify the installation of packages for multiple versions of Python, although the PEP states that in a less direct way. In the case of pure-python packages, this is merely about avoiding .pyc collisions. But, in the case of packages with .so files, I fail to see how this is simpler (in fact, I believe it to be more complicated). So, I am not sure the PEP supports this feature being proposed (since it makes no mention of .so files), and more importantly, I am not sure it actually makes anything better for anyone (still requires multiple compilations and un/install gymnastics). -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
I'm trying to catch up on this thread, so I may collapse some responses or refer to points others have brought up. On Jun 24, 2010, at 05:53 PM, Scott Dial wrote:
If the package has .so files that aren't compatible with other version of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)?
I think Matthias has described the motivation for the Debian/Ubuntu case, and James describes Python's current search algorithm for a package's .py[c] and .so files. There are a few points that you've made that I want to respond to.

You claim that the versioned .so files scheme is "more complicated" than multiple version-specific search paths (if I understand your counter proposal correctly). It all depends on your point of view. From mine, a 100 line patch that almost nobody but (some) distros will care about or be affected by, and that only changes a fairly obscure build-time configuration, is much simpler than trying to make version-specific search paths work. If you build Python from source, you do not care about this patch and you'll never see its effects. If you get Python on a distribution that only gives you one version of Python at a time, you also will probably never care or see the effects of this patch. If you're a Debian or Ubuntu user who wants to use Python 3.2 and 3.3, you *might* care about it, but most likely it'll just work behind the scenes. If you're a Python packager or work on the Python infrastructure for one of those platforms, then you will care.

About just sharing the py files. You say that would be acceptable to you, but it's actually a pretty big deal. If you're supporting two versions of Python, then every distro Python package doubles in size. Even with compression, you're talking longer download times and probably more critically, you've greatly increased CDROM space pressures. The Ubuntu CDROM is already essentially at capacity, so doubling the size of all Python packages (most of which btw do not have extension modules) makes such an approach impossible. Moving to a DVD image has been discussed, but it is currently believed not in the best interest of users, especially on slow links, to do so at this time.
The versioned .so approach will of course increase the size of packages by twice the contained .so file size, and that's already an uncomfortable but acceptable increase. It's acceptable because of the gain users get by having multiple versions of Python available and the fact that there aren't nearly as many extension modules as there are Python files. Doubling the size of .py files as well isn't acceptable.
But the only motivation for doing this with .pyc files is that the .py files are able to be shared, since the .pyc is an on-demand-generated, version-specific artifact (and not the source). The .so file is created offline by another toolchain, is version-specific, and presumably you are not suggesting that Python generate it on-demand.
Definitely not. pyc files are generated upon installation of the distro package, but of course the .so files must be compiled on a build machine and included in the distro package. The whole process is much simpler if the versioned .so files can just live in the same directory.
For packages that have .so files, won't the distro already have to build multiple copies of that package for all version of Python? So, why can't it place them in separate directories that are version-specific at that time? This is not the same as placing .py files that are version-agnostic into a version-agnostic location.
It's not a matter of "could", it's a matter of simplicity, and I think versioned .so files are the simplest solution given all the constraints. -Barry
On Thu, Jun 24, 2010 at 08:50, Barry Warsaw <barry@python.org> wrote:
This is a follow up to PEP 3147. That PEP, already implemented in Python 3.2, allows for Python source files from different Python versions to live together in the same directory. It does this by putting a magic tag in the .pyc file name and placing the .pyc file in a __pycache__ directory.
Distros such as Debian and Ubuntu will use this to greatly simplify deploying Python, and Python applications and libraries. Debian and Ubuntu usually ship more than one version of Python, and currently have to play complex games with symlinks to make this work. PEP 3147 will go a long way to eliminating the need for extra directories and symlinks.
One more thing I've found we need though, is a way to handle shared libraries for extension modules. Just as we can get name collisions on foo.pyc, we can get collisions on foo.so. We obviously cannot install foo.so built for Python 3.2 and foo.so built for Python 3.3 in the same location. So symlink nightmare's mini-me is back.
I have a fairly simple fix for this. I'd actually be surprised if this hasn't been discussed before, but teh Googles hasn't turned up anything.
The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility).
(On file naming: the original patch used foo.so.3.2 and that works just as well, but I thought there might be tools that expect exactly a '.so' suffix, so I changed it to put the Major.Minor version number to the left of the extension. The exact naming scheme is of course open to debate.)
While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean that the foo module's 3.2 release is what is installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added.

Adding the VM name also doesn't make extension modules the exclusive domain of CPython. If some other VM decides to make their own .so files that are not binary compatible, then we should not preclude that, as this solution costs nothing more than making a string comparison look at 7 more characters.

-Brett

P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.
On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon <brett@python.org> wrote:
On Thu, Jun 24, 2010 at 08:50, Barry Warsaw <barry@python.org> wrote:
This is a follow up to PEP 3147. That PEP, already implemented in Python 3.2, allows for Python source files from different Python versions to live together in the same directory. It does this by putting a magic tag in the .pyc file name and placing the .pyc file in a __pycache__ directory.
Distros such as Debian and Ubuntu will use this to greatly simplify deploying Python, and Python applications and libraries. Debian and Ubuntu usually ship more than one version of Python, and currently have to play complex games with symlinks to make this work. PEP 3147 will go a long way to eliminating the need for extra directories and symlinks.
One more thing I've found we need though, is a way to handle shared libraries for extension modules. Just as we can get name collisions on foo.pyc, we can get collisions on foo.so. We obviously cannot install foo.so built for Python 3.2 and foo.so built for Python 3.3 in the same location. So symlink nightmare's mini-me is back.
I have a fairly simple fix for this. I'd actually be surprised if this hasn't been discussed before, but teh Googles hasn't turned up anything.
The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility).
(On file naming: the original patch used foo.so.3.2 and that works just as well, but I thought there might be tools that expect exactly a '.so' suffix, so I changed it to put the Major.Minor version number to the left of the extension. The exact naming scheme is of course open to debate.)
While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean that the foo module's 3.2 release is what is installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added.
Well, for versions of the .so itself, traditionally version numbers are appended *after* the .so suffix (check your /lib directory :-).
Adding the VM name also doesn't make extension modules the exclusive domain of CPython. If some other VM decides to make their own .so files that are not binary compatible, then we should not preclude that, as this solution costs nothing more than making a string comparison look at 7 more characters.
-Brett
P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.
I wish so too. IIRC there used to be some modules that on Windows were wrappers around 3rd party DLLs, and you can't have foo.dll as the module that wraps the third-party foo.dll. (On Unix this problem doesn't exist because the 3rd party .so would be named libfoo.so, not foo.so.) -- --Guido van Rossum (python.org/~guido)
On Thu, Jun 24, 2010 at 11:27, Guido van Rossum <guido@python.org> wrote:
On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon <brett@python.org> wrote:
On Thu, Jun 24, 2010 at 08:50, Barry Warsaw <barry@python.org> wrote:
This is a follow up to PEP 3147. That PEP, already implemented in Python 3.2, allows for Python source files from different Python versions to live together in the same directory. It does this by putting a magic tag in the .pyc file name and placing the .pyc file in a __pycache__ directory.
Distros such as Debian and Ubuntu will use this to greatly simplify deploying Python, and Python applications and libraries. Debian and Ubuntu usually ship more than one version of Python, and currently have to play complex games with symlinks to make this work. PEP 3147 will go a long way to eliminating the need for extra directories and symlinks.
One more thing I've found we need though, is a way to handle shared libraries for extension modules. Just as we can get name collisions on foo.pyc, we can get collisions on foo.so. We obviously cannot install foo.so built for Python 3.2 and foo.so built for Python 3.3 in the same location. So symlink nightmare's mini-me is back.
I have a fairly simple fix for this. I'd actually be surprised if this hasn't been discussed before, but teh Googles hasn't turned up anything.
The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility).
(On file naming: the original patch used foo.so.3.2 and that works just as well, but I thought there might be tools that expect exactly a '.so' suffix, so I changed it to put the Major.Minor version number to the left of the extension. The exact naming scheme is of course open to debate.)
While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean that the foo module's 3.2 release is what is installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added.
Well, for versions of the .so itself, traditionally version numbers are appended *after* the .so suffix (check your /lib directory :-).
Second thing you taught me today (first was the x[:0] trick)! I've also been on OS X too long; /usr/lib is just .dylib, and that puts the version number before the extension.
Adding the VM name also doesn't make extension modules the exclusive domain of CPython. If some other VM decides to make their own .so files that are not binary compatible, then we should not preclude that, as this solution costs nothing more than making a string comparison look at 7 more characters.
-Brett
P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.
I wish so too. IIRC there used to be some modules that on Windows were wrappers around 3rd party DLLs, and you can't have foo.dll as the module that wraps the third-party foo.dll. (On Unix this problem doesn't exist because the 3rd party .so would be named libfoo.so, not foo.so.)
Wouldn't Barry's proposed solution actually fill this need since it will give the file a custom Python suffix that more-or-less guarantees no name clash with a third-party DLL?
On Jun 24, 2010, at 11:27 AM, Guido van Rossum wrote:
On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon <brett@python.org> wrote:
While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean that the foo module's 3.2 release is what is installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added.
Well, for versions of the .so itself, traditionally version numbers are appended *after* the .so suffix (check your /lib directory :-).
Which is probably another reason not to use foo.so.X.Y for Python extension modules. I think it would be confusing, and foo.<tag>.so looks nice and is consistent with foo.<tag>.pyc. (Ref to updated patch coming...) -Barry
On Thu, Jun 24, 2010 at 4:55 PM, Barry Warsaw <barry@python.org> wrote:
Which is probably another reason not to use foo.so.X.Y for Python extension modules.
Clearly, foo.so.3.2 is the man page for the foo.so.3 system call. The ABI ident definitely has to be elsewhere. -Fred -- Fred L. Drake, Jr. <fdrake at gmail.com> "A storm broke loose in my mind." --Albert Einstein
Le 24/06/2010 19:48, Brett Cannon a écrit :
P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.
At least, looking for spam/__init__module.so could be avoided. It seems to me that the package definition does not allow that. The tradeoff would be code complication for one less stat call. Worth a bug report? Regards
On Thu, Jun 24, 2010 at 11:53, Éric Araujo <merwok@netwok.org> wrote:
Le 24/06/2010 19:48, Brett Cannon a écrit :
P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.
At least, looking for spam/__init__module.so could be avoided. It seems to me that the package definition does not allow that.
I thought no one had bothered to change import.c to allow for extension modules to act as a package's __init__? As for not being allowed, I don't agree with that assessment. If you treat a package's __init__ module as simply that, a module that would be named __init__ when imported, then __init__module.so would be valid (and that's what importlib does).
The tradeoff would be code complication for one less stat call. Worth a bug report?
Nah.
On Jun 24, 2010, at 10:48 AM, Brett Cannon wrote:
While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean that the foo module's 3.2 release is what is installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added.
I have a new version of my patch that steals the "magic tag" idea from PEP 3147. Note that it does not use the *actual* same piece of information to compose the file name, but for now it does match the pyc tag string. E.g.

% find . -name \*.so
./build/lib.linux-x86_64-3.2/math.cpython-32.so
./build/lib.linux-x86_64-3.2/select.cpython-32.so
./build/lib.linux-x86_64-3.2/_struct.cpython-32.so
...

Further, by default, ./configure doesn't add this tag, so you would have to build Python with:

% SOABI=cpython-32 ./configure

to get anything between the module name and the extension. I could of course make this a configure switch instead, and could default it to some other magic string instead of the empty string.
Adding the VM name also doesn't make extension modules the exclusive domain of CPython. If some other VM decides to make their own .so files that are not binary compatible, then we should not preclude that, as this solution costs nothing more than making a string comparison look at 7 more characters.
-Brett
P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency's sake and to cut out a stat call, but I know that is asking too much.
I think you're right that with the $SOABI trick above, you wouldn't get the name collisions Guido recalls, and you could get rid of module.so. OTOH, as I am currently only targeting Linux, it seems like the module.so stat is wasted anyway on that platform. -Barry
Le 24/06/2010 17:50, Barry Warsaw (FLUFL) a écrit :
Other possible approaches: * Extend the distutils API so that the .so file extension can be passed in, instead of being essentially hardcoded to what Python's Makefile contains.
Third-party code relies on Distutils internal quirks, so it’s frozen. Feel free to open a bug against Distutils2 on the Python tracker if that would be generally useful. Regards
On Jun 24, 2010, at 08:50 PM, Éric Araujo wrote:
Le 24/06/2010 17:50, Barry Warsaw (FLUFL) a écrit :
Other possible approaches: * Extend the distutils API so that the .so file extension can be passed in, instead of being essentially hardcoded to what Python's Makefile contains.
Third-party code relies on Distutils internal quirks, so it’s frozen. Feel free to open a bug against Distutils2 on the Python tracker if that would be generally useful.
Depending on how strict this constraint is, it could make things more difficult. I can control what shared library file names Python will load statically, but in order to support PEP 384 I think I need to be able to control what file extensions build_ext writes. My updated patch does this in a backward compatible way. Of course, distutils hacks have their tentacles all up in the distutils internals, so maybe my patch will break something after all. I can think of a few even hackier ways to work around that if necessary.

My updated patch:

* Adds an optional argument to build_ext.get_ext_fullpath() and build_ext.get_ext_filename(). This extra argument is the Extension instance being built. (Boy, just in case anyone's already playing with the time machine, it sure would have been nice if these methods had originally just taken the Extension instance and dug out ext.name instead of passing the string in.)

* Adds an optional new keyword argument to the Extension class, called so_abi_tag. If given, this overrides the Makefile $SO variable extension.

What this means is that with no changes, a non-PEP 384 compliant extension module wouldn't have to change anything:

    setup(
        name='stupid',
        version='0.0',
        packages=['stupid', 'stupid.tests'],
        ext_modules=[Extension('_stupid', ['src/stupid.c'],
                               )],
        test_suite='stupid.tests',
        )

With a Python built like so:

    % SOABI=cpython-32 ./configure

you'd end up with a _stupid.cpython-32.so module. However, if you knew your extension module was PEP 384 compliant, and could be shared on >=Python 3.2, you would do:

    setup(
        name='stupid',
        version='0.0',
        packages=['stupid', 'stupid.tests'],
        ext_modules=[Extension('_stupid', ['src/stupid.c'],
                               so_abi_tag='',
                               )],
        test_suite='stupid.tests',
        )

and now you'd end up with _stupid.so, which I propose to mean it's PEP 384 ABI compliant. (There may not be any other use case than so_abi_tag='' or so_abi_tag=None, in which case, the Extension keyword *might* be better off as a boolean.)
Now of course PEP 384 isn't implemented, so it's a bit of a moot point. But if some form of versioned .so file naming is accepted for Python 3.2, I'll update PEP 384 with possible solutions. -Barry
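The proposed so_abi_tag semantics can be condensed into a pure-Python sketch. The so_abi_tag name and the '' means "stable ABI" convention are from Barry's proposal; the helper itself and its default_tag parameter are illustrative, not actual distutils code.

```python
def ext_filename(modname, so_abi_tag=None, default_tag="cpython-32"):
    """Compute an extension module's file name under the proposal.

    so_abi_tag=None -> use the interpreter's tag (the common case)
    so_abi_tag=''   -> bare foo.so, reserved for PEP 384 ABI modules
    (Hypothetical helper; real logic lives in build_ext.get_ext_filename.)
    """
    tag = default_tag if so_abi_tag is None else so_abi_tag
    if tag:
        return "%s.%s.so" % (modname, tag)
    return "%s.so" % modname

print(ext_filename("_stupid"))                 # _stupid.cpython-32.so
print(ext_filename("_stupid", so_abi_tag=""))  # _stupid.so
```

The two print calls correspond to the two setup.py examples above: a default build gets the tagged name, while an extension declaring stable-ABI compliance keeps the bare .so name.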
Your plan seems good. Adding keyword arguments should not create compatibility issues, and I suspect the impact on the code of build_ext may be actually quite small. I’ll try to review your patch even though I don’t know C or compiler oddities, but Tarek will have the best insight and the final word. In case the time machine’s not available, your suggestion about getting the filename from the Extension instance instead of passing in a string can most certainly land in distutils2. Regards
On Jun 24, 2010, at 11:37 PM, Éric Araujo wrote:
Your plan seems good. Adding keyword arguments should not create compatibility issues, and I suspect the impact on the code of build_ext may be actually quite small. I’ll try to review your patch even though I don’t know C or compiler oddities, but Tarek will have the best insight and the final word.
The C and configure/Makefile bits are pretty trivial. It basically extends the list of shared library extensions searched for on *nix machines, and allows that to be set on the ./configure command. As for the impact on distutils, with updated tests, it's less than 100 lines of diff. Again, there it essentially allows the setup.py, via the Extension class, to specify the extension that build_ext writes. Because distutils's default is to use the $SO variable from the system-installed Makefile, with the change to dynload_shlib.c, configure.in, and Makefile.pre.in, we would get distutils writing the versioned .so files for free. I'll note further that if you *don't* specify this to ./configure, nothing much changes[1]. The distutils part of the patch is only there to disable or override the default, and *that's* only there to support proposed semantics that foo.so be used for PEP 384-compliant ABI extension modules. IOW, until PEP 384 is actually implemented, the distutils part of the patch is unnecessary. However, if the other changes are accepted, then I will add a discussion of this issue to PEP 384, and we can figure out the best semantics and implementation at that point. I honestly don't know if I am going to get to work on PEP 384 before 3.2 beta.
In case the time machine’s not available, your suggestion about getting the filename from the Extension instance instead of passing in a string can most certainly land in distutils2.
Cool. -Barry [1] Well, I now realize you'll get an extra useless stat call, but I will fix that.
On Jun 24, 2010, at 11:50 AM, Barry Warsaw wrote:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
Thanks for all the quick feedback. I've made some changes based on the comments so far. The bzr branch is updated, and a new patch is available here: http://pastebin.ubuntu.com/454688/ If reception continues to be mildly approving, I'll open an issue on bugs.python.org and attach the patch to that. -Barry
On Fri, Jun 25, 2010 at 1:50 AM, Barry Warsaw <barry@python.org> wrote:
Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections <wink>. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
I like the idea, but I think summarising the rest of this discussion in its own (relatively short) PEP would be good (there are a few things that are tricky - exact versioning scheme, PEP 384 forward compatibility, impact on distutils, articulating the benefits for distro packaging, etc). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jun 25, 2010, at 08:35 AM, Nick Coghlan wrote:
I like the idea, but I think summarising the rest of this discussion in its own (relatively short) PEP would be good (there are a few things that are tricky - exact versioning scheme, PEP 384 forward compatibility, impact on distutils, articulating the benefits for distro packaging, etc).
The first draft of PEP 3149 is ready for review. http://www.python.org/dev/peps/pep-3149/ Plain text attached here for your convenience. Comments, suggestions as always are welcome. Thanks to everyone who participated in the original discussion. -Barry PEP: 3149 Title: ABI version tagged .so files Version: $Revision: 81577 $ Last-Modified: $Date: 2010-05-27 19:54:25 -0400 (Thu, 27 May 2010) $ Author: Barry Warsaw <barry@python.org> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2010-07-09 Python-Version: 3.2 Post-History: 2010-07-14 Resolution: TBD Abstract ======== PEP 3147 [1]_ described an extension to Python's import machinery that improved the sharing of Python source code, by allowing more than one byte compilation file (.pyc) to be co-located with each source file. This PEP defines an adjunct feature which allows the co-location of extension module files (.so) in a similar manner. This optional, build-time feature will enable downstream distributions of Python to more easily provide more than one Python major version at a time. Background ========== PEP 3147 defined the file system layout for a pure-Python package, where multiple versions of Python are available on the system. For example, where the `alpha` package containing source modules `one.py` and `two.py` exists on a system with Python 3.2 and 3.3, the post-byte compilation file system layout would be:: alpha/ __pycache__/ __init__.cpython-32.pyc __init__.cpython-33.pyc one.cpython-32.pyc one.cpython-33.pyc two.cpython-32.pyc two.cpython-33.pyc __init__.py one.py two.py For packages with extension modules, a similar differentiation is needed for the module's .so files. Extension modules compiled for different Python major versions are incompatible with each other due to changes in the ABI. While PEP 384 [2]_ defines a stable ABI, it will minimize, but not eliminate, extension module incompatibilities between Python major versions.
Thus a mechanism for discriminating extension module file names is proposed. Rationale ========= Linux distributions such as Ubuntu [3]_ and Debian [4]_ provide more than one Python version at the same time to their users. For example, Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1, with Python 2.6 being the default. In order to share as much as possible between the available Python versions, these distributions install third party (i.e. non-standard library) packages into `/usr/share/pyshared` and symlink to them from `/usr/lib/pythonX.Y/dist-packages`. The symlinks exist because in a pre-PEP 3147 world (i.e. < Python 3.2), the `.pyc` files resulting from byte compilation by the various installed Pythons will name collide with each other. For Python versions >= 3.2, all pure-Python packages can be shared, because the `.pyc` files will no longer cause file system naming conflicts. Eliminating these symlinks makes for a simpler, more robust Python distribution. A similar situation arises with shared library extensions. Because extension modules are typically named `foo.so` for a `foo` extension module, these would also name collide if `foo` were provided for more than one Python version. There are several approaches that could be taken to avoid this, which will be explored below, but this PEP proposes a fairly simple compile-time option to allow extension modules to live in the same file system directory and avoid any name collisions. Proposal ======== A new configure option is added for building Python, called `--with-so-abi-tag`. This takes as an argument a unique, but arbitrary string, e.g.:: ./configure --with-so-abi-tag=cpython-32 This string is passed into the `Makefile` and affects two aspects of the Python build. First, it is compiled into `Python/dynload_shlib.c` where it defines some additional `.so` file names to search for when importing extension modules.
Second, it modifies the `Makefile`'s `$SO` variable, which in turn controls the `distutils` module's default filename when compiling extension modules. When `--with-so-abi-tag` is not given to `configure`, nothing changes in the way the Python executable is built, or acts. Thus, this configure switch is completely optional and has no effect if not used. What this allows is for distributions that want to distinguish among extension modules built for different versions of Python, but shared in the same file system path, to arrange for `.so` names that are unique and non-colliding. For example, let's say Python 3.2 was built with:: ./configure --with-so-abi-tag=cpython-32 and Python 3.3 was built with:: ./configure --with-so-abi-tag=cpython-33 For an arbitrary package `foo`, you would see these files when the distribution package was installed:: /usr/share/pyshared/foo.cpython-32.so /usr/share/pyshared/foo.cpython-33.so Proven approach =============== The approach described here is already proven, in a sense, on Debian and Ubuntu systems where different extensions are used for debug builds of Python and extension modules. Debug builds on Windows also already use a different file extension for dynamic libraries. PEP 384 ======= PEP 384 defines a stable ABI for extension modules. Universal adoption of PEP 384 would eliminate the need for this PEP because all extension modules would be compatible with any Python version. In practice of course, it will be impossible to achieve universal adoption. Older extensions may not be ported to PEP 384, or an extension may require Python APIs outside of PEP 384's definition. Therefore there will always be a (hopefully diminishing, but never zero) need for ABI version tagged shared libraries. Further, it is anticipated that the stable ABI will evolve over time, meaning that existing PEP 384 compatible extension modules may be incompatible with future versions of Python.
While a complete specification is reserved for PEP 384, here is a discussion of the relevant issues. PEP 384 describes a change to ``PyModule_Create()`` where ``3`` is passed as the API version if the extension was compiled with ``Py_LIMITED_API``. This should be formalized into an official macro called ``PYTHON_ABI_VERSION`` to mirror ``PYTHON_API_VERSION``. If and when the ABI changes in an incompatible way, this version number would be bumped. To facilitate sharing, Python would be extended to search for extension modules with the ``PYTHON_ABI_VERSION`` number in its name. The prefix ``abi`` is reserved for Python's use. Thus for example, an initial implementation of PEP 384, compiled with `--with-so-abi-tag=cpython-xy` would search for the following file names when extension module `foo` is imported (in this order):: foo.abi3.so foo.cpython-xy.so foo.so The distutils [7]_ ``build_ext`` command would also have to be extended to compile to shared library files with the ``abi3`` tag, when the module author indicates that their extension supports that version of the ABI. This could be done in a backward compatible way by adding a keyword argument to the ``Extension`` class, such as:: Extension('foo', ['foo.c'], abi=3) Alternatives ============ In the initial python-dev thread [8]_ where this idea was first introduced, several alternatives were suggested. For completeness they are listed here, along with the reasons for not adopting them. Independent directories or symlinks ----------------------------------- Debian and Ubuntu could simply add a version-specific directory to ``sys.path`` that would contain just the extension modules for that version of Python. Or the symlink trick eliminated in PEP 3147 could be retained for just shared libraries.
This approach is rejected because it propagates the essential complexity that PEP 3147 tries to avoid, and adds yet another directory to search for all modules, even when the number of extension modules is much fewer than the total number of Python packages. It also makes for more robust management when all of a package's module files live in the same directory, because it allows systems such as `dpkg` to detect file conflicts between distribution packages. Don't share packages with extension modules ------------------------------------------- It has been suggested that Python packages with extension modules not be shared among all supported Python versions on a distribution. Even with adoption of PEP 3149, extension modules will have to be compiled for every supported Python version, so perhaps sharing of such packages isn't useful anyway. Not sharing packages with extensions though is infeasible for several reasons. If a pure-Python package is shared in one version, should it suddenly be not-shared if the next release adds an extension module for speed? Also, even though all extension shared libraries will be compiled and distributed once for every supported Python, there's a big difference between duplicating the `.so` files and duplicating all `.py` files. The extra space increases the download time for such packages, and more immediately, increases the space pressures on already constrained distribution CD-ROMs. Reference implementation ======================== Work on this code is tracked in a Bazaar branch on Launchpad [5]_ until it's ready for merge into Python 3.2. The work-in-progress diff can also be viewed [6]_ and is updated automatically as new changes are uploaded. References ========== .. [1] PEP 3147 .. [2] PEP 384 .. [3] Ubuntu: <http://www.ubuntu.com> .. [4] Debian: <http://www.debian.org> .. [5] https://code.edge.launchpad.net/~barry/python/sovers .. [6] https://code.edge.launchpad.net/~barry/python/sovers/+merge/29411 .. 
[7] http://docs.python.org/py3k/distutils/index.html .. [8] http://mail.python.org/pipermail/python-dev/2010-June/100998.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:
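The draft's proposed search order and the hypothetical ``abi`` keyword can be sketched in a few lines (the helper names below are illustrative, not part of the patch):

```python
# Candidate file names a Python built with --with-so-abi-tag=cpython-32
# would probe for 'import foo', in the draft's proposed order.
def candidate_names(module, tag, abi_version=3):
    return ['%s.abi%d.so' % (module, abi_version),  # stable-ABI name
            '%s.%s.so' % (module, tag),             # build-specific tag
            '%s.so' % module]                       # legacy fallback

# Hypothetical suffix selection for the proposed Extension(..., abi=3):
# a stable-ABI module gets the 'abi' tag, others get the build tag.
def ext_suffix(tag, abi=None):
    return '.abi%d.so' % abi if abi is not None else '.%s.so' % tag

print(candidate_names('foo', 'cpython-32'))
# ['foo.abi3.so', 'foo.cpython-32.so', 'foo.so']
print('foo' + ext_suffix('cpython-32', abi=3))  # foo.abi3.so
```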
Hello, 2010/7/15 Barry Warsaw <barry@python.org>:
The first draft of PEP 3149 is ready for review.
I like it! I think it could mention the case where packages are not installed in the canonical directory, but placed elsewhere along the PYTHONPATH. This is how I deploy applications, for example, and the differences between Python versions make this a pain. The specific case of Windows should be mentioned: each foo.pyd contains the name of the Python library (Python27.dll) it has been linked with; it must be rebuilt for a major version change. IMO the Windows installers provided by python.org should be built with a tag that contains this major number.
Thus for example, an initial implementation of PEP 384, compiled with `--with-so-abi-tag=cpython-xy` would search for the following file names when extension module `foo` is imported (in this order)::
foo.abi3.so foo.cpython-xy.so foo.so
Is this the correct order? IMO the so-abi-tag is more precise and the first two items should be swapped. PyPy would also benefit from this patch: it can now use extension modules, but the ABI is slightly different. By default, PyPy would load only modules containing the ABI tag, and refuse foo.so, which is incompatible for sure. But the installations could still be shared between Python implementations. Cheers, -- Amaury Forgeot d'Arc
On Jul 16, 2010, at 12:16 AM, Amaury Forgeot d'Arc wrote:
2010/7/15 Barry Warsaw <barry@python.org>:
The first draft of PEP 3149 is ready for review.
I like it!
Cool!
I think it could mention the case where packages are not installed in the canonical directory, but placed elsewhere along the PYTHONPATH. This is how I deploy applications, for example, and the differences between python versions makes this a pain.
Because of the way the results of ./configure are embodied in the Makefile's $SO variable, and that variable is used unconditionally from sysconfig (and thus distutils), once you've configured --with-so-abi-tag, that tag will be used globally everywhere. I've added a note about this.
The specific case of Windows should be mentioned: each foo.pyd contains the name of the python library (Python27.dll) it has been linked with; It must be rebuilt for a major version change. IMO the Windows installers provided by python.org should be built with a tag that contains this major number.
The current version of the PEP and my implementation do not change the Windows builds at all. I don't feel qualified to integrate the ideas expressed in PEP 3149 for Windows builds, but I would be happy to accept patches to either the PEP or implementation to export the same tagging feature for Windows.
Thus for example, an initial implementation of PEP 384, compiled with `--with-so-abi-tag=cpython-xy` would search for the following file names when extension module `foo` is imported (in this order)::
foo.abi3.so foo.cpython-xy.so foo.so
Is this the correct order? IMO the so-abi-tag is more precise and the two first items should be swapped.
Good point, fixed.
PyPy would also benefit from this patch: it can now use extension modules, but the ABI is slightly different. By default, PyPy would load only modules containing the ABI tag, and refuse foo.so which is incompatible for sure. But the installations could still be shared between Python implementations.
Interesting. I've added a note about this to the PEP. Thanks for the feedback. -Barry
On 15.07.2010 01:59, Barry Warsaw wrote:
PEP 384 describes a change to ``PyModule_Create()`` where ``3`` is passed as the API version if the extension was compiled with ``Py_LIMITED_API``. This should be formalized into an official macro called ``PYTHON_ABI_VERSION`` to mirror ``PYTHON_API_VERSION``. If and when the ABI changes in an incompatible way, this version number would be bumped. To facilitate sharing, Python would be extended to search for extension modules with the ``PYTHON_ABI_VERSION`` number in its name. The prefix ``abi`` is reserved for Python's use.
Thus for example, an initial implementation of PEP 384, compiled with `--with-so-abi-tag=cpython-xy` would search for the following file names when extension module `foo` is imported (in this order)::
foo.abi3.so foo.cpython-xy.so foo.so
The distutils [7]_ ``build_ext`` command would also have to be extended to compile to shared library files with the ``abi3`` tag, when the module author indicates that their extension supports that version of the ABI. This could be done in a backward compatible way by adding a keyword argument to the ``Extension`` class, such as::
Extension('foo', ['foo.c'], abi=3)
I like the proposal, but IMO it is too unspecific about the ABI tag. Assume an extension is built with such a configured Python and then run with an ABI-compatible Python that happens to carry a slightly different version tag; the extension won't be found. Differing file names per configuration should be avoided. Proposing: 1) Remove the configure option and use the new tagged naming for all configurations unconditionally. Everybody can expect the same name on every installation, and PEP 384 most likely will require using a tag too. As Amaury pointed out, there is a use case in that PyPy can use this tag too to build extensions only usable by PyPy. 2) As PEP 3147 defines a non-configurable name for .pyc files, this PEP should define a non-configurable way to derive the tag. The tag should include all information which currently makes an extension ABI incompatible: - the Python implementation (cpython, PyPy, ...) - the Python version (3.2, 3.3, ...) - the unicode configure option (--with-wide-unicode, 16 or 32) - platform information (necessary?) If this list changes for coming Python versions, then it can be extended. Barry pointed out on IRC that people might want to build experimental ABI-incompatible versions, which should have their own tag. If this is wanted, please provide a configure option which lets one extend/append to the tag. 3) In case 1) is not acceptable, the --with-so-abi-tag option should be implemented in such a way that it isn't required to take an argument, and in that case it should default to the fixed naming schema described in 2). Matthias
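A fully specified tag built from the properties Matthias enumerates might look like the following sketch. The field layout here is hypothetical (the revised PEP later settles on a different encoding); it only illustrates combining the ABI-relevant pieces into one name:

```python
# Hypothetical composition of a non-configurable ABI tag from the
# properties listed above: implementation, version, unicode width,
# and optionally a platform identifier.
def full_abi_tag(implementation, version, unicode_width, platform=None):
    parts = [implementation, version.replace('.', ''), 'u%d' % unicode_width]
    if platform is not None:
        parts.append(platform)
    return '-'.join(parts)

print(full_abi_tag('cpython', '3.2', 32))        # cpython-32-u32
print(full_abi_tag('pypy', '3.2', 16, 'linux'))  # pypy-32-u16-linux
```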
On Fri, Jul 16, 2010 at 5:40 AM, Matthias Klose <doko@ubuntu.com> wrote:
2) As PEP 3147 defines a non-configurable name for .pyc files, this PEP should define a non-configurable way for the tag. The tag should include all information which currently makes an extension ABI incompatible: - the python implementation (cpython, PyPy, ...) - the python version (3.2, 3.3, ...) - unicode configure option (--with-wide-unicode, 16 or 32) - platform information (necessary?)
I'm not sure it's that easy to enumerate all of the ways to end up with an incompatible ABI. There are quite a lot of configure options listed with "configure --help", and it's not always obvious if an option changes the ABI. On top of that, there are compilation flags that can change the ABI, as Kristján discovered in the following thread: http://mail.python.org/pipermail/python-dev/2010-June/100583.html On the flip side, a fully enumerated ABI signature could be used to identify (in)compatible binary eggs, which is basically impossible now. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
On 16.07.2010 15:43, Daniel Stutzbach wrote:
On Fri, Jul 16, 2010 at 5:40 AM, Matthias Klose<doko@ubuntu.com> wrote:
2) As PEP 3147 defines a non-configurable name for .pyc files, this PEP should define a non-configurable way for the tag. The tag should include all information which currently makes an extension ABI incompatible: - the python implementation (cpython, PyPy, ...) - the python version (3.2, 3.3, ...) - unicode configure option (--with-wide-unicode, 16 or 32) - platform information (necessary?)
I'm not sure it's that easy to enumerate all of the ways to end up with an incompatible ABI. There are quite a lot of configure options listed with "configure --help", and it's not always obvious if an option changes the ABI. On top of that, there are compilation flags that can change the ABI, as Kristján discovered in the following thread:
right, I forgot about the debug builds, because it's already the standard on Windows to build foo_d.so extensions, and I adopted it for Debian/Ubuntu too.
On the flip side, a fully enumerated ABI signature could be used to identify (in)compatible binary eggs, which is basically impossible now.
indeed.
On Jul 16, 2010, at 12:40 PM, Matthias Klose wrote:
I like the proposal, but IMO it is too unspecific about the abi tag. Assume that an extension is built with such a configured python and then tried to run with an abi compatible configured python, but with a slightly different version tag, the extension won't be found. Differing file names per configuration should be avoided. Proposing
1) Remove the configure option and use the new naming using the tag for all configurations unconditionally. Everybody can expect the same name on every installation, and PEP 384 most likely will require using a tag too. As Amaury did point out, there is a use case in that PyPy can use this tag too to build extensions only usable by PyPy.
2) As PEP 3147 defines a non-configurable name for .pyc files, this PEP should define a non-configurable way for the tag. The tag should include all information which currently makes an extension ABI incompatible: - the python implementation (cpython, PyPy, ...) - the python version (3.2, 3.3, ...) - unicode configure option (--with-wide-unicode, 16 or 32) - platform information (necessary?) If this list changes for coming python versions, then it can be extended. Barry pointed out on irc chat that people might want to build experimental ABI incompatible versions, which should have their own tag. If this is wanted, please provide a configure option which lets one extend/append to the tag.
Okay, I'm convinced we need to hard code this tag, and I think it's okay not to provide a configure switch to control this. I've pushed a new version of the diff and PEP, the latter attached here for your convenience. Thanks for all the feedback. -Barry PEP: 3149 Title: ABI version tagged .so files Version: $Revision: 81577 $ Last-Modified: $Date: 2010-05-27 19:54:25 -0400 (Thu, 27 May 2010) $ Author: Barry Warsaw <barry@python.org> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2010-07-09 Python-Version: 3.2 Post-History: 2010-07-14, 2010-07-22 Resolution: TBD Abstract ======== PEP 3147 [1]_ described an extension to Python's import machinery that improved the sharing of Python source code, by allowing more than one byte compilation file (.pyc) to be co-located with each source file. This PEP defines an adjunct feature which allows the co-location of extension module files (.so) in a similar manner. This optional, build-time feature will enable downstream distributions of Python to more easily provide more than one Python major version at a time. Background ========== PEP 3147 defined the file system layout for a pure-Python package, where multiple versions of Python are available on the system. For example, where the `alpha` package containing source modules `one.py` and `two.py` exists on a system with Python 3.2 and 3.3, the post-byte compilation file system layout would be:: alpha/ __pycache__/ __init__.cpython-32.pyc __init__.cpython-33.pyc one.cpython-32.pyc one.cpython-33.pyc two.cpython-32.pyc two.cpython-33.pyc __init__.py one.py two.py For packages with extension modules, a similar differentiation is needed for the module's .so files. Extension modules compiled for different Python major versions are incompatible with each other due to changes in the ABI. Different configuration/compilation options for the same Python version can result in different ABIs (e.g. --with-wide-unicode).
While PEP 384 [2]_ defines a stable ABI, it will minimize, but not eliminate, extension module incompatibilities between Python builds or major versions. Thus a mechanism for discriminating extension module file names is proposed. Rationale ========= Linux distributions such as Ubuntu [3]_ and Debian [4]_ provide more than one Python version at the same time to their users. For example, Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1, with Python 2.6 being the default. In order to share as much as possible between the available Python versions, these distributions install third party (i.e. non-standard library) packages into `/usr/share/pyshared` and symlink to them from `/usr/lib/pythonX.Y/dist-packages`. The symlinks exist because in a pre-PEP 3147 world (i.e. < Python 3.2), the `.pyc` files resulting from byte compilation by the various installed Pythons will name collide with each other. For Python versions >= 3.2, all pure-Python packages can be shared, because the `.pyc` files will no longer cause file system naming conflicts. Eliminating these symlinks makes for a simpler, more robust Python distribution. A similar situation arises with shared library extensions. Because extension modules are typically named `foo.so` for a `foo` extension module, these would also name collide if `foo` were provided for more than one Python version. In addition, different configuration/compilation options for the same Python version can cause different ABIs to be presented to extension modules. On POSIX systems for example, the configure options ``--with-pydebug``, ``--with-pymalloc``, and ``--with-wide-unicode`` all change the ABI. This PEP proposes to encode build-time options in the file name of the ``.so`` extension module files. PyPy [5]_ can also benefit from this PEP, allowing it to avoid name collisions in extension modules built for its API, but with a different `.so` tag.
Proposal ======== The configure/compilation options chosen at Python interpreter build-time will be encoded in the shared library file name for extension modules. This "tag" will appear between the module base name and the operating system extension for shared libraries. The following information *MUST* be included in the shared library file name: * The Python implementation (e.g. cpython, pypy, jython, etc.) * The interpreter's major and minor version numbers These two fields are separated by a hyphen and no dots are to appear between the major and minor version numbers. E.g. ``cpython-32``. Python implementations *MAY* include additional flags in the file name tag as appropriate. For example, on POSIX systems these flags will also contribute to the file name: * ``--with-pydebug`` (flag: ``d``) * ``--with-pymalloc`` (flag: ``m``) * ``--with-wide-unicode`` (flag: ``u``) By default in Python 3.2, ``configure`` enables ``--with-pymalloc`` so shared library file names would appear as ``foo.cpython-32m.so``. When the other two flags are also enabled, the file names would be ``foo.cpython-32dmu.so``. (This PEP only addresses build issues on POSIX systems that use the ``configure`` script. While Windows or other platform support is not explicitly disallowed under this PEP, platform expertise is needed in order to evaluate, describe, and implement support on such platforms.) The shared library file name tag is used unconditionally; it cannot be changed. The tag and extension module suffix are available through the ``sysconfig`` module via the following variables:: >>> sysconfig.get_config_var('SO') '.cpython-32mu.so' >>> sysconfig.get_config_var('SOABI') 'cpython-32mu' Note that ``$SOABI`` contains just the tag, while ``$SO`` includes the platform extension for shared library files, and is the exact suffix added to the extension module name.
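The tag rules above are mechanical enough to express as a short sketch (this is not the actual configure logic, which lives in autoconf; the helper name is illustrative):

```python
# Compose the POSIX tag per the revised proposal: implementation name,
# hyphen, major and minor version with no dot between them, then one
# letter per ABI-affecting option, in the order d (--with-pydebug),
# m (--with-pymalloc), u (--with-wide-unicode).
def so_tag(major, minor, pydebug=False, pymalloc=False, wide_unicode=False):
    flags = ''.join(letter for letter, enabled in
                    (('d', pydebug), ('m', pymalloc), ('u', wide_unicode))
                    if enabled)
    return 'cpython-%d%d%s' % (major, minor, flags)

print('foo.%s.so' % so_tag(3, 2, pymalloc=True))  # foo.cpython-32m.so
print('foo.%s.so' % so_tag(3, 2, pydebug=True, pymalloc=True, wide_unicode=True))
# foo.cpython-32dmu.so
```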
For an arbitrary package `foo`, you would see these files when the distribution package was installed:: /usr/share/pyshared/foo.cpython-32m.so /usr/share/pyshared/foo.cpython-33m.so Python's dynamic module loader will recognize and import shared library extension modules with a tag that matches its build-time options. For backward compatibility, Python will also continue to import untagged extension modules, e.g. ``foo.so``. This shared library tag would be used globally for all distutils-based extension modules, regardless of where on the file system they are built. Extension modules built by means other than distutils would either have to calculate the tag manually, or fall back to the non-tagged `.so` file name. Proven approach =============== The approach described here is already proven, in a sense, on Debian and Ubuntu systems where different extensions are used for debug builds of Python and extension modules. Debug builds on Windows also already use a different file extension for dynamic libraries, and in fact encode (in a different way than proposed in this PEP) the Python major and minor version in the `.dll` file name. PEP 384 ======= PEP 384 defines a stable ABI for extension modules. In theory, universal adoption of PEP 384 would eliminate the need for this PEP because all extension modules could be compatible with any Python version. In practice of course, it will be impossible to achieve universal adoption, and as described above, different build-time flags still affect the ABI. Thus even with a stable ABI, this PEP may still be necessary. While a complete specification is reserved for PEP 384, here is a discussion of the relevant issues. PEP 384 describes a change to ``PyModule_Create()`` where ``3`` is passed as the API version if the extension was compiled with ``Py_LIMITED_API``. This should be formalized into an official macro called ``PYTHON_ABI_VERSION`` to mirror ``PYTHON_API_VERSION``.
If and when the ABI changes in an incompatible way, this version number would be bumped. To facilitate sharing, Python would be extended to search for extension modules with the ``PYTHON_ABI_VERSION`` number in its name. The prefix ``abi`` is reserved for Python's use. Thus, an initial implementation of PEP 384, when Python is configured with the default set of flags, would search for the following file names when extension module `foo` is imported (in this order):: foo.cpython-XYm.so foo.abi3.so foo.so The distutils [6]_ ``build_ext`` command would also have to be extended to compile to shared library files with the ``abi3`` tag, when the module author indicates that their extension supports that version of the ABI. This could be done in a backward compatible way by adding a keyword argument to the ``Extension`` class, such as:: Extension('foo', ['foo.c'], abi=3) Alternatives ============ In the initial python-dev thread [7]_ where this idea was first introduced, several alternatives were suggested. For completeness they are listed here, along with the reasons for not adopting them. Independent directories or symlinks ----------------------------------- Debian and Ubuntu could simply add a version-specific directory to ``sys.path`` that would contain just the extension modules for that version of Python. Or the symlink trick eliminated in PEP 3147 could be retained for just shared libraries. This approach is rejected because it propagates the essential complexity that PEP 3147 tries to avoid, and adds potentially several additional directories to search for all modules, even when the number of extension modules is far smaller than the total number of Python packages. For example, if builds were made available both with and without wide unicode, with and without pydebug, and with and without pymalloc, the total number of directories searched would increase substantially.
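The revised search order (the interpreter's own tag first, per Amaury's earlier comment) can be emulated as follows; the helper is illustrative only and not part of the reference implementation:

```python
import os

# Revised import-time probe order: the build-specific tag wins, then
# the stable-ABI name, then the untagged legacy name, stopping at the
# first file that exists.
def find_extension(directory, module, build_tag, abi_version=3):
    for name in ('%s.%s.so' % (module, build_tag),
                 '%s.abi%d.so' % (module, abi_version),
                 '%s.so' % module):
        path = os.path.join(directory, name)
        if os.path.exists(path):
            return path
    return None
```

With only an untagged `foo.so` present, the legacy name still satisfies the import; once `foo.cpython-32m.so` is installed alongside it, the tagged file is preferred.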
Don't share packages with extension modules ------------------------------------------- It has been suggested that Python packages with extension modules not be shared among all supported Python versions on a distribution. Even with adoption of PEP 3149, extension modules will have to be compiled for every supported Python version, so perhaps sharing of such packages isn't useful anyway. Not sharing packages with extensions though is infeasible for several reasons. If a pure-Python package is shared in one version, should it suddenly be not-shared if the next release adds an extension module for speed? Also, even though all extension shared libraries will be compiled and distributed once for every supported Python, there's a big difference between duplicating the `.so` files and duplicating all `.py` files. The extra size increases the download time for such packages, and more immediately, increases the space pressures on already constrained distribution CD-ROMs. Reference implementation ======================== Work on this code is tracked in a Bazaar branch on Launchpad [8]_ until it's ready for merge into Python 3.2. The work-in-progress diff can also be viewed [9]_ and is updated automatically as new changes are uploaded. References ========== .. [1] PEP 3147 .. [2] PEP 384 .. [3] Ubuntu: <http://www.ubuntu.com> .. [4] Debian: <http://www.debian.org> .. [5] http://codespeak.net/pypy/dist/pypy/doc/ .. [6] http://docs.python.org/py3k/distutils/index.html .. [7] http://mail.python.org/pipermail/python-dev/2010-June/100998.html .. [8] https://code.edge.launchpad.net/~barry/python/sovers .. [9] https://code.edge.launchpad.net/~barry/python/sovers/+merge/29411 Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:
On 22 Jul, 2010, at 15:40, Barry Warsaw wrote:
Abstract ========
PEP 3147 [1]_ described an extension to Python's import machinery that improved the sharing of Python source code, by allowing more than one byte compilation file (.pyc) to be co-located with each source file.
This PEP defines an adjunct feature which allows the co-location of extension module files (.so) in a similar manner. This optional, build-time feature will enable downstream distributions of Python to more easily provide more than one Python major version at a time.
I guess this is not an explicit goal of this PEP, but the structure is very close to supporting multiple system architectures at the same time. I regularly develop code that needs to run on Windows, Linux and OSX and it is very convenient to do so in a shared directory tree (locally on one machine and accessed using remote mounts on the other ones). This works fine for pure python code, but I currently have to resort to tricks for extension modules.
Proposal ========
The configure/compilation options chosen at Python interpreter build-time will be encoded in the shared library file name for extension modules. This "tag" will appear between the module base name and the operating system's file extension for shared libraries.
The following information *MUST* be included in the shared library file name:
* The Python implementation (e.g. cpython, pypy, jython, etc.)
* The interpreter's major and minor version numbers
These two fields are separated by a hyphen and no dots are to appear between the major and minor version numbers. E.g. ``cpython-32``.
Python implementations *MAY* include additional flags in the file name tag as appropriate. For example, on POSIX systems these flags will also contribute to the file name:
* ``--with-pydebug`` (flag: ``d``)
* ``--with-pymalloc`` (flag: ``m``)
* ``--with-wide-unicode`` (flag: ``u``)
By default in Python 3.2, ``configure`` enables ``--with-pymalloc`` so shared library file names would appear as ``foo.cpython-32m.so``. When the other two flags are also enabled, the file names would be ``foo.cpython-32dmu.so``.
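For illustration, the tag scheme above can be sketched as a small Python helper (a hypothetical ``make_tag`` function; in the patch itself the tag is computed by configure and baked into the Makefile):

```python
# Hypothetical sketch of assembling the tag from the implementation
# name, version, and the configure flags described above.
def make_tag(implementation, major, minor,
             pydebug=False, pymalloc=False, wide_unicode=False):
    flags = ""
    if pydebug:
        flags += "d"
    if pymalloc:
        flags += "m"
    if wide_unicode:
        flags += "u"
    return "%s-%d%d%s" % (implementation, major, minor, flags)

# Default 3.2 build (pymalloc enabled by configure):
print(make_tag("cpython", 3, 2, pymalloc=True))     # cpython-32m
# All three flags enabled:
print(make_tag("cpython", 3, 2, True, True, True))  # cpython-32dmu
```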
A way to generically solve my problem is to add the platform name as well, such as "foo.cpython-32m-darwin.so" or "foo.cpython-32mu-linux2.so". Ronald
On Jul 22, 2010, at 03:58 PM, Ronald Oussoren wrote:
I guess this is not an explicit goal of this PEP, but the structure is very close to supporting multiple system architectures at the same time. I regularly develop code that needs to run on Windows, Linux and OSX and it is very convenient to do so in a shared directory tree (locally on one machine and accessed using remote mounts on the other ones). This works fine for pure python code, but I currently have to resort to tricks for extension modules.
[...]
A way to generically solve my problem is to add the platform name as well, such as "foo.cpython-32m-darwin.so" or "foo.cpython-32mu-linux2.so".
This could certainly be done in the Windows build, but that wouldn't help bridge the gap among different POSIX systems. I'd be open to adding the platform name to the tag, but I'd probably define it as part of the implementation field, e.g. foo.cpython-linux2-32m.so. Or maybe start with the platform name, e.g. foo.linux2-cpython-32m. This isn't a strong preference though. Thoughts? -Barry
On 23 Jul, 2010, at 11:02, Barry Warsaw wrote:
On Jul 22, 2010, at 03:58 PM, Ronald Oussoren wrote:
I guess this is not an explicit goal of this PEP, but the structure is very close to supporting multiple system architectures at the same time. I regularly develop code that needs to run on Windows, Linux and OSX and it is very convenient to do so in a shared directory tree (locally on one machine and accessed using remote mounts on the other ones). This works fine for pure python code, but I currently have to resort to tricks for extension modules.
[...]
A way to generically solve my problem is to add the platform name as well, such as "foo.cpython-32m-darwin.so" or "foo.cpython-32mu-linux2.so".
This could certainly be done in the Windows build, but that wouldn't help bridge the gap among different POSIX systems.
The Windows port isn't a problem for this; it uses a different suffix (".pyd") than the Unix ports.
I'd be open to adding the platform name to the tag, but I'd probably define it as part of the implementation field, e.g. foo.cpython-linux2-32m.so. Or maybe start with the platform name, e.g. foo.linux2-cpython-32m. This isn't a strong preference though.
I don't have a strong opinion, but placing the platform name at the start is probably better, to be consistent with sysconfig.get_platform(). Ronald
On Jul 23, 2010, at 11:48 AM, Ronald Oussoren wrote:
I'd be open to adding the platform name to the tag, but I'd probably define it as part of the implementation field, e.g. foo.cpython-linux2-32m.so. Or maybe start with the platform name, e.g. foo.linux2-cpython-32m. This isn't a strong preference though.
I don't have a strong opinion, but placing the platform name at the start is probably better, to be consistent with sysconfig.get_platform().
What about the architecture (i386, amd64)? With every increase in length I start to get more concerned. We could encode the platform and architecture, but that gets into cryptic territory. OTOH, would you really co-install i386 and amd64 shared libraries on the same machine? (hello NFS ;). -Barry
On 23 Jul, 2010, at 11:54, Barry Warsaw wrote:
On Jul 23, 2010, at 11:48 AM, Ronald Oussoren wrote:
I'd be open to adding the platform name to the tag, but I'd probably define it as part of the implementation field, e.g. foo.cpython-linux2-32m.so. Or maybe start with the platform name, e.g. foo.linux2-cpython-32m. This isn't a strong preference though.
I don't have a strong opinion, but placing the platform name at the start is probably better, to be consistent with sysconfig.get_platform().
What about the architecture (i386, amd64)? With every increase in length I start to get more concerned. We could encode the platform and architecture, but that gets into cryptic territory. OTOH, would you really co-install i386 and amd64 shared libraries on the same machine? (hello NFS ;).
I don't need this, but then again I primarily use a platform where the vendor has a proper solution for having binaries for multiple architectures ;-) Ronald
Ronald Oussoren <ronaldoussoren@mac.com> writes:
On 23 Jul, 2010, at 11:54, Barry Warsaw wrote:
What about the architecture (i386, amd64)? With every increase in length I start to get more concerned. We could encode the platform and architecture, but that gets into cryptic territory. OTOH, would you really co-install i386 and amd64 shared libraries on the same machine? (hello NFS ;).
I don't need this, but then again I primarily use a platform where the vendor has a proper solution for having binaries for multiple architectures ;-)
Well, Apple doesn't prevent people from building 32/64 bit-only python installations. Doesn't that give you 3 choices: i386, amd64, fat? And you can have framework or non-framework builds. Doesn't anybody else think this is lost work for very little gain? My /usr/lib/python2.6/site-packages directory consumes 200MB on disk. I couldn't care less if my /usr/lib/python2.5/site-packages consumed the same amount of disk space... - Ralf
On Jul 23, 2010, at 01:46 PM, schmir@gmail.com wrote:
Doesn't anybody else think this is lost work for very little gain? My /usr/lib/python2.6/site-packages directory consumes 200MB on disk. I couldn't care less if my /usr/lib/python2.5/site-packages consumed the same amount of disk space...
Right, you probably don't care now that your extension modules live in foo.so, so it probably won't make much difference if they were named foo-blahblah.so, as long as they import. :) If you use Debian or Ubuntu though, it'll be a win for you by allowing us to make Python support much more robust. -Barry
Barry Warsaw <barry@python.org> writes:
On Jul 23, 2010, at 01:46 PM, schmir@gmail.com wrote:
Doesn't anybody else think this is lost work for very little gain? My /usr/lib/python2.6/site-packages directory consumes 200MB on disk. I couldn't care less if my /usr/lib/python2.5/site-packages consumed the same amount of disk space...
Right, you probably don't care now that your extension modules live in foo.so so it probably won't make much difference if they were named foo-blahblah.so, as long as they import. :)
Most of the time it won't make much difference, right. But I can assure you that it will bite some people, and there is some code to be adapted.
If you use Debian or Ubuntu though, it'll be a win for you by allowing us to make Python support much more robust.
I'd much prefer to have cleanly separated environments by having separate directories for my python modules. Sharing the source code and complicating things will not lead to increased robustness. - Ralf
On Jul 24, 2010, at 11:59 PM, schmir@gmail.com wrote:
Barry Warsaw <barry@python.org> writes:
On Jul 23, 2010, at 01:46 PM, schmir@gmail.com wrote:
Doesn't anybody else think this is lost work for very little gain? My /usr/lib/python2.6/site-packages directory consumes 200MB on disk. I couldn't care less if my /usr/lib/python2.5/site-packages consumed the same amount of disk space...
Right, you probably don't care now that your extension modules live in foo.so so it probably won't make much difference if they were named foo-blahblah.so, as long as they import. :)
Most of the time it won't make much difference, right. But I can assure you that it will bite some people, and there is some code to be adapted.
Do you have concrete examples? Without that it's just speculation I can't do much to address. Is the problem big or small? Easy to work around or not? "Change is bad" isn't a constructive argument. ;)
If you use Debian or Ubuntu though, it'll be a win for you by allowing us to make Python support much more robust.
I'd much prefer to have cleanly separated environments by having separate directories for my python modules. Sharing the source code and complicating things will not lead to increased robustness.
That's fine, but it's not the way Debian/Ubuntu works today. PEP 3149 adoption will definitely remove significant complication for deploying multiple Python versions at the same time on those systems. -Barry
Barry Warsaw <barry@python.org> writes:
Do you have concrete examples? Without that it's just speculation I can't do much to address. Is the problem big or small? Easy to work around or not?
Some of the things that need to be adapted are e.g. Makefiles (basically anything that assumes modules have a certain name), all of the freezers (cxFreeze, py2exe, ...). The biggest problem probably will be that an import will load the wrong module or no module at all. I'm just speculating here...
"Change is bad" isn't a constructive argument. ;)
Did I make that argument?
That's fine, but it's not the way Debian/Ubuntu works today. PEP 3149 adoption will definitely remove significant complication for deploying multiple Python versions at the same time on those systems.
You're just moving that complication into python. - Ralf
On 26.07.2010 22:53, Ralf Schmitt wrote:
Barry Warsaw<barry@python.org> writes:
That's fine, but it's not the way Debian/Ubuntu works today. PEP 3149 adoption will definitely remove significant complication for deploying multiple Python versions at the same time on those systems.
You're just moving that complication into python.
There is nothing which prevents you from still deploying/using python modules in separate directories, and if you see a python package as a directory, nothing will change for you with this PEP besides the naming of the extensions.
I'd much prefer to have cleanly separated environments by having separate directories for my python modules.
That is your preference, but not what standards like the FHS talk about (i.e. having different locations for data, docs, headers).
Sharing the source code and complicating things will not lead to increased robustness.
Not true. Package managers like dpkg/apt-get, rpm/yum and maybe others have done this for ages. And yes, the added "complexity" of package managers does lead to increased robustness. Matthias
Matthias Klose <doko@ubuntu.com> writes:
Not true. Package managers like dpkg/apt-get, rpm/yum and maybe others have done this for ages. And yes, the added "complexity" of package managers does lead to increased robustness.
but how does sharing things lead to increased robustness (even if it might be managed by your package manager)?
On Jul 27, 2010, at 01:54 PM, Ralf Schmitt wrote:
Matthias Klose <doko@ubuntu.com> writes:
Not true. Package managers like dpkg/apt-get, rpm/yum and maybe others have done this for ages. And yes, the added "complexity" of package managers does lead to increased robustness.
but how does sharing things lead to increased robustness (even if it might be managed by your package manager)?
It removes the need to maintain multiple directory trees and the symlinks between them. The tools that manage all this platform complexity get simpler, and thus easier to maintain, leading to increased robustness on the platform. Cheers, -Barry
On Jul 26, 2010, at 10:53 PM, Ralf Schmitt wrote:
Some of the things that need to be adapted are e.g. Makefiles (basically anything that assumes modules have a certain name), all of the freezers (cxFreeze, py2exe, ...). The biggest problem probably will be that an import will load the wrong module or no module at all. I'm just speculating here...
I took a look at cx_freeze - it doesn't support Python 3.2 yet afaict (the build fails, but it may be shallow). I'm going to look at py2exe as soon as I can get a Windows VM up and running. Since import is (usually) handled by the built-in dynload_shlib.c code, it should generally Just Work, I think, unless some application installs custom import hooks. In any case, I think this will be a fairly standard, and probably simple, porting effort, which goes along with supporting any new major release.
"Change is bad" isn't a constructive argument. ;)
Did I make that argument?
Apologies.
That's fine, but it's not the way Debian/Ubuntu works today. PEP 3149 adoption will definitely remove significant complication for deploying multiple Python versions at the same time on those systems.
You're just moving that complication into python.
It's a much different level and scope of complexity though. For Python, it's pretty simple: look for an additional .so file name pattern which gets baked into Python at compile time. For that, you're able to remove a huge amount of complexity on Debian/Ubuntu by removing the need to manage multiple directory trees and symlinks between files in those trees. The tools that manage them (i.e. handle package installs and removals) also get much simpler. I think it's a worthwhile trade-off. Cheers, -Barry
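As an aside on the porting question discussed above: in CPython releases that later implemented this PEP (3.3 and up), tools such as freezers can query the accepted extension-module suffixes at runtime instead of assuming a bare `.so`:

```python
# In CPython 3.3+, the importer's accepted extension-module suffixes
# (including the ABI-tagged name this PEP introduces) are exposed at
# runtime, so tools need not hard-code ".so":
import importlib.machinery

suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)  # e.g. ['.cpython-312-x86_64-linux-gnu.so', '.abi3.so', '.so']

# Each entry is a complete file-name suffix beginning with a dot.
assert all(s.startswith(".") for s in suffixes)
```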
On Jul 23, 2010, at 12:54 PM, Barry Warsaw wrote:
On Jul 23, 2010, at 11:48 AM, Ronald Oussoren wrote:
I'd be open to adding the platform name to the tag, but I'd probably define it as part of the implementation field, e.g. foo.cpython-linux2-32m.so. Or maybe start with the platform name, e.g. foo.linux2-cpython-32m. This isn't a strong preference though.
I don't have a strong opinion, but placing the platform name at the start is probably better, to be consistent with sysconfig.get_platform().
What about the architecture (i386, amd64)? With every increase in length I start to get more concerned. We could encode the platform and architecture, but that gets into cryptic territory. OTOH, would you really co-install i386 and amd64 shared libraries on the same machine? (hello NFS ;).
Thinking about this some more, I'd rather *not* include the platform or architecture in the tag by default. They aren't really necessary to support the instigating use case and will probably be fairly uncommon.

I'd be okay including a configure option to allow you to add whatever you want after the implementation, version, and flags. E.g. something like:

    ./configure --with-abi-tag-extension=linux2

would lead to foo.cpython-32m-linux2.so, so not the nicer names we'd prefer but probably good enough for your purposes.

Would that work for you? -Barry
On 23 Jul, 2010, at 23:19, Barry Warsaw wrote:
On Jul 23, 2010, at 12:54 PM, Barry Warsaw wrote:
On Jul 23, 2010, at 11:48 AM, Ronald Oussoren wrote:
I'd be open to adding the platform name to the tag, but I'd probably define it as part of the implementation field, e.g. foo.cpython-linux2-32m.so. Or maybe start with the platform name, e.g. foo.linux2-cpython-32m. This isn't a strong preference though.
I don't have a strong opinion, but placing the platform name at the start is probably better, to be consistent with sysconfig.get_platform().
What about the architecture (i386, amd64)? With every increase in length I start to get more concerned. We could encode the platform and architecture, but that gets into cryptic territory. OTOH, would you really co-install i386 and amd64 shared libraries on the same machine? (hello NFS ;).
Thinking about this some more, I'd rather *not* include the platform or architecture in the tag by default. They aren't really necessary to support the instigating use case and will probably be fairly uncommon.
I'd be okay including a configure option to allow you to add whatever you want after the implementation, version, and flags. E.g. something like:
./configure --with-abi-tag-extension=linux2
would lead to foo.cpython-32m-linux2.so, so not the nicer names we'd prefer but probably good enough for your purposes.
Would that work for you?
That would certainly work. That said, I'm also fine with not adding the platform information or configure argument at all. My use case is fairly exotic and I do have a feasible workaround. Ronald
On Jul 24, 2010, at 09:54 AM, Ronald Oussoren wrote:
I'd be okay including a configure option to allow you to add whatever you want after the implementation, version, and flags. E.g. something like:
./configure --with-abi-tag-extension=linux2
would lead to foo.cpython-32m-linux2.so, so not the nicer names we'd prefer but probably good enough for your purposes.
Would that work for you?
That would certainly work. That said, I'm also fine with not adding the platform information or configure argument at all. My use case is fairly exotic and I do have a feasible workaround.
Cool. In that case, I won't add it. -Barry
On Fri, Jul 23, 2010 at 12:40 AM, Barry Warsaw <barry@python.org> wrote:
Python implementations *MAY* include additional flags in the file name tag as appropriate. For example, on POSIX systems these flags will also contribute to the file name:
* ``--with-pydebug`` (flag: ``d``)
* ``--with-pymalloc`` (flag: ``m``)
* ``--with-wide-unicode`` (flag: ``u``)
By default in Python 3.2, ``configure`` enables ``--with-pymalloc`` so shared library file names would appear as ``foo.cpython-32m.so``. When the other two flags are also enabled, the file names would be ``foo.cpython-32dmu.so``.
(This PEP only addresses build issues on POSIX systems that use the ``configure`` script. While Windows or other platform support is not explicitly disallowed under this PEP, platform expertise is needed in order to evaluate, describe, and implement support on such platforms.)
This leads me to a question: how do these configure options affect the PEP 384 stable ABI? That PEP is currently silent on the issue, while PEP 3149 appears to implicitly assume that "abi3" completely specifies the ABI. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 23, 2010, at 08:56 PM, Nick Coghlan wrote:
On Fri, Jul 23, 2010 at 12:40 AM, Barry Warsaw <barry@python.org> wrote:
Python implementations *MAY* include additional flags in the file name tag as appropriate. For example, on POSIX systems these flags will also contribute to the file name:
* ``--with-pydebug`` (flag: ``d``)
* ``--with-pymalloc`` (flag: ``m``)
* ``--with-wide-unicode`` (flag: ``u``)
By default in Python 3.2, ``configure`` enables ``--with-pymalloc`` so shared library file names would appear as ``foo.cpython-32m.so``. When the other two flags are also enabled, the file names would be ``foo.cpython-32dmu.so``.
(This PEP only addresses build issues on POSIX systems that use the ``configure`` script. While Windows or other platform support is not explicitly disallowed under this PEP, platform expertise is needed in order to evaluate, describe, and implement support on such platforms.)
This leads me to a question: how do these configure options affect the PEP 384 stable ABI? That PEP is currently silent on the issue, while PEP 3149 appears to implicitly assume that "abi3" completely specifies the ABI.
It's a great question - perhaps Martin can chime in? It may be that 'abiX' isn't enough to fully specify compatible extension modules even when that module is written entirely and solely against PEP 384. In that case, we may need to include the configure flags in the tag, e.g. foo.abi3-dmu.so. -Barry
This leads me to a question: how do these configure options affect the PEP 384 stable ABI? That PEP is currently silent on the issue, while PEP 3149 appears to implicitly assume that "abi3" completely specifies the ABI.
It's a great question - perhaps Martin can chime in? It may be that 'abiX' isn't enough to fully specify compatible extension modules even when that module is written entirely and solely against PEP 384. In that case, we may need to include the configure flags in the tag, e.g. foo.abi3-dmu.so.
The intention is that there is indeed just one stable ABI, so one configuration is the supported one, and that should be the "default" build. As for the specific settings, my analysis would be this:

- pydebug: not supported by the stable ABI, as it changes the layout of PyObject, which is an exposed structure. More specifically: Py_DEBUG, Py_TRACEREFS and Py_REF_DEBUG are all incompatible with the stable ABI.
- pymalloc: I fail to see the impact on the ABI. All allocator macros become function calls under Py_LIMITED_API; otherwise, there shouldn't be any need to have different versions of that.
- wide-unicode: this is a tricky one. I'm tempted to say that the stable ABI should always use a Py_UNICODE that matches the platform's wchar_t. Alternative proposals are welcome.

Regards, Martin
On Aug 28, 2010, at 12:29 PM, Martin v. Löwis wrote:
The intention is that there is indeed just one stable ABI, so one configuration is the supported one, and that should be the "default" build.
As for the specific settings, my analysis would be this:

- pydebug: not supported by the stable ABI, as it changes the layout of PyObject, which is an exposed structure. More specifically: Py_DEBUG, Py_TRACEREFS and Py_REF_DEBUG are all incompatible with the stable ABI.
- pymalloc: I fail to see the impact on the ABI. All allocator macros become function calls under Py_LIMITED_API; otherwise, there shouldn't be any need to have different versions of that.
- wide-unicode: this is a tricky one. I'm tempted to say that the stable ABI should always use a Py_UNICODE that matches the platform's wchar_t. Alternative proposals are welcome.
Thanks Martin. I have updated PEP 3149 with these thoughts, but I'll leave it up to you to update PEP 384. I haven't heard a peep since my last RFC on PEP 3149. Guido, would you care to pronounce on the PEP, or designate someone who can do so (remembering that Martin is off-line for a while)? If acceptable, I'd like to get this into the tree before 3.2 alpha 2, currently scheduled for September 5. Cheers, -Barry
On 28 Aug, 2010, at 12:29, Martin v. Löwis wrote:
- wide-unicode: this is a tricky one. I'm tempted to say that the stable ABI should always use a Py_UNICODE that matches the platform's wchar_t. Alternative proposals are welcome.
sizeof(wchar_t) is 4 on OSX, but the Apple frameworks use a 16-bit type to represent unicode codepoints (UniChar). Current builds on OSX use a 16-bit unicode type, which makes it pretty cheap to convert strings from Python to a C array of UniChar. I'm therefore -1 on switching to a wide unicode build on OSX. Ronald
Ronald Oussoren wrote:
On 28 Aug, 2010, at 12:29, Martin v. Löwis wrote:
- wide-unicode: this is a tricky one. I'm tempted to say that the stable ABI should always use a Py_UNICODE that matches the platform's wchar_t. Alternative proposals are welcome.
sizeof(wchar_t) is 4 on OSX, but the Apple frameworks use a 16-bit type to represent unicode codepoints (UniChar). Current builds on OSX use a 16-bit unicode type, which makes it pretty cheap to convert strings from Python to a C array of UniChar.
I'm therefore -1 on switching to a wide unicode build on OSX.
-1 on always using wchar_t as well. Python's default is UCS2 and the stable ABI should not change that. I also think that this information is not relevant for the stable ABI: Extensions that want to stick to the stable ABI should really not have to know whether Py_UNICODE maps to wchar_t or not. If they want to interface to a local wchar_t type they can use the conversion APIs we have for that in the Unicode API: PyUnicode_FromWideChar() and PyUnicode_AsWideChar(). BTW: Wasn't one of the main reasons for having versioned .so files the idea to be able to have UCS2 and UCS4 versions installed side-by-side ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 27 2010)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2010-08-19: Released mxODBC 3.1.0 http://python.egenix.com/ 2010-09-15: DZUG Tagung, Dresden, Germany 18 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
On Aug 31, 2010, at 11:22 AM, M.-A. Lemburg wrote:
BTW: Wasn't one of the main reasons for having versioned .so files the idea to be able to have UCS2 and UCS4 versions installed side-by-side ?
Yes. This isn't an issue for PEP 3149 because it adds a flag to the shared library file name for wide unicodes. It's an issue for PEP 384. -Barry
-1 on always using wchar_t as well. Python's default is UCS2 and the stable ABI should not change that.
It's not really Python's default. It is what configure.in does by default. Python's default, on Linux, is UCS-4.
I also think that this information is not relevant for the stable ABI: Extensions that want to stick to the stable ABI should really not have to know whether Py_UNICODE maps to wchar_t or not. If they want to interface to a local wchar_t type they can use the conversion APIs we have for that in the Unicode API: PyUnicode_FromWideChar() and PyUnicode_AsWideChar().
Ok. I'm fine with excluding Py_UNICODE from the stable ABI. However, in the long run I guess more support for wchar_t will then be needed in the API, e.g. more convenient argument parsing. Regards, Martin
"Martin v. Löwis" wrote:
-1 on always using wchar_t as well. Python's default is UCS2 and the stable ABI should not change that.
It's not really Python's default. It is what configure.in does by default. Python's default, on Linux, is UCS-4.
No, the default is UCS2 on all platforms and in configure.in. configure.in only uses UCS4 if it finds a TCL installation that happens to use UCS4 - for some reason I don't know :-) However, most Linux distros and more recently also some BSDs have switched over to using UCS4 for their distribution versions of Python.
I also think that this information is not relevant for the stable ABI: Extensions that want to stick to the stable ABI should really not have to know whether Py_UNICODE maps to wchar_t or not. If they want to interface to a local wchar_t type they can use the conversion APIs we have for that in the Unicode API: PyUnicode_FromWideChar() and PyUnicode_AsWideChar().
Ok. I'm fine with excluding Py_UNICODE from the stable ABI. However, in the long run I guess more support for wchar_t will then be needed in the API, e.g. more convenient argument parsing.
Sure, we could add that. -- Marc-Andre Lemburg
Hi, 2010/9/7 M.-A. Lemburg <mal@egenix.com>:
Ok. I'm fine with excluding Py_UNICODE from the stable ABI. However, in the long run I guess more support for wchar_t will then be needed in the API, e.g. more convenient argument parsing.
Sure, we could add that.
Just to be clear: does this mean that PyUnicode_FromUnicode() and PyUnicode_AsUnicode() won't belong to the stable ABI? PyUnicode_AsWideChar() is not as fast, because it needs to copy the data. -- Amaury Forgeot d'Arc
Amaury Forgeot d'Arc wrote:
Hi,
2010/9/7 M.-A. Lemburg <mal@egenix.com>:
Ok. I'm fine with excluding Py_UNICODE from the stable ABI. However, in the long run I guess more support for wchar_t will then be needed in the API, e.g. more convenient argument parsing.
Sure, we could add that.
Just to be clear: does this mean that PyUnicode_FromUnicode() and PyUnicode_AsUnicode() won't belong to the stable ABI?
As I understood Martin's comment, Py_UNICODE would not be part of the ABI in the sense that you can access the Py_UNICODE data from within the extension module. It should still be fine to pass around opaque Py_UNICODE buffers.
PyUnicode_AsWideChar() is not as fast, because it needs to copy the data.
True. Also see this patch which tries to address the issue: http://bugs.python.org/issue8654 With the terminology used there, the stable ABI would implicitly have Py_UNICODE_AGNOSTIC set - and then prevent exposing the structure of Py_UNICODE* buffers while still allowing to pass them around. -- Marc-Andre Lemburg
Am 07.09.2010 19:46, schrieb M.-A. Lemburg:
"Martin v. Löwis" wrote:
-1 on always using wchar_t as well. Python's default is UCS2 and the stable ABI should not change that.
It's not really Python's default. It is what configure.in does by default. Python's default, on Linux, is UCS-4.
No, the default is UCS2 on all platforms and in configure.in.
configure.in only uses UCS4 if it finds a TCL installation that happens to use UCS4 - for some reason I don't know :-)
However, most Linux distros and more recently also some BSDs have switched over to using UCS4 for their distribution versions of Python.
Hmm. So UCS4 *is* the default for Linux. The default on the system is not what Python's configure makes it, but what the system vendors make it - they are the ones making the system, after all. Regards, Martin