Safely importing zip files with C extensions

This quote is here to stop GMane complaining that I'm top-posting. Ignore.
I've already posted this to distutils-sig, but thought that it might be of interest to readers here as it relates to importing C extensions ...

zipimport is great, but there can be issues importing software that contains C extensions. But the new wheel format (PEP 427) may give us a better way of importing zip files containing C extensions. Since wheels are .zip files, they can sometimes be used to provide functionality without needing to be installed. But whereas .zip files contain no convention for indicating compatibility with a particular Python, wheels do contain this compatibility information. Thus, it is possible to check if a wheel can be directly imported from, and the wheel support in distlib allows you to take advantage of this using the mount() and unmount() methods.

When you mount a wheel, its absolute path name is added to sys.path, allowing the Python code in it to be imported. (A DistlibException is raised if the wheel isn't compatible with the Python which calls the mount() method.)

You don't need mount() just to add the wheel's name to sys.path, or to import pure-Python wheels, of course. But the mount() method goes further than just enabling Python imports: any C extensions in the wheel are also made available for import. For this to be possible, the wheel has to be built with additional metadata about extensions, a JSON file called EXTENSIONS which serialises an extension mapping dictionary. This maps extension module names to the names in the wheel of the shared libraries which implement those modules.

Running unmount() on the wheel removes its absolute pathname from sys.path and makes its C extensions, if any, also unavailable for import.

Wheels built with the new "distil" tool contain the EXTENSIONS metadata, so can be mounted complete with C extensions:

    $ distil download -d /tmp simplejson
    Downloading simplejson-3.1.2.tar.gz to /tmp/simplejson-3.1.2
    63KB @ 73 KB/s 100 % Done: 00:00:00
    Unpacking ... done.
    $ distil package --fo=wh -d /tmp /tmp/simplejson-3.1.2/
    The following packages were built:
      /tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl
    $ python
    Python 2.7.2+ (default, Jul 20 2012, 22:15:08)
    [GCC 4.6.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
Does anyone see any problems with this approach to importing C extensions from zip files? Regards, Vinay Sajip
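Because the compatibility information is encoded in the wheel's filename (PEP 427), the check that mount() performs can in principle be done before the archive is even opened. The sketch below is illustrative only: `wheel_tags`, `is_compatible` and the hard-coded `SUPPORTED` set are hypothetical names, not distlib's API, and it is written for Python 3. In practice the supported tags would come from the running interpreter (the modern `packaging.tags.sys_tags()` is one source).

```python
import os

def wheel_tags(filename):
    """Expand the tags in a PEP 427 wheel filename into a set of
    (python, abi, platform) triples; dotted sets such as 'py2.py3'
    are expanded as described in PEP 425."""
    base = os.path.basename(filename)
    if not base.endswith(".whl"):
        raise ValueError("not a wheel: %r" % base)
    parts = base[:-len(".whl")].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError("malformed wheel filename: %r" % base)
    pythons, abis, platforms = (p.split(".") for p in parts[-3:])
    return {(py, abi, plat)
            for py in pythons for abi in abis for plat in platforms}

def is_compatible(filename, supported):
    """True if at least one of the wheel's tag triples is supported."""
    return not wheel_tags(filename).isdisjoint(supported)

# Hypothetical tags for a CPython 2.7 build on 64-bit Linux, hard-coded
# here only for illustration.
SUPPORTED = {
    ("cp27", "none", "linux_x86_64"),
    ("py27", "none", "any"),
    ("py2", "none", "any"),
}

print(is_compatible("/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl", SUPPORTED))  # True
print(is_compatible("six-1.3.0-py2.py3-none-any.whl", SUPPORTED))  # True
print(is_compatible("foo-1.0-cp33-cp33m-win32.whl", SUPPORTED))    # False
```

PEP 425 treats the supported tags as an ordered preference list; a plain set, as here, only answers the yes/no question a mount-time check needs.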

2013/3/27 Vinay Sajip <vinay_sajip@yahoo.co.uk>
When you mount a wheel, its absolute path name is added to sys.path, allowing the Python code in it to be imported.
Better: just put the wheel path on sys.path,

    sys.path.append('/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl')

and let a sys.path_hooks entry do the job. Such a WheelImporter could even inherit from zipimporter, plus the magic required for C extensions. It avoids the mount/unmount methods; only the usual sys.path operations are necessary from the user. -- Amaury Forgeot d'Arc

Amaury Forgeot d'Arc <amauryfa <at> gmail.com> writes:
That's what the mount() actually does: it adds the wheel to a registry that an import hook uses. You also need a place to check that the wheel being mounted is compatible with the Python doing the mounting. I'm not sure what the import hook should do if, e.g., there is a compatibility problem with the wheel. Is it clear that it should always raise an ImportError? Or ignore the wheel (which seems wrong)? Or do something else? Regards, Vinay Sajip
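The registry that mount() feeds can be sketched roughly as follows. This is not distlib's implementation: `read_extensions`, `mount`, `unmount`, `find_extension` and `_MOUNTED` are hypothetical names, the EXTENSIONS file is assumed to live in the wheel's `.dist-info` directory, and the compatibility check discussed above is left out. An import hook would call find_extension() to learn which shared library, in which wheel, implements a given module.

```python
import json
import os
import sys
import tempfile
import zipfile

# Hypothetical registry: wheel path -> {extension module name: member path}.
_MOUNTED = {}

def read_extensions(wheel_path):
    """Return the EXTENSIONS mapping from the wheel's .dist-info, or {}."""
    with zipfile.ZipFile(wheel_path) as zf:
        for name in zf.namelist():
            if name.endswith(".dist-info/EXTENSIONS"):
                return json.loads(zf.read(name).decode("utf-8"))
    return {}

def mount(wheel_path):
    """Register the wheel's extensions and put the wheel on sys.path.
    (The compatibility check is omitted from this sketch.)"""
    wheel_path = os.path.abspath(wheel_path)
    _MOUNTED[wheel_path] = read_extensions(wheel_path)
    if wheel_path not in sys.path:
        sys.path.append(wheel_path)
    return wheel_path

def unmount(wheel_path):
    """Undo mount(): forget the extensions and drop the sys.path entry."""
    wheel_path = os.path.abspath(wheel_path)
    _MOUNTED.pop(wheel_path, None)
    if wheel_path in sys.path:
        sys.path.remove(wheel_path)

def find_extension(fullname):
    """The lookup an import hook would do: (wheel, member) or None."""
    for wheel_path, mapping in _MOUNTED.items():
        if fullname in mapping:
            return wheel_path, mapping[fullname]
    return None

# Demo with a wheel that carries only the metadata file.
whl = os.path.abspath(os.path.join(
    tempfile.mkdtemp(), "simplejson-3.1.2-cp27-none-linux_x86_64.whl"))
with zipfile.ZipFile(whl, "w") as zf:
    zf.writestr("simplejson-3.1.2.dist-info/EXTENSIONS",
                json.dumps({"simplejson._speedups": "simplejson/_speedups.so"}))

mount(whl)
print(find_extension("simplejson._speedups")[1])  # simplejson/_speedups.so
unmount(whl)
print(find_extension("simplejson._speedups"))     # None
```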

Jim Fulton is right that weird failures are a characteristic of zipped eggs, so one of the most common requests for setuptools is how to prevent zipping from ever happening. This is an important reason why wheel is billed as an installation format: fewer users with pitchforks. It's very cool that it works, though. Debugging is slightly easier than it was in the old days because pdb can now read the source code from the zip. An unzipped wheel, as a directory with the same name as the wheel, would be a more reliable solution that might be interesting to work with. It would work the same as an egg unless you had important files in the .data/ directory (currently mostly used for console scripts and include files). However, it was always confusing to have more than one kind (zipped, unzipped) of egg.
On Wed, Mar 27, 2013 at 4:41 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

Daniel Holth <dholth <at> gmail.com> writes:
Well, it's just an experiment, and I was soliciting comments because I'm not as familiar with the issues as some others are. Distlib is still only at version 0.1.1, and the mount()/unmount() functionality is not set in stone :-) Regards, Vinay Sajip

On Wed, Mar 27, 2013 at 1:13 PM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
I implemented just such a path hook (zipimporter plus the magic required for C extensions) as a challenge to myself to learn more about the Python import mechanisms. See https://github.com/bfroehle/pydzipimport. Cheers, Brad

On Wed, Mar 27, 2013 at 5:19 PM, Bradley M. Froehle <brad.froehle@gmail.com> wrote:
FYI, there appears to be a bug for Windows with packages: you're using '/__init__' in a couple of places that should actually be os.sep+'__init__'.

This does seem like a good way to address the issue, for those rare situations where this would be a good idea. The zipped .egg approach was originally intended for user-managed plugin directories for certain types of extensible platforms, where "download a file and stick it in the plugins directory" is a low-effort way to install plugins, without having to build a lot of specialized install capability. As Jim has pointed out, though, this doesn't generalize well to a full-blown packaging system. Technically, you can blame Bob Ippolito for this, since he's the one who talked me into using eggs to install Python libraries in general, not just as a plugin packaging mechanism. ;-)

That being said, *unpacked* egg, er, wheels, are still a great way to meet all of the "different apps needing different versions" use cases (without needing one venv per app), and nowadays the existence of automated installer tools means that using one to install a plugin for a low-tech plugin system is not a big deal, as long as that tool supports the simple unpacked wheel scenario. So I wholeheartedly support some kind of mount/unmount or "require"-type mechanism for finding plugins. pkg_resources even has an API for handling simple dynamic plugin dependency resolution scenarios (http://peak.telecommunity.com/DevCenter/PkgResources#locating-plugins). It'd be a good idea if distlib provided a similar feature, or at least the APIs upon which apps or frameworks can implement such features.

(Historical note for those who weren't around back then: easy_install wasn't even an *idea* until well after eggs were created; the original idea was just that people would build plugins and libraries as eggs and manually drop them in directories, where a plugin support library would discover them and add them to sys.path as needed. And Bob and I also considered a sort of "update site" mechanism à la Eclipse, with a library to let apps fetch plugins. But as soon as eggs existed and PyPI allowed uploads, it was kind of an obvious follow-up to make an installation tool as a kind of "technology demonstration"... which promptly became a monster. The full story with all its twists and turns can also be found here: http://mail.python.org/pipermail/python-dev/2006-April/064145.html )
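A "plugin support library" of the kind described above might, at its simplest, scan a directory for wheels, skip those built for another interpreter, and put the newest remaining version of each project on sys.path. This sketch assumes a hypothetical `activate_plugins()` helper and a deliberately naive numeric version comparison (not PEP 440); it does no dependency resolution. pkg_resources' `WorkingSet.find_plugins()`, referenced above, is the established API for this in the egg world.

```python
import glob
import os
import sys
import tempfile
import zipfile

def _parse(filename):
    """Split a PEP 427 wheel filename into (name, version, tag triples)."""
    parts = os.path.basename(filename)[:-len(".whl")].split("-")
    name, version = parts[0], parts[1]
    py, abi, plat = parts[-3:]
    tags = {(p, a, s) for p in py.split(".")
            for a in abi.split(".") for s in plat.split(".")}
    return name, version, tags

def _version_key(version):
    # Naive numeric ordering ("1.10" > "1.9"); not full PEP 440 semantics.
    return tuple(int(x) if x.isdigit() else 0 for x in version.split("."))

def activate_plugins(plugin_dir, supported):
    """Hypothetical helper: add the newest compatible wheel of each project
    found in plugin_dir to sys.path and return the paths that were added."""
    best = {}
    for path in glob.glob(os.path.join(plugin_dir, "*.whl")):
        name, version, tags = _parse(path)
        if tags.isdisjoint(supported):
            continue  # built for another Python/ABI/platform: skip it
        if name not in best or _version_key(version) > _version_key(best[name][0]):
            best[name] = (version, path)
    added = []
    for name in sorted(best):
        path = best[name][1]
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added

# Demo: a plugin directory with several versions of one project.
plugin_dir = tempfile.mkdtemp()
for fname in ("alpha-1.0-py3-none-any.whl", "alpha-1.10-py3-none-any.whl",
              "alpha-2.0-cp27-none-win32.whl", "beta-0.3-py2.py3-none-any.whl"):
    # Empty archives suffice: only the filenames are inspected here.
    zipfile.ZipFile(os.path.join(plugin_dir, fname), "w").close()

added = activate_plugins(plugin_dir, {("py3", "none", "any")})
print([os.path.basename(p) for p in added])
# ['alpha-1.10-py3-none-any.whl', 'beta-0.3-py2.py3-none-any.whl']
```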

Vinay Sajip, 27.03.2013 20:38:
I've always hated this setuptools misfeature of copying C extensions from an installed archive into a user directory, one for each user. At least during normal installation, they should be properly unpacked into normal shared library files in the file system. Whether it then makes sense to special case one-shot trial imports like the above without installation is a bit of a different question, but I don't see a compelling reason for adding complexity here. It's not really an important use case. Stefan

Stefan Behnel <stefan_ml <at> behnel.de> writes:
The user directory location is not a key part of the functionality, it could just as well be a shared location across all users. And this is an option for specific scenarios, not a general substitute for installing the wheel (which unpacks everything into FHS-style locations). A lot of people use virtual envs, which are per-user anyway. I'm not suggesting this is a good idea for system-wide deployments of software.
Well, my post was to elicit some comment about the usefulness of the feature, so fair enough. It doesn't seem especially complex though, unless I've missed something. Regards, Vinay Sajip

On 28.03.2013 17:09, Brett Cannon wrote:
Which must be done carefully to prevent a security issue. It shouldn't be unzipped anywhere but into a directory only writable by the process.
Cleanup is going to be tricky or even impossible. Windows locks loaded DLLs and therefore prevents their removal. It's possible to unload DLLs but I don't know the implications.

On Thu, Mar 28, 2013 at 9:09 AM, Brett Cannon <brett@python.org> wrote:
Once http://sourceware.org/bugzilla/show_bug.cgi?id=11767 is implemented and available in libc, no extraction of .so's should be needed (they will likely need to be stored uncompressed in the .zip file for that though).
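Why the libraries would need to be stored uncompressed can be illustrated: a ZIP_STORED member's bytes sit verbatim in the archive at an offset that can be computed from its local file header, so a loader able to work from (file, offset) could use them in place, whereas a deflated member has no such image. `stored_member_span` is a hypothetical helper; it loads nothing and makes no claim about what glibc interface, if any, came of that request. Note that mmap()-based loading would additionally need page-aligned offsets, which zipfile does not arrange.

```python
import os
import struct
import tempfile
import zipfile

# Fixed-size part of a ZIP local file header (30 bytes): signature,
# version, flags, method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, filename length, extra-field length.
_LOCAL_HEADER = struct.Struct("<4s5H3I2H")

def stored_member_span(zip_path, member):
    """Return (offset, size) of an uncompressed member's raw bytes."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("%s is compressed; it must be stored" % member)
    with open(zip_path, "rb") as f:
        f.seek(info.header_offset)
        fields = _LOCAL_HEADER.unpack(f.read(_LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise ValueError("bad local file header for %s" % member)
    # Read the lengths from the local header itself: its extra field may
    # differ from the one recorded in the central directory.
    name_len, extra_len = fields[-2], fields[-1]
    offset = info.header_offset + _LOCAL_HEADER.size + name_len + extra_len
    return offset, info.file_size

# Demo: the member's bytes can be read straight out of the archive file.
path = os.path.join(tempfile.mkdtemp(), "demo-1.0-cp27-none-linux_x86_64.whl")
payload = b"\x7fELF placeholder, not a real library"
with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("demo/_speedups.so", payload)

offset, size = stored_member_span(path, "demo/_speedups.so")
with open(path, "rb") as f:
    f.seek(offset)
    data = f.read(size)
print(data == payload)  # True
```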

On 29.03.2013 02:06, Gregory P. Smith wrote:
For Windows, there is already code that does this (http://www.py2exe.org/index.cgi/Hacks/ZipExtImporter). This page is not up to date, but it describes the idea and the implementation. The code is currently 32-bit only and for Python 2, but that can probably be fixed. It is based on Joachim Bauch's MemoryModule: https://github.com/fancycode/MemoryModule Thomas

participants (10)
- Amaury Forgeot d'Arc
- Bradley M. Froehle
- Brett Cannon
- Christian Heimes
- Daniel Holth
- Gregory P. Smith
- PJ Eby
- Stefan Behnel
- Thomas Heller
- Vinay Sajip