[Distutils] Extracting C extensions from zipfiles on sys.path (Was: wheels on sys.path clarification (reboot))
Paul Moore
p.f.moore at gmail.com
Thu Jan 30 14:23:52 CET 2014
Changing the subject to clearly focus the discussion.
On 30 January 2014 11:57, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
> If you have other reasons for your -1, I'd like to hear them.
OK. Note that this is not, in my view, an issue with wheels, but
rather about zipfiles on sys.path, and (deliberate) design limitations
of the module loader and zipimport implementations.[1]
First of all, it is not possible to load a DLL into a process' memory
[2, 3] unless it is stored as a file in the filesystem. So any attempt
to import a C extension from a zipfile must, by necessity, involve
extracting that DLL to the filesystem. That's where I see the
problems. No single one is a deal-breaker, but together they are a
number of niggling issues that cumulatively chip away at the reliability
of the concept until the end result has enough corner cases and risks to
make it unacceptable (depending on your tolerance for risk - there's
a definite judgement call involved).
The issues I can see are: [4]
1. You need to choose a location to put the extracted file. On Windows
in particular, there is no guaranteed-available filesystem location
that can be used without risk. Some accounts have no home directory,
some (locked down) users have no permissions anywhere but very
specific places, even TEMP may not be usable if there's an aggressive
housekeeping routine in place - but TEMP is probably the best choice
of a bad lot.
2. There are race conditions to consider. If the extraction is not
completely isolated per-process, what if 2 processes want to use
different versions of the same DLL? How will these be distinguished?
[5] So to avoid corner cases you have to assume that only one process
uses a given extracted DLL.
3. Clean-up is an issue. How will the extracted files be removed? You
can't unload the DLLs from Python, and you can't delete open files in
Windows. So do you simply leave the files lying round? Or do you do
some sort of atexit dance to run a separate process after the Python
process terminates which will do the cleanup? What happens to that
process when virus checkers hold the file open? Leaving the files
around is probably the most robust answer, but it's not exactly
friendly.
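As a minimal sketch of where those three points push a design (not a
worked-out solution, and not something zipimport does), the following
copies one archive member into a fresh per-process directory under TEMP
and registers best-effort cleanup. The helper name
extract_member_for_process and the "zipext-" prefix are hypothetical,
chosen purely for illustration. Actually loading the extracted file as
an extension module is deliberately left out.

```python
import atexit
import hashlib
import os
import posixpath
import shutil
import tempfile
import zipfile


def extract_member_for_process(zip_path, member):
    """Copy one zip member into a directory private to this process.

    Hypothetical helper for illustration only. Returns the path of the
    extracted file on the real filesystem.
    """
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)

    # Point 2: never share an extracted file between processes. The pid
    # and a content digest in the prefix are only there to make leftover
    # directories identifiable; mkdtemp itself guarantees uniqueness.
    digest = hashlib.sha256(data).hexdigest()[:12]
    prefix = "zipext-%d-%s-" % (os.getpid(), digest)

    # Point 1: TEMP is assumed to be "the best of a bad lot". mkdtemp
    # raises an exception if no usable temporary directory can be found.
    target_dir = tempfile.mkdtemp(prefix=prefix)

    # Zip member names always use "/" separators, hence posixpath.
    target = os.path.join(target_dir, posixpath.basename(member))
    with open(target, "wb") as out:
        out.write(data)

    # Point 3: best-effort cleanup at interpreter exit. If the file is
    # still loaded or held open (as on Windows), the removal fails
    # silently and the files are simply left behind.
    atexit.register(shutil.rmtree, target_dir, ignore_errors=True)
    return target
```

Each call produces its own directory, so the cost of avoiding the race
in point 2 is one copy of the DLL per process, which makes the clean-up
problem in point 3 worse rather than better.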
As I've said elsewhere, these are fundamental issues with importing
DLLs from zipfiles, and have no direct relationship to wheels. The
only place where having a wheel rather than a general zipfile makes a
difference is that a wheel *might* at some point contain metadata that
allows the wheel to claim that it's "OK" to load its contents from a
zipfile. But my points above are not something that the author of the
C extension can address, so there's no way that I can see that an
extension author can justifiably set that flag.
So: as wheels don't give any additional reliability over any other
zipfile, I don't see this (loading C extensions) as a wheel-related
feature. Ideally, if these problems can be solved, the solution should
be included in the core zipimport module so that all users can
benefit. If there are still issues to iron out and experience to be
gained, a 3rd party "enhanced zip importer" module would be a
reasonable test-bed for the solution. A 3rd party solution could also
be appropriate if the caveats and/or limitations were generally
acceptable, but sufficient to prohibit stdlib inclusion. The wheel
mount API could, if you wanted, look for the existence of that
enhanced zipimport module and use it when appropriate, but baking the
feature into wheel mount just limits your user base (and hence your
audience for raising bug reports, etc) needlessly.
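To illustrate the "look for it and use it when appropriate" idea, here
is a rough sketch of optional detection. The module name
enhanced_zipimport is purely hypothetical (I know of no such module),
and importlib.util.find_spec assumes Python 3.4 or later.

```python
import importlib
import importlib.util


def find_enhanced_importer(module_name="enhanced_zipimport"):
    """Return the optional importer module if installed, else None.

    The default module name is a placeholder for illustration; only
    top-level (undotted) names are handled here.
    """
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)
```

A caller such as the wheel mount code would fall back to plain
zipimport behaviour (pure Python only) whenever this returns None.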
I hope this explains my reasoning in sufficient detail.
FINAL DISCLAIMER: I have no objection to this feature being provided
per se, any more than I object to the existence of (say) Zope. Just
because I'm not a member of the target audience doesn't mean that it's
not a feature that some might benefit from. All I'm trying to do here
is offer my input as someone who was involved in the initial
implementation of zipimport, and who has kept an interested eye on how
it has been used in the 11 years since its introduction - and in
particular how people have tried to overcome the limitations we felt
we had to impose when designing it. Ultimately, I would be overjoyed
if someone could find a solution to this issue (in much the same way
as I'm delighted by what Brett has done with importlib).
Paul
Footnotes:
[1] Historical footnote - I was directly involved with the design of
PEP 302 and the zipimport implementation, and we made a deliberate
choice to only look at pure Python files, because the platform issues
around C extensions were "too hard".
[2] I'm talking from a Windows perspective here. I do not have
sufficient low-level knowledge of Unix to comment on that case. I
suspect that the issues are similar but I defer to the platform
experts.
[3] There is, I believe, code "out there" on the internet to map a DLL
image into a process based purely in memory, but I think it's a fairly
gross hack. I have a suspicion that someone - possibly Thomas Heller -
experimented with it at one time, but never came up with a viable
implementation. There's also TCL's tclkit technology, which *may*
support binary extensions, and may be worth a look, but TCL has
virtual filesystem support built in quite deep in the core, so how it
works may not be applicable to Python.
[4] I'm suggesting answers to the questions I'm raising here. The
answers *may* be wrong - I've never tried to design a robust solution
to this issue - but I believe the questions are the important point
here. Please don't focus on why my suggested approach is wrong - I
know it is!
[5] To be fair, this is where the wheel metadata might help in
distinguishing. But consider development and testing, where repeated
test runs would not typically have different versions, but the user
might well want to test whether running from zip still works. So wheel
metadata helps, but isn't a complete solution. And compile date is
probably just as good, so let's keep assuming we are looking at a
general zipimport facility.