distlib.mount API design (was: wheels on sys.path clarification (reboot))

On 30 January 2014 21:57, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
My one technical issue is with going beyond zipimport behaviour to the point of extracting DLLs to the filesystem. I remain -1 on that feature, and I believe I have explained why I think there are issues (and why I think that any solution should be part of zipimport and not added on in library or user code). But I'm happy to go through the details again, if you like - or just to accept that I don't
Yes please, let's get into some details. Of course I understand that you might not want to use the feature, but I don't understand the -1 on the feature per se - whether it is in distlib or in zipimport is a secondary consideration. I agree that zipimport is the logical place for it, but ISTM the reason why it can't go in there just yet is also the reason why one might have some reservations about the feature: binary compatibility. I accept that this is not yet a fully resolved issue in general (cf. the parallel discussion about numpy), but if we can isolate these issues, we can perhaps tackle them. But for me, that's the main reason why this part of the distlib API is experimental.
I actually think this is a useful thing to experiment with, I'm just not sure distlib is the best place for that experiment. With appropriately secure tempfile handling and the right sys.path (and module __path__) manipulation it's not obviously *impossible* to handle C extensions at arbitrary positions in the module namespace this way, just difficult. zipimport itself is a bad place to experiment though, since not only is it currently a complex ball of C code, but adding such a feature without clear evidence of robust support in a third party project would be irresponsible. In the case of distlib, the potential complexity of ensuring that such a scheme works consistently across multiple platforms and as part of various complex package layouts is enough to make me nervous about having it in the same library as the metadata 2.0 reference implementation :) Now, if you were to split that functionality out from distlib into a separate "wheeltab" project (or a name of your choice), I'd be substantially less nervous, because endorsing distlib as the metadata 2.0 reference implementation wouldn't carry any implications of endorsing a feature I consider "potentially interesting but rather challenging to implement in a robust manner". mount() would become something I could explore when I had some additional free time (hah!), rather than something I felt obliged to help get to a more robust state before releasing metadata 2.0. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
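A minimal sketch of the "secure tempfile handling plus sys.path manipulation" idea, using only the standard library. The function names are hypothetical and this is not distlib's implementation. It only covers an extension that lives at the top level of the module namespace; an extension nested inside a package would additionally need the extracted package directory added to that package's __path__.

```python
import sys
import tempfile
import zipfile


def extract_to_private_dir(archive_path, member):
    """Extract one archive member into a fresh private directory.

    tempfile.mkdtemp creates a directory that only the current user can
    read or write, which avoids races on a shared temp location.
    """
    target_dir = tempfile.mkdtemp(prefix="zipext-")
    with zipfile.ZipFile(archive_path) as zf:
        # ZipFile.extract strips absolute paths and ".." components.
        extracted = zf.extract(member, path=target_dir)
    return target_dir, extracted


def add_to_sys_path(directory, append=False):
    """Make the extracted files importable, before or after existing entries."""
    if directory in sys.path:
        return
    if append:
        sys.path.append(directory)
    else:
        sys.path.insert(0, directory)
```

Nothing here deletes the directory afterwards; on Windows a loaded DLL cannot be removed while the process is running, so cleanup has to happen elsewhere.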

On 30 January 2014 12:29, Nick Coghlan <ncoghlan@gmail.com> wrote:
I actually think this is a useful thing to experiment with, I'm just not sure distlib is the best place for that experiment. With appropriately secure tempfile handling and the right sys.path (and module __path__) manipulation it's not obviously *impossible* to handle C extensions at arbitrary positions in the module namespace this way, just difficult. zipimport itself is a bad place to experiment though, since not only is it currently a complex ball of C code, but adding such a feature without clear evidence of robust support in a third party project would be irresponsible.
I just sent a long message that essentially gave a chunk of history and suggested a similar thing - an "enhanced zipimport" module to experiment with solutions in this space. With importlib available, an experimental implementation shouldn't even be too hard. The only difference is that I see very little reason why such a solution can't apply to all zipfiles, and not just wheels. Paul

On 30 Jan 2014 23:26, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 30 January 2014 12:29, Nick Coghlan <ncoghlan@gmail.com> wrote:
I actually think this is a useful thing to experiment with, I'm just not sure distlib is the best place for that experiment. With appropriately secure tempfile handling and the right sys.path (and module __path__) manipulation it's not obviously *impossible* to handle C extensions at arbitrary positions in the module namespace this way, just difficult. zipimport itself is a bad place to experiment though, since not only is it currently a complex ball of C code, but adding such a feature without clear evidence of robust support in a third party project would be irresponsible.
I just sent a long message that essentially gave a chunk of history and suggested a similar thing - an "enhanced zipimport" module to experiment with solutions in this space. With importlib available, an experimental implementation shouldn't even be too hard.
The only difference is that I see very little reason why such a solution can't apply to all zipfiles, and not just wheels.
The advantage of wheels over plain zipfiles for this use case is the structured metadata. distlib.mount doesn't try to guess the package structure for the extensions, you have to provide an EXTENSIONS file in the metadata that explains what C extensions are present and how they should map into the module namespace. Cheers, Nick.
Paul
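To illustrate why a declared manifest helps, here is a rough sketch that extracts only the files a manifest names, rather than guessing from the archive layout. It assumes the EXTENSIONS file is a JSON object mapping dotted module names to paths inside the archive and that it sits in the wheel's metadata directory; the function names and the info_dir parameter are hypothetical, and distlib's actual code may differ.

```python
import json
import zipfile


def read_extension_map(wheel_path, info_dir):
    """Return {dotted module name: path inside the archive}.

    Assumption: the manifest is a JSON object stored at
    <info_dir>/EXTENSIONS. An archive without it declares no extensions.
    """
    manifest = info_dir + "/EXTENSIONS"
    with zipfile.ZipFile(wheel_path) as zf:
        if manifest not in zf.namelist():
            return {}
        return json.loads(zf.read(manifest).decode("utf-8"))


def extract_extensions(wheel_path, info_dir, cache_dir):
    """Extract only the declared extensions; return {module name: file path}."""
    extracted = {}
    mapping = read_extension_map(wheel_path, info_dir)
    with zipfile.ZipFile(wheel_path) as zf:
        for module_name, archive_path in sorted(mapping.items()):
            extracted[module_name] = zf.extract(archive_path, path=cache_dir)
    return extracted
```

A plain zipfile has no such manifest, so the first function returns an empty mapping and nothing is extracted.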

On 30 January 2014 22:38, Nick Coghlan <ncoghlan@gmail.com> wrote:
The advantage of wheels over plain zipfiles for this use case is the structured metadata. distlib.mount doesn't try to guess the package structure for the extensions, you have to provide an EXTENSIONS file in the metadata that explains what C extensions are present and how they should map into the module namespace.
OK, I think I get the idea now. I'm still not comfortable with the temp directory clutter that unpacking leaves (in particular on Windows where deletion isn't even possible in an atexit routine) but I'll survive. I *would* like to see the various technical issues and implications in the API documentation, though. The implications and limitations, and in particular the manual cache management requirements, need to be made explicit. (I thought I'd seen docs somewhere, but they definitely aren't in the API reference for the distlib.wheel module). Paul

On Thu, 30/1/14, Paul Moore <p.f.moore@gmail.com> wrote:
I'm still not comfortable with the temp directory clutter that unpacking leaves (in particular on Windows where deletion isn't even possible in an atexit routine) but I'll survive.
It's up to the using application to do cache management of this type. It's not as if the cache will grow unbounded in most realistic scenarios, and it's perfectly possible to do cleanup during startup. It may not be ideal, but it seems acceptable given the limitations of the underlying platform.
I *would* like to see the various technical issues and implications in the API documentation, though. The implications and limitations, and in particular the manual cache management requirements, need to be made explicit. (I thought I'd seen docs somewhere, but they definitely aren't in the API reference for the distlib.wheel module).
There's some mention in the distlib.resources documentation (the cache would also be used for things like cacert.pem), but I'll certainly update the wheel documentation to cover this area, and update the relevant parts of the distlib.resources and distlib.util documentation, too. BTW, I raised the whole issue of extracting C extensions for import from zips on python-dev in March last year: https://mail.python.org/pipermail/python-dev/2013-March/124970.html The feedback I got indicated (to me) that while people felt there were some problem areas (e.g. cleanup was mentioned), there were no show-stoppers. No ringing endorsements, but no red flags were raised either. Regards, Vinay Sajip
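The "cleanup during startup" approach could look roughly like the sketch below, which an application would call before mounting anything. The function name and the 30-day default are illustrative assumptions, not part of distlib.

```python
import os
import time


def prune_cache(cache_dir, max_age_days=30, now=None):
    """Delete files under cache_dir not modified within max_age_days.

    Returns the list of removed paths. Files that cannot be removed (for
    example a DLL still loaded by another process on Windows) are skipped.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                continue  # in use or already gone; retry on the next startup
    return removed
```

Running this at startup rather than in an atexit handler sidesteps the Windows restriction, because the extensions extracted by the previous run are no longer loaded.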

Paul Moore <p.f.moore <at> gmail.com> writes:
I *would* like to see the various technical issues and implications in the API documentation, though. The implications and limitations, and in particular the manual cache management requirements, need to be made explicit.
I've updated the documentation; see http://distlib.readthedocs.org/en/latest/reference.html for the result. The docs on distlib.util.get_cache_base have the info on the cache and cleanup thereof, including handling of no-home-directory, security implications, open files in Windows etc. I've also updated the distlib.resources.Cache documentation to link to get_cache_base. I've added is_mountable and is_compatible methods to Wheel, and added/updated the docs for them and the mount() method accordingly, and referenced get_cache_base as the place where extensions are extracted to. More detailed suggestions for improvements are welcome. Regards, Vinay Sajip

On 31 January 2014 08:22, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
More detailed suggestions for improvements are welcome.
Thanks for this. I see no means to set the cache that will be used by wheel.mount. Are you meant to set the global distlib.resources.cache value to a user-created Cache instance if you want to control the location of the cache? Is the default cache created on demand? In other words, if I set up my own cache on application startup, will the %LOCALAPPDATA%\.distlib directory still get created? The reason I ask is that on Windows, there are people who are interested in "portable" applications, which have as near to zero impact on the system as a whole as possible. While I don't have a need to write such applications in Python right now, I am working a lot with such applications at the moment, and I'm more aware than usual of questions such as "does this app leave traces of its presence on the system?" So I don't have an agenda here, I'm just curious. Paul

On Fri, 31/1/14, Paul Moore <p.f.moore@gmail.com> wrote:
Thanks for this. I see no means to set the cache that will be used by wheel.mount.
It's not as configurable as the distlib.resources cache (needs a method to be overridden, which isn't ideal), but I'll look at making it follow the same scheme.
Are you meant to set the global distlib.resources.cache value to a user-created Cache instance if you want to control the location of the cache?
Yes.
Is the default cache created on demand? In other words, if I set up my own cache on application startup, will the %LOCALAPPDATA%\.distlib directory still get created?
No - currently it's created on module import by a line 'cache = Cache()'. This can easily be changed to defer the cache creation until it's needed, allowing user code to set a custom Cache. When I do this, no .distlib directory will be created in %LOCALAPPDATA% unless no other cache has been set, and it's needed.
So I don't have an agenda here, I'm just curious.
I hope that answers things! I will update distlib's cache usage logic shortly to (a) allow better control over the wheel mount cache location and (b) create caches on demand rather than on module import. Regards, Vinay Sajip
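The change from "create on module import" to "create on demand" can be sketched as follows. The Cache class, set_cache, get_cache and default_cache_base here are hypothetical stand-ins written for illustration; they are not distlib's actual names or behaviour.

```python
import os
import tempfile

_cache = None  # nothing is created at import time


class Cache:
    """Hypothetical stand-in for a cache rooted at a directory."""

    def __init__(self, base):
        self.base = base
        os.makedirs(base, exist_ok=True)  # the directory appears only here


def default_cache_base():
    """Hypothetical default location; a real default would differ."""
    return os.path.join(tempfile.gettempdir(), "demo-lazy-cache")


def set_cache(cache):
    """Let the application install its own cache before first use."""
    global _cache
    _cache = cache


def get_cache():
    """Return the active cache, creating the default only when needed."""
    global _cache
    if _cache is None:
        _cache = Cache(default_cache_base())
    return _cache
```

With this shape, a "portable" application that calls set_cache() at startup never triggers creation of the default directory, because the default Cache is only constructed when get_cache() finds nothing set.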

FWIW, Windows (by default) has a regular maintenance task that will clean up old files in the TEMP directory. I think the default settings will delete files older than 30 days, and more aggressively if disk space is running low. I'd say pick a consistent/static subfolder ('wheel_mount_35_amd64' or something), autogenerate whatever is needed within there, and leave it behind. Users who are concerned can rm -rf whenever they like and everyone else can let the OS handle it. Cheers, Steve
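Steve's suggestion of a consistent, static subfolder could be sketched like this. The naming scheme (prefix, interpreter version, machine type) is an assumption modelled on the 'wheel_mount_35_amd64' example, and the function names are hypothetical.

```python
import os
import platform
import sys
import tempfile


def mount_dir_name(prefix="wheel_mount"):
    """Build a name like 'wheel_mount_35_amd64' for the running interpreter."""
    version = "%d%d" % sys.version_info[:2]
    machine = platform.machine().lower() or "unknown"
    return "_".join((prefix, version, machine))


def mount_dir(base=None):
    """Return (creating if needed) the fixed per-interpreter subfolder."""
    base = tempfile.gettempdir() if base is None else base
    path = os.path.join(base, mount_dir_name())
    os.makedirs(path, exist_ok=True)
    return path
```

One caveat with this design: a predictable name is reasonable where the temp directory is per-user, but on a system where the temp directory is shared between users it would need ownership and permission checks before anything is loaded from it.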
participants (4): Nick Coghlan, Paul Moore, Steve Dower, Vinay Sajip