[Distutils] pkg_resources get_distribution - zipfile support?

PJ Eby pje at telecommunity.com
Wed Oct 2 04:32:37 CEST 2013


On Tue, Oct 1, 2013 at 7:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> On 2 Oct 2013 07:12, "Paul Moore" <p.f.moore at gmail.com> wrote:
>>
>> On 1 October 2013 21:32, PJ Eby <pje at telecommunity.com> wrote:
>> > On Tue, Oct 1, 2013 at 1:51 PM, Daniel Holth <dholth at gmail.com> wrote:
>> >> pkg_resources only finds distributions inside .egg and inside sys.path
>> >> entries that are filesystem directories.
>> >
>> > Actually it looks in zipfiles (or subdirectory paths thereof) or
>> > filesystem directories, and can spot zipfile subdirectories named
>> > '.egg' inside zipfiles (or subdirectories thereof).
>>
>> But not dist-info? I thought setuptools supported dist-info formats
>> these days. Is this somewhere that got missed? Or is it more subtle
>> than that?
>
> I believe it's not recognising "*.whl" as interesting, so it isn't even
> looking inside the nested wheel to check for dist-info.

That would only be relevant if the .whl weren't on sys.path.  Since
it's on sys.path, it's dispatched on its importer, not its filename.
(It's just that there's no .dist-info detection in the zipimporter
handler.)  The .whl extension is only relevant for discovery of nested
.whl's (wheels within wheels...  or within eggs or plain zipfiles...),
or .whl's in directories.

There isn't any good terminology for describing this at the moment,
which makes all this sound much more complicated than it actually is.
;-)

Making up some new terminology, suppose we have foo.egg, bar.egg-info,
baz.dist-info, and spam.whl in site-packages.  Then the bar and baz
distributions are "mounted" at the '.../site-packages' location
string, but the foo and spam distributions are merely "discoverable"
at that same location string.  (They are *mounted* at
'.../site-packages/foo.egg' and '.../site-packages/spam.whl',
respectively.)

That is, to be mounted at a given location string means "to be
importable if that location string is on sys.path", and to be
discoverable at a given location means "to be available for dynamic
dependency resolution (e.g. via require()) if that location string is
on sys.path".

Determining what is mounted or discoverable at a given sys.path
location is the job of the find_distributions() API.  If the 'only'
flag is true, it yields only mounted distributions at the given
location string.  If false (the default), it yields both mounted and
discoverable distributions.
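
To make the mounted/discoverable split and the 'only' flag concrete,
here is a toy model in plain Python.  It is not pkg_resources code:
sketch_find_distributions(), the two suffix tuples, and demo() are names
made up for this illustration, and treating '.whl' as discoverable is
the hypothetical extension described above, not current behaviour.

```python
import os
import tempfile

# Assumption for this sketch: metadata entries sit next to the code they
# describe, so they are "mounted" at the directory that contains them.
MOUNTED_SUFFIXES = ('.egg-info', '.dist-info')
# Containers are only "discoverable" here; they are mounted at their own path.
DISCOVERABLE_SUFFIXES = ('.egg', '.whl')


def sketch_find_distributions(location, only=False):
    """Yield (kind, name) pairs for a directory, mimicking the 'only' flag."""
    for name in sorted(os.listdir(location)):
        lower = name.lower()
        if lower.endswith(MOUNTED_SUFFIXES):
            yield ('mounted', name)
        elif not only and lower.endswith(DISCOVERABLE_SUFFIXES):
            yield ('discoverable', name)


def demo():
    """Build a fake site-packages and scan it with and without only=True."""
    with tempfile.TemporaryDirectory() as site_packages:
        for name in ('foo.egg', 'bar.egg-info', 'baz.dist-info', 'spam.whl'):
            open(os.path.join(site_packages, name), 'w').close()
        everything = list(sketch_find_distributions(site_packages))
        mounted_only = list(sketch_find_distributions(site_packages, only=True))
    return everything, mounted_only
```

Calling demo() scans the same directory twice: with only=True the toy
yields just bar and baz, and with the default it also yields foo and
spam as discoverable.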

Behind the scenes, this is implemented by finding a handler for the
importer that the PEP 302 import protocol would use to look for
modules at the given location string, and then delegating the
operation to that handler.  The handler then has to look at the
location string and figure out what distributions are mounted and/or
discoverable there.
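
The dispatch step can be sketched roughly like this, using only the
standard library.  The registry, register_finder_sketch(), handler_for()
and the three stub handlers are hypothetical stand-ins, not the
pkg_resources internals, and the sketch assumes Python 3, where plain
directories are served by importlib.machinery.FileFinder.

```python
import importlib.machinery
import os
import pkgutil
import tempfile
import zipfile
import zipimport

_finder_registry = {}  # hypothetical: importer type -> handler function


def register_finder_sketch(importer_type, finder):
    """Record which handler deals with which importer type."""
    _finder_registry[importer_type] = finder


def find_nothing(importer, path_item, only=False):
    return iter(())  # fallback for importers nobody registered


def find_on_path_sketch(importer, path_item, only=False):
    return iter(())  # a real handler would scan the directory here


def find_in_zip_sketch(importer, path_item, only=False):
    return iter(())  # a real handler would scan the zipfile here


register_finder_sketch(object, find_nothing)
register_finder_sketch(importlib.machinery.FileFinder, find_on_path_sketch)
register_finder_sketch(zipimport.zipimporter, find_in_zip_sketch)


def handler_for(path_item):
    """Pick a handler from the importer the import system would use."""
    importer = pkgutil.get_importer(path_item)
    for cls in type(importer).__mro__:
        if cls in _finder_registry:
            return _finder_registry[cls]
    return find_nothing


def demo():
    """Return the handler names chosen for a directory and for a zipfile."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, 'example.zip')
        with zipfile.ZipFile(zip_path, 'w') as archive:
            archive.writestr('placeholder.txt', 'x')
        return handler_for(tmp).__name__, handler_for(zip_path).__name__
```

The point of the sketch is that the filename never enters into it: a
zipfile named example.zip, example.egg or example.whl would all reach
the same zip handler, because the lookup is keyed on the importer type.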

To find mounted distributions, the directory handler (find_on_path())
checks whether the directory string itself ends in '.egg' (and could
theoretically do the same for .whl), and also looks for contained
.dist-info and .egg-info files or directories.  To find discoverable
distributions, it checks for files or directories ending in '.egg'
(and could theoretically do the same for .whl).

The zipfile handler (find_in_zip()) doesn't actually bother checking
for an .egg extension; instead it checks for an EGG-INFO/PKG-INFO and
assumes it'll be able to figure things out from that.  And it checks
for nested .eggs if it's looking for discoverables.

So, what it's missing to support Paul's use case is a check for
.dist-info/METADATA, analogous to the EGG-INFO/PKG-INFO check.  It
should be relatively simple to add, if somebody wants to do that.  (It
can even be done from code outside pkg_resources, as there is a
'register_finder()' API that can be called to register a replacement
handler.)
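
For anyone who wants to try it, the missing check might look something
like the following.  This is only a standard-library sketch of the
detection step: find_dist_info_in_zip() and make_example_wheel() are
made-up names, the METADATA contents are placeholder values, and wiring
the result into pkg_resources through register_finder() (and building
real Distribution objects) is deliberately left out.

```python
import io
import zipfile


def find_dist_info_in_zip(zip_path_or_file):
    """Return top-level '*.dist-info' directory names that hold a METADATA file."""
    found = []
    with zipfile.ZipFile(zip_path_or_file) as archive:
        for member in archive.namelist():
            parts = member.split('/')
            # Only accept '<something>.dist-info/METADATA' at the archive root.
            if (len(parts) == 2
                    and parts[0].endswith('.dist-info')
                    and parts[1] == 'METADATA'):
                found.append(parts[0])
    return found


def make_example_wheel():
    """Build a tiny in-memory archive laid out like a wheel (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr('spam/__init__.py', '')
        archive.writestr('spam-1.0.dist-info/METADATA',
                         'Metadata-Version: 2.1\nName: spam\nVersion: 1.0\n')
    buffer.seek(0)
    return buffer
```

Running find_dist_info_in_zip(make_example_wheel()) reports the single
spam-1.0.dist-info directory; a replacement zip handler would turn each
such hit into a mounted distribution for the zipfile's location string.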

In some ways, I'm finding the code structure irritating now, because
the one abstraction I *didn't* build into the original framework was a
concept that there would ever be competing formats to .egg and
.egg-info, so implementing .dist-info and .whl requires annoying
repetitions of code at the moment.  But it's probably not worth
refactoring to make this cleaner, because the odds that there will be
a *third* file format needing to be supported any time soon are
hopefully quite small.  ;-)

