[Python-Dev] PEP 376 and PEP 302 - allowing import hooks to provide distribution metadata

Paul Moore p.f.moore at gmail.com
Sun Jul 5 15:27:31 CEST 2009


2009/7/5 Tarek Ziadé <ziade.tarek at gmail.com>:
> Agreed, the zip case was added afterwards, but in practice the APIs still
> treat the files as *filesystem files* located in a container (e.g. a directory
> or a zip file) somewhere on the filesystem.
>
> "local" in that case is a flag that means "translate a file path expressed in the
> local filesystem", which makes no sense any more with zip files. But the goal, really,
> is to be able to point out that two distributions are using the very same file.
>
> Right now PEP 376 and the prototype code handle these two real world use cases:
>
> - browsing regular site-packages-like directories
> - browsing site-packages-like directories, that are zipped.
>
> For example:
>
> - I have a "packages.zip" file in /var/, which is also in my sys.path.
> It contains a distribution "foo-1.0" that has the "roman.py" file in
> its root.  So the RECORD file located in "foo-1.0.egg-info" has a line
> starting with "roman.py,..."
>
> - Then if I install docutils 0.5 as a regular filesystem distribution,
> "roman.py" will be added to Python's site-packages,
> and docutils-0.5.egg-info/RECORD will contain "roman.py,..." with
> the same hash.
>
> The local flag will return these paths:
>
> - /var/packages.zip/roman.py   <--- not a "real" path
> - /usr/local/lib/python2.6/site-packages/roman.py
>
> So removing the docutils distribution will be doable, because these
> paths are different.
>
>>
>> Concrete proposal:
>>
>> get_metadata_files() - returns slash-separated names, relative to the
>> egginfo dir
>> get_metadata_file(path) - path must be slash-separated, relative to
>> the egginfo dir
>>
>> get_installed_files - returns the contents of RECORD unaltered
>> uses(path) - checks if path is in RECORD
>>
>> The latter 2 are not very useful in practice - you can't say anything
>> about entries in different RECORD files, which is likely the real use
>> case you want. Maybe RECORD could have an extra "Location" entry,
>> which determines where it exists globally (this would be the directory
>> to which the filenames were relative, in the case of filesystem-based
>> distributions) and RECORD entries are comparable if the Location
>> values in the 2 RECORD files match. That's a lot more complex - but
>> depending on what use people expect to make of these 2 APIs, it may be
>> justified.
>
> Yes,
> In practice, if you look at my previous example, even if
> "/var/packages.zip/roman.py" isn't a
> real path, it's enough to compare RECORD entries globally.
>
> The "Location" entry you are proposing in that case, would be
> "/var/packages.zip".
>
> But do we really need to store it in the RECORD? Or can't we define an
> API that returns
> two elements:
>
> - the path to the location (in the example: /var/packages.zip or
> /usr/local/lib/python2.6/site-packages)
> - the path within the location itself (in the example: roman.py)
>
> A concrete proposal would be to reuse your proposal, but return
> tuples with the location as the first member,
> e.g. "(location, relative path[s])"

That sounds reasonable. So we can forget the "local" parameter, and
return a tuple:

- absolute location of the container (directory, zipfile or whatever
containing the egginfo file) as a filesystem path in canonical native
form (where it's filesystem based) or as an opaque token for the odd
cases (frozen modules, for example) where a filesystem location isn't
available.
- entry from the RECORD file, as a slash-separated filename relative
to the root of the container.
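
To make the tuple idea concrete, here is a minimal sketch. It is not the PEP 376 prototype: the function name installed_files is made up for illustration, and it assumes only that each RECORD row is comma-separated with the path in the first field, as in the "roman.py,..." example above. The hash and size values are placeholders.

```python
import csv
import io


def installed_files(location, record_text):
    """Yield (location, relative_path) for each row of a RECORD file.

    `location` is the container path (or an opaque token), and
    `record_text` is the RECORD content as a string. Hypothetical helper.
    """
    for row in csv.reader(io.StringIO(record_text)):
        if row:
            yield (location, row[0])


# The two distributions from the example above (placeholder hash/size).
zip_files = set(installed_files(
    "/var/packages.zip",
    "roman.py,abc,10\n"))
site_files = set(installed_files(
    "/usr/local/lib/python2.6/site-packages",
    "roman.py,abc,10\ndocutils/__init__.py,def,20\n"))

# Same relative name, different location: the tuples do not collide.
shared = zip_files & site_files
```

Because the location is part of the key, comparing entries never requires joining it to the relative path, so it does not matter that "/var/packages.zip/roman.py" is not a real filesystem path.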

> The code that is comparing paths to see if they are the same can join
> location+relative path[s], while we can provide in a dedicated function
> something to read the content of the file (that would be get_data I guess,
> if I refer to PEP 302)

Unfortunately, get_data loads data files located within a *package*,
using a name relative to the package directory. You can't get at the
metadata of a *distribution* like that.
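
A small sketch of the limitation, using pkgutil.get_data (the stdlib wrapper around the PEP 302 get_data hook). The package name demo_pkg_376 and the directory layout are invented for the demonstration.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

# Build a throwaway "site-packages" holding a package and its egg-info dir.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_376")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "data.txt"), "w") as f:
    f.write("package data")

egginfo = os.path.join(root, "demo_pkg_376-1.0.egg-info")
os.makedirs(egginfo)
with open(os.path.join(egginfo, "RECORD"), "w") as f:
    f.write("demo_pkg_376/__init__.py,,\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Works: the resource name is relative to the package directory.
package_data = pkgutil.get_data("demo_pkg_376", "data.txt")

# The egg-info directory is a sibling of the package, not an importable
# package itself, so there is no package name to pass for RECORD.
```

The metadata lives beside the package rather than inside it, which is why a distribution-level hook would be needed.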

But if you're using get_installed_files(), why would you then want to
read the files? What exactly would you *use* get_installed_files for
which would then leave you needing to read the files? If it's to check
they haven't changed (by comparing md5 values) you're doing that to
uninstall, so that's the responsibility of the uninstall function.
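
For illustration, the check an uninstall function might do internally could look like the sketch below. The name changed_files and the read_bytes callback are hypothetical; the callback stands in for whatever mechanism the container provides for reading a file.

```python
import hashlib


def changed_files(record_rows, read_bytes):
    """Return relative paths whose current MD5 differs from the recorded one.

    `record_rows` is an iterable of (relative_path, md5_hexdigest) pairs;
    `read_bytes(relative_path)` returns the file's current content.
    """
    changed = []
    for rel_path, recorded_md5 in record_rows:
        actual = hashlib.md5(read_bytes(rel_path)).hexdigest()
        if actual != recorded_md5:
            changed.append(rel_path)
    return changed


# In-memory stand-in for a container; extra.py was edited after install.
contents = {"roman.py": b"print('I')\n", "extra.py": b"edited\n"}
rows = [
    ("roman.py", hashlib.md5(b"print('I')\n").hexdigest()),
    ("extra.py", hashlib.md5(b"original\n").hexdigest()),
]
result = changed_files(rows, contents.__getitem__)
```

Keeping the read behind a callback means the caller of get_installed_files() never has to open the files itself.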

Again, it's a question of what is a public API, and what is the use
case it's designed for.

I'm currently writing a SQLite importer, which will allow me to store
"files" in any sort of database tables I want, so I can build in some
nice pathological behaviour. That should tease out some awkward corner
cases :-)
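
As a rough idea of the shape such an importer can take (this is not the importer described above, just a sketch), the following stores module source in an in-memory SQLite table. It uses the current importlib finder/loader protocol rather than the original PEP 302 find_module/load_module pair; the table layout and the names SQLiteFinder and sqlite_demo_mod are assumptions.

```python
import importlib.abc
import importlib.util
import sqlite3
import sys


class SQLiteFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one: looks modules up in a `modules` table."""

    def __init__(self, conn):
        self.conn = conn

    def _source(self, fullname):
        row = self.conn.execute(
            "SELECT source FROM modules WHERE name = ?", (fullname,)
        ).fetchone()
        return None if row is None else row[0]

    def find_spec(self, fullname, path=None, target=None):
        if self._source(fullname) is None:
            return None  # not ours; let the next finder try
        return importlib.util.spec_from_loader(fullname, self, origin="sqlite")

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        source = self._source(module.__name__)
        code = compile(source, "<sqlite:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, source TEXT)")
conn.execute("INSERT INTO modules VALUES (?, ?)",
             ("sqlite_demo_mod", "ANSWER = 42\n"))
sys.meta_path.append(SQLiteFinder(conn))

import sqlite_demo_mod  # served from the database, no file on disk
```

A module loaded this way has no filesystem path at all, which is exactly the case where the location would have to be an opaque token.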

Paul

