[Python-Dev] PEP 376 and PEP 302 - allowing import hooks to provide distribution metadata

Tarek Ziadé ziade.tarek at gmail.com
Sun Jul 5 15:02:49 CEST 2009


2009/7/4 Paul Moore <p.f.moore at gmail.com>:
> 2009/7/4 Paul Moore <p.f.moore at gmail.com>:
>> 2009/7/3 Tarek Ziadé <ziade.tarek at gmail.com>:
>>> You can give me a bitbucket account so I can give you write access to the repo,
>>> There are tests as long as you install Nose.
>>
>> How do I get the tests to work? Just running nosetests gives an error
>> (probably because pkgutil is being imported from the stdlib, rather
>> than from this directory).
>>

I just run them from within the directory

>> If I set PYTHONPATH=. then I get errors. I suspect path normalisation
>> (for backslashes) in the zipfile handling.

>
> Actually, the test
>
>    assert_equals(list(dist.get_egginfo_files(local=True)),
>                  [os.path.join(SITE_PKG, 'mercurial-1.0.1.egg-info/PKG_INFO'),
>                   os.path.join(SITE_PKG, 'mercurial-1.0.1.egg-info/RECORD')])
>
> is broken, because the expected value uses slashes, which are *not*
> the local separator on win32.
>
> I've attached a patch.

Applied, thanks (I haven't run them under win32 yet).
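
For the record, here is a minimal sketch of the separator problem, not necessarily what the patch does. It assumes a placeholder SITE_PKG value and a hypothetical helper, expected_paths, that takes the path module as an argument so both conventions can be shown on any platform:

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules

SITE_PKG = "site-packages"  # placeholder value, for illustration only


def expected_paths(pathmod):
    """Build the expected egg-info paths with the separator of `pathmod`."""
    egginfo = "mercurial-1.0.1.egg-info"
    # Pass every component separately so join() picks the separator,
    # instead of hard-coding "/" inside one of the arguments.
    return [pathmod.join(SITE_PKG, egginfo, name)
            for name in ("PKG_INFO", "RECORD")]


print(expected_paths(posixpath))
print(expected_paths(ntpath))
```

In a real test, os.path would take the place of the explicit module argument.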


>
> But there's 2 comments I'd make (one minor, one major)
>
> Minor one: The tests often seem to be exercising the internal classes,
> not so much the public API, so many of them will probably not be of
> much use to me :-(

I'll add some more tests then, or even user stories.

> I think you need some real-world use cases, with actual sample
> (pseudo-)code, to validate the design here. As things stand, it's both
> confusing and (I suspect) unusable in practice. Sorry, I know that
> sounds negative, but if this isn't to be a source of subtle bugs for
> years to come, it needs to be clarified now. PEP 302 is still hitting
> this type of issue - runpy and importlib have brought out errors and
> holes in the protocol quite recently - even though Just and I went to
> great lengths to try to tease out hidden assumptions up front.

Agreed. The zip case was added afterwards, but in practice the APIs are
still dealing with *filesystem files* located in a container (e.g. a
directory or a zip file) that sits somewhere on the filesystem.

"local" in that case is a flag that means "translate a file path so it is
expressed in the local filesystem", which makes no sense anymore with zip
files. But the real goal is to be able to point out that two distributions
are using the very same file.

Right now PEP 376 and the prototype code handle these two real-world use cases:

- browsing regular site-packages-like directories
- browsing site-packages-like directories that are zipped.

For example:

- I have a "packages.zip" file in /var/, which is also in my sys.path.
  It contains a distribution "foo-1.0" that has the "roman.py" file in
  its root. So the RECORD file located in "foo-1.0.egg-info" has a line
  starting with "roman.py,..."

- Then if I install docutils 0.5 as a regular filesystem distribution,
  "roman.py" will be added in Python's site-packages, and
  docutils-0.5.egg-info/RECORD will contain "roman.py,..." with
  the same hash.

The local flag will return these paths:

- /var/packages.zip/roman.py   <--- not a "real" path
- /usr/local/lib/python2.6/site-packages/roman.py

So removing the docutils distribution will be doable, because these
paths are different.
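
A rough sketch of that comparison, assuming a hypothetical helper local_paths (not part of the prototype) and that the path is the first comma-separated field of each RECORD line. The naive split does not handle paths that themselves contain commas:

```python
import posixpath


def local_paths(location, record_lines):
    """Join each RECORD entry's relative path onto the container location."""
    # Hypothetical helper: the path is taken to be the first field of a line.
    return set(posixpath.join(location, line.split(",", 1)[0])
               for line in record_lines)


foo = local_paths("/var/packages.zip", ["roman.py,..."])
docutils = local_paths("/usr/local/lib/python2.6/site-packages",
                       ["roman.py,..."])

# Files claimed by both distributions; empty here, so removing docutils
# does not touch anything that foo-1.0 relies on.
shared = foo & docutils
print(sorted(foo), sorted(docutils), shared)
```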

>
> Concrete proposal:
>
> get_metadata_files() - returns slash-separated names, relative to the
> egginfo dir
> get_metadata_file(path) - path must be slash-separated, relative to
> the egginfo dir
>
> get_installed_files - returns the contents of RECORD unaltered
> uses(path) - checks if path is in RECORD
>
> The latter 2 are not very useful in practice - you can't say anything
> about entries in different RECORD files, which is likely the real use
> case you want. Maybe RECORD could have an extra "Location" entry,
> which determines where it exists globally (this would be the directory
> to which the filenames were relative, in the case of filesystem-based
> distributions) and RECORD entries are comparable if the Location
> values in the 2 RECORD files match. That's a lot more complex - but
> depending on what use people expect to make of these 2 APIs, it may be
> justified.

Yes. In practice, if you look at my previous example, even if
"/var/packages.zip/roman.py" isn't a real path, it's enough to compare
RECORD entries globally.

The "Location" entry you are proposing would, in that case, be
"/var/packages.zip".

But do we really need to store it in RECORD? Couldn't we define an
API that returns two elements:

- the path to the location (in the example: /var/packages.zip or
/usr/local/lib/python2.6/site-packages)
- the path within the location itself (in the example: roman.py)

A concrete proposal would be to reuse your proposal, but return
tuples with the location as the first member,
e.g. "(location, relative path[s])".

The code that compares paths to see if they are the same can join
location + relative path[s], while a dedicated function can read the
content of the file (that would be get_data, I guess, if I refer to
PEP 302).
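
To make the idea concrete, here is a minimal sketch under stated assumptions. read_file is a hypothetical stand-in for a get_data-style reader, built on zipfile rather than on a PEP 302 loader, and the temporary directory only imitates the /var/packages.zip and site-packages locations from the example:

```python
import os
import tempfile
import zipfile


def read_file(location, relative_path):
    """Read a file's bytes from a zip container or from a plain directory."""
    if zipfile.is_zipfile(location):
        with zipfile.ZipFile(location) as archive:
            return archive.read(relative_path)  # zip member names use "/"
    with open(os.path.join(location, *relative_path.split("/")), "rb") as f:
        return f.read()


# Build the two containers from the example in a temporary directory.
tmp = tempfile.mkdtemp()
zip_location = os.path.join(tmp, "packages.zip")
dir_location = os.path.join(tmp, "site-packages")
os.mkdir(dir_location)
with zipfile.ZipFile(zip_location, "w") as archive:
    archive.writestr("roman.py", b"# roman numerals\n")
with open(os.path.join(dir_location, "roman.py"), "wb") as f:
    f.write(b"# roman numerals\n")

# (location, relative path) tuples, location first.
in_zip = (zip_location, "roman.py")
on_disk = (dir_location, "roman.py")

print(in_zip == on_disk)                          # same file? compare tuples
print(read_file(*in_zip) == read_file(*on_disk))  # same content?
```

The tuples differ even though the relative path and the content are identical, which is all the comparison code needs.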

Tarek

