2009/7/4 Paul Moore firstname.lastname@example.org:
2009/7/3 Tarek Ziadé email@example.com:
You can give me a bitbucket account so I can give you write access to the repo, There are tests as long as you install Nose.
How do I get the tests to work? Just running nosetests gives an error (probably because pkgutil is being imported from the stdlib, rather than from this directory).
If I set PYTHONPATH=. then I get errors. I suspect path normalisation (for backslashes) in the zipfile handling.
Actually, the test
assert_equals(list(dist.get_egginfo_files(local=True)), [os.path.join(SITE_PKG, 'mercurial-1.0.1.egg-info/PKG_INFO'), os.path.join(SITE_PKG, 'mercurial-1.0.1.egg-info/RECORD')])
is broken, because the expected value uses slashes, which are *not* the local separator on win32.
I've attached a patch.
But there's 2 comments I'd make (one minor, one major)
Minor one: The tests often seem to be exercising the internal classes, not so much the public API, so many of them will probably not be of much use to me :-(
Major one: I'm not entirely sure what the purpose of the "local" flag is. With local=True, you are supposed to get a filesystem path in the local form. But that makes no sense for non-filesystem based loaders (e.g. zipfiles). So how is a user expected to use the value? With local=false, you get slash-separated files (which is good, because it's consistent) which are either absolute (see my earlier comments about how broken this is in practice) or relative to the "directory" containing the egg-info file (where this is of course not necessarily a real filesystem location). But these aren't useful as filesystem names, because they aren't local format.
So what is the point of the get_installed_files, uses, and get_egginfo_files functions? Who will use the returned values, and what for? Where will a user get a value to pass to uses()? Given that get_egginfo_file takes names relative to the egginfo dir, shouldn't get_egginfo_files *return* values relative to the same location, so that they can be used there?
I guess the same type of argument applies with get_inetalled_files() and uses() - the values returned by get_installed_files can be seen as a set of opaque tokens (OK, realistically they will be "filenames" in some sense) which can be passed to uses() - but if they aren't absolute filesystem paths, you can't compare them. Example:
distribution a contains file p.py distribution b contains file p.py
pkgutil.get_distribution(a).get_installed_files contains 'p.py' pkgutil.get_distribution(b).uses('p.py') is True
What does this mean? If the 2 distributions are in the same sys.path element, there's a clash. If not, is there a clash? What if they are in 2 different directories? What if they are in the same directory, but it's repeated under different names (e.g. symlinks) on sys.path? What if one is in a directory and the other in a zip file?
This isn't even related to non-filesystem loaders. It's a fundamental issue with the whole API. What you seem to want is actually a canonical "name" for a file object - for filesystem files, os.path.normcase(os.path.normpath(filename)) is probably good enough, although it doesn't deal with symlinks. For other types of loader, you have to rely on the loader itself - loader.get_fullname() is probably OK. But even then, it's not clear how user code would actually *use* these APIs.
I think you need some real-world use cases, with actual sample (pseudo-)code, to validate the design here. As things stand, it's both confusing and (I suspect) unusable in practice. Sorry, I know that sounds negative, but if this isn't to be a source of subtle bugs for years to come, it needs to be clarified now. PEP 302 is still hitting this type of issue - runpy and importlib have brought out errors and holes in the protocol quite recently - even though Just and I went to great lengths to try to tease out hidden assumptions up front.
get_metadata_files() - returns slash-separated names, relative to the egginfo dir get_metadata_file(path) - path must be slash-separated, relative to the egginfo dir
get_installed_files - returns the contents of RECORD unaltered uses(path) - checks if path is in RECORD
The latter 2 are not very useful in practice - you can't say anything about entries in different RECORD files, which is likely the real use case you want. Maybe RECORD could have an extra "Location" entry, which determines where it exists globally (this would be the directory to which the filenames were relative, in the case of filesystem-based distributions) and RECORD entries are comparable if the Location values in the 2 RECORD files match. That's a lot more complex - but depending on what use people expect to make of these 2 APIs, it may be justified.