[Python-Dev] PEP 376 : Changing the .egg-info structure

P.J. Eby pje at telecommunity.com
Tue May 19 22:36:40 CEST 2009


At 04:04 PM 5/19/2009 +0200, Tarek Ziadé wrote:
>On Sat, May 16, 2009 at 6:55 PM, P.J. Eby <pje at telecommunity.com> wrote:
> >
> > 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't semicolons a
> > valid character in filenames?
>
>I am changing this to a <tab> for now.
>
>What about Antoine's idea about doing a quote() on the names ?

I like the CSV idea better, since the csv module is available in 2.3 
and up.  We should just pick a dialect with unambiguous quoting rules.
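A minimal sketch of what "a dialect with unambiguous quoting rules" could look like with the `csv` module. The dialect name `RecordDialect`, the helpers `write_record`/`read_record`, and the three columns (path, hash, size) are illustrative assumptions, not part of PEP 376 as discussed here. Quoting every field and doubling embedded quotes means semicolons, tabs and quote characters inside a path survive a round trip. The sketch uses Python 3's `io.StringIO`.

```python
import csv
import io


class RecordDialect(csv.Dialect):
    """Hypothetical dialect for an installed-files list: every field quoted."""
    delimiter = ","
    quotechar = '"'
    doublequote = True          # an embedded quote is written as ""
    quoting = csv.QUOTE_ALL     # quote every field, so separators inside names are safe
    lineterminator = "\n"
    skipinitialspace = False


def write_record(rows):
    """Serialize (path, hash, size) rows of strings to one string."""
    buf = io.StringIO()
    csv.writer(buf, dialect=RecordDialect).writerows(rows)
    return buf.getvalue()


def read_record(text):
    """Parse text produced by write_record back into tuples of strings."""
    return [tuple(row) for row in csv.reader(io.StringIO(text), dialect=RecordDialect)]


rows = [
    ("lib/semi;colon.py", "abc123", "42"),
    ("lib/tab\tname.py", "def456", "7"),
    ('lib/odd"quote.py', "0f0f0f", "3"),
]
assert read_record(write_record(rows)) == rows
```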


>From my point of view, <tabs> seem simpler to deal with if 3rd-party
>tools want to work on these files without using pkgutil or Python.

True, but then CSV files are still pretty common.

One other possibility that might work is using a vertical bar as a separator.

My preference rank at the moment is probably tabs, CSV, or vertical 
bar.  But I don't really care all that much, so let the people who care decide.
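To make the ambiguity concrete, here is a small sketch, assuming a hypothetical line layout of path, hash and size. Each candidate separator (";", tab, "|") is a legal character in POSIX filenames, where only "/" and NUL are forbidden, so a naive split miscounts fields. Splitting from the right recovers the line only under the extra assumption that the path is the first field and the trailing fields can never contain the separator.

```python
# Each candidate separator is a legal character in POSIX filenames,
# so a plain split cannot tell a separator from part of a name.
SEPARATORS = [";", "\t", "|"]


def naive_fields(line, sep):
    """Split a line on every occurrence of the separator."""
    return line.split(sep)


def path_first_fields(line, sep, n_trailing=2):
    """Split from the right, assuming only the first field (the path) can
    contain the separator and the trailing fields (hash, size) cannot."""
    return line.rsplit(sep, n_trailing)


def field_counts():
    """For each separator, count the fields found by each strategy when the
    path itself contains that separator (three fields are expected)."""
    counts = {}
    for sep in SEPARATORS:
        line = sep.join(["lib/odd" + sep + "name.py", "abc123", "42"])
        counts[sep] = (len(naive_fields(line, sep)), len(path_first_fields(line, sep)))
    return counts
```

With any of the three separators the naive split finds four fields and the right split finds three. A filename containing a newline would still break any one-line-per-file format; only a quoting scheme such as the CSV dialect handles that case.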

Personally, though, I don't see much point to cross-language 
manipulation of the file.  System packaging tools have their own way 
of keeping track of this stuff.  So unless somebody's using it to 
*build* system packages (e.g. making an RPM builder), they don't need this.

Now, about the APIs...

> > 4. There should probably be a way to iterate over the projects in a
> > directory, since it's otherwise impossible for an installation tool to find
> > out what project(s) "own" a file that conflicts with something being
> > installed.  Alternatively, reshaping the file API to allow querying by path
> > as well as by project might work.
>
>I am adding a "get_projects" api:
>
>   get_projects() -> iterator
>
>   Provides an iterator that will return (name, path) tuples, where `name`
>   is the name of a registered project and `path` the path to its `egg-info`
>   directory.
>
>But for the use case you are mentioning, what about an explicit API:
>
>   get_owners(paths) -> sequence of tuples of project names
>
>   Returns a sequence of tuples: for each path in the "paths" list, a
>   tuple of the names of the projects that own it is returned.
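A rough sketch of the two proposed functions, under loudly stated assumptions. The extra `site_dir` parameter, the `RECORD` file name inside each egg-info directory, and its layout (one CSV row per installed file, path first) are all hypothetical here. Real egg-info directory names usually also encode a version, whereas this sketch simply strips the suffix.

```python
import csv
import os

EGG_INFO = ".egg-info"


def get_projects(site_dir):
    """Yield (name, path) for each *.egg-info directory directly in site_dir."""
    for entry in sorted(os.listdir(site_dir)):
        full = os.path.join(site_dir, entry)
        if entry.endswith(EGG_INFO) and os.path.isdir(full):
            yield entry[: -len(EGG_INFO)], full


def _installed_files(egg_info_dir):
    """Read the hypothetical RECORD file: one CSV row per file, path first."""
    record = os.path.join(egg_info_dir, "RECORD")
    if not os.path.exists(record):
        return set()
    with open(record, newline="") as f:
        return {os.path.normpath(row[0]) for row in csv.reader(f) if row}


def get_owners(paths, site_dir):
    """Return one tuple of owning project names per entry in paths."""
    owners_by_file = {}
    for name, egg_info in get_projects(site_dir):
        for file_path in _installed_files(egg_info):
            owners_by_file.setdefault(file_path, []).append(name)
    return [tuple(owners_by_file.get(os.path.normpath(p), ())) for p in paths]
```

Note that `get_owners` has to rebuild the whole file-to-owner mapping on every call; avoiding that is exactly where a cache, and the question of who controls it, comes in.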
>
> >
> > 5. If any cache mechanisms are to be used by the API, the API *must* make it
> > possible to bypass or explicitly manage that cache, as otherwise
> > installation tools and tools that manipulate sys.path at runtime may end up
> > using incorrect data.
>
>Work in progress. (I am afraid I have to write an advanced prototype
>to know exactly how the cache might work, and so what API we should have.)

I think it would be simpler to have explicit object types 
representing things like a directory, a collection of directories, 
and individual projects, and these object types should be part of the API.
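One possible shape for those object types, offered only as a sketch: the class names `Project`, `Directory` and `DirectoryCollection`, and the `RECORD` file layout (one CSV row per file, path first), are assumptions. Cached data lives on the objects themselves, so refreshing one directory is just dropping its cached project list, and ownership queries become methods rather than functions that take names back as strings.

```python
import csv
import os

EGG_INFO = ".egg-info"


class Project:
    """One installed project, identified by its egg-info directory."""

    def __init__(self, name, egg_info):
        self.name = name
        self.egg_info = egg_info
        self._files = None          # filled lazily; the cache hangs off the object

    def files(self):
        if self._files is None:
            record = os.path.join(self.egg_info, "RECORD")   # hypothetical file
            files = set()
            if os.path.exists(record):
                with open(record, newline="") as f:
                    files = {os.path.normpath(row[0]) for row in csv.reader(f) if row}
            self._files = files
        return self._files

    def owns(self, path):
        return os.path.normpath(path) in self.files()


class Directory:
    """A single directory that may contain *.egg-info entries."""

    def __init__(self, path):
        self.path = path
        self._projects = None

    def projects(self):
        if self._projects is None:
            self._projects = [
                Project(entry[: -len(EGG_INFO)], os.path.join(self.path, entry))
                for entry in sorted(os.listdir(self.path))
                if entry.endswith(EGG_INFO)
            ]
        return self._projects

    def refresh(self):
        self._projects = None       # next projects() call rescans the disk


class DirectoryCollection:
    """An ordered set of directories, e.g. built from sys.path."""

    def __init__(self, paths):
        self.directories = [Directory(p) for p in paths]

    def projects(self):
        for directory in self.directories:
            yield from directory.projects()

    def owners(self, path):
        return [p for p in self.projects() if p.owns(path)]
```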

Any function-oriented API should just be exposed as the methods of a 
default singleton.  Other Python modules follow this pattern -- and 
it's what I copied for the pkg_resources design.  It gives a nice 
tradeoff between keeping the simple things simple, and complex things 
possible, as well as keeping mechanism and policy separate.

Right now, the API design you're trying to do is being burdened by 
using strings and tuples to represent things that could just as 
easily be objects with their own methods, instead of things you have 
to pass back into other APIs.  This also makes caching more complex, 
because you can't just have one main object with stuff hanging off; 
you've got to have a bunch of dictionaries, tuples, lists, sets, etc.


