[Distutils] [Python-Dev] PEP 365 (Adding the pkg_resources module)
Phillip J. Eby
pje at telecommunity.com
Wed Mar 19 20:54:37 CET 2008
At 10:48 AM 3/19/2008 -0700, Guido van Rossum wrote:
>I don't understand PyPI all that well; it seems poor design that the
>browsing via keywords is emphasized but there is no easy way to
>*search* for a keyword (the list of all packages is not emphasized
>enough on the main page -- it occurs in the side bar but not in the
>main text). I assume there's a programmatic API (XML-RPC?) but I
>haven't found it yet.
http://wiki.python.org/moin/CheeseShopXmlRpc
There's also a REST API that setuptools uses:
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
The API was originally designed for screen-scraping an older version
of PyPI, but the pages it scraped have since been replaced by a "lite"
version served from:
http://pypi.python.org/simple/
The "lite" version is intended for tools such as easy_install to
process, as it consists strictly of links and can be statically
cached. Zope Corp., for example, maintains a static mirror of this
API to guard itself against PyPI outages and slowdowns, since its
buildouts can involve huge numbers of eggs, both its own and external
dependencies.
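For reference, here's roughly what a tool has to do with the /simple
index; this is just an untested sketch, and the regex is a stand-in
for the real link extraction setuptools does:

import re, urllib2

def simple_index_links(project, index='http://pypi.python.org/simple/'):
    # Fetch the static, links-only page for a project and return the
    # href targets found on it (uploaded files plus external pages).
    page = urllib2.urlopen(index + project + '/').read()
    return re.findall(r'href="([^"]+)"', page)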
>I'd love it if you could write or point me to code that takes a
>package name and optional version and returns the URL for the source
>archive, and the type (in case it can't be guessed from the filename
>or the Content-type header).
You can probably do that with the XML-RPC API. There's a function to
get the versions of a package, given a (case-sensitive) name, and
there's a function to get information for uploaded archives, given a
name and a version. I originally intended to use it for the PEP 365
approach, but you can get the necessary information in just one
static roundtrip using the REST (/simple) HTML API, if you're willing
to parse the URLs for version information. (The catch of course
being that distutils source distributions don't have unambiguously
parseable filenames.)
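Something along these lines ought to do it via XML-RPC (untested
sketch; package_releases() and release_urls() are the methods listed
on the wiki page, and I'm assuming the 'url'/'packagetype' field names
documented there, with 'sdist' marking source archives):

import xmlrpclib

def sdist_url(name, version=None):
    # Return (url, packagetype) for a project's source archive on PyPI,
    # or None if nothing suitable was uploaded there.
    server = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
    if version is None:
        versions = server.package_releases(name)   # case-sensitive name
        if not versions:
            return None
        version = versions[0]   # assume the first listed release will do
    for info in server.release_urls(name, version):
        if info['packagetype'] == 'sdist':
            return info['url'], info['packagetype']
    return None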
>Hm. Why not just use the existing convention for running setup.py
>after unpacking? This works great in my experience, and has the
>advantage of having an easy fallback if you end up having to do this
>manually for whatever reason.
Because I want bootstrap-ees to be able to use the bootstrap
mechanism. For example, I expect at some point that setuptools will
use other, non-self-contained packages, and other package managers
such as zc.buildout et al also want to depend on setuptools without
bundling it.
> > * calling the bootstrap module 'bootstrap', as in 'python -m
> > bootstrap projectname optionalversion'. The module would expose an
> > API to allow it to be used programmatically as well as the command
> > line, so that bootstrapped packages can use the bootstrap process to
> > locate dependencies if they so desire. (Today's package management
> > tools, at least, are all based on setuptools, so if it's not present
> > they'll need to download that before beginning their own
> > bootstrapping process.)
>
>This sounds like going beyond bootstrapping. My vision is that you use
>the bootstrap module (with the command line you suggest above) once to
>install setuptools or the alternate package manager of your choice,
>and then you can use easy_install (or whatever alternative) to install
>the rest.
Well, I noticed that the other package managers were writing
bootstrap scripts that then download setuptools' bootstrap script and
run it as part of *their* bootstrap process... and then I got to
thinking that it sure would be nice for setuptools to not have to be
a giant monolithic download if I wanted to start using other packages
in it... and that it sure would be nice to get rid of all these
bootstrap scripts downloading other bootstrap scripts... and then I
wrote PEP 365. :)
One other thing PEP 365 does for these use cases that your approach
doesn't: pkg_resources could detect whether a usable version of the
desired package was *already* installed, and skip the download if so.
Dropping that means we've already scaled back the intended use cases
quite a bit, since people will have to write their own "is it already
there?" and "is it the right version?" checks.
> > Without one or the other, the bootstrap tool would have to grow a
> > version parsing scheme of some type, and play guessing games with
> > file extensions. (Which is one reason I limited PEP 365's scope to
> > downloading eggs actually *uploaded* to PyPI, rather than arbitrary
> > packages *linked* from PyPI.)
>
>There are two version parsers in distutils, referenced by PEP 345, the
>PyPI 1.2 metadata standard.
Yes, and StrictVersion doesn't parse release candidates. And neither
LooseVersion nor StrictVersion handles multiple pre/post-release tags
correctly (e.g. "1.1a1dev-r2753").
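To make that concrete (interpreter session, typed from memory):

>>> from distutils.version import StrictVersion, LooseVersion
>>> StrictVersion('1.0rc1')    # release candidates are rejected outright
Traceback (most recent call last):
  ...
ValueError: invalid version number '1.0rc1'
>>> # and LooseVersion sorts a dev pre-release *after* the final release:
>>> LooseVersion('1.1a1dev-r2753') > LooseVersion('1.1')
True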
> > So, if I had to propose something right now, I would be inclined
> to propose:
> >
> > * using setuptools' version parsing semantics for interpretation of
> > alpha/beta/dev/etc. releases
>
>Can you point me to the code for this? What is its advantage over
>distutils.version?
It implements version comparison semantics that are closer to
programmer expectations, and it has been far more widely used and
exposed to more feedback. distutils.version, as far as I know, is
really only used by the PEP 345 metadata standard -- which isn't used
by *any* automated tools, and I'm not sure how many packages even
bother declaring it.
In addition to alpha/beta/candidate/dev versions, it also supports
post-release (patch-level) tags such as SVN revisions or dates.
Here is the code; the docstring is actually longer than the bits that
do anything:
import re

def parse_version(s):
    """Convert a version string to a chronologically-sortable key

    This is a rough cross between distutils' StrictVersion and LooseVersion;
    if you give it versions that would work with StrictVersion, then it
    behaves the same; otherwise it acts like a slightly-smarter LooseVersion.
    It is *possible* to create pathological version coding schemes that will
    fool this parser, but they should be very rare in practice.

    The returned value will be a tuple of strings.  Numeric portions of the
    version are padded to 8 digits so they will compare numerically, but
    without relying on how numbers compare relative to strings.  Dots are
    dropped, but dashes are retained.  Trailing zeros between alpha segments
    or dashes are suppressed, so that e.g. "2.4.0" is considered the same as
    "2.4".  Alphanumeric parts are lower-cased.

    The algorithm assumes that strings like "-" and any alpha string that
    alphabetically follows "final" represent a "patch level".  So, "2.4-1"
    is assumed to be a branch or patch of "2.4", and therefore "2.4.1" is
    considered newer than "2.4-1", which in turn is newer than "2.4".

    Strings like "a", "b", "c", "alpha", "beta", "candidate" and so on (that
    come before "final" alphabetically) are assumed to be pre-release
    versions, so that the version "2.4" is considered newer than "2.4a1".

    Finally, to handle miscellaneous cases, the strings "pre", "preview", and
    "rc" are treated as if they were "c", i.e. as though they were release
    candidates, and therefore are not as new as a version string that does
    not contain them, and "dev" is replaced with an '@' so that it sorts
    lower than any other pre-release tag.
    """
    parts = []
    for part in _parse_version_parts(s.lower()):
        if part.startswith('*'):
            if part < '*final':   # remove '-' before a prerelease tag
                while parts and parts[-1] == '*final-':
                    parts.pop()
            # remove trailing zeros from each series of numeric parts
            while parts and parts[-1] == '00000000':
                parts.pop()
        parts.append(part)
    return tuple(parts)

component_re = re.compile(r'(\d+ | [a-z]+ | \.| -)', re.VERBOSE)
replace = {'pre': 'c', 'preview': 'c', '-': 'final-', 'rc': 'c', 'dev': '@'}.get

def _parse_version_parts(s):
    for part in component_re.split(s):
        part = replace(part, part)
        if not part or part == '.':
            continue
        if part[:1] in '0123456789':
            yield part.zfill(8)   # pad for numeric comparison
        else:
            yield '*' + part
    yield '*final'   # ensure that alpha/beta/candidate are before final
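For example, matching the docstring's claims:

>>> parse_version("2.4a1") < parse_version("2.4c1") < parse_version("2.4")
True
>>> parse_version("2.4") < parse_version("2.4-1") < parse_version("2.4.1")
True
>>> parse_version("2.4.0") == parse_version("2.4")
True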
To check a parse_version() value for stability, you can just loop
over it looking for any part < "*foo", where "foo" is the desired
minimum stability. That is, if you find a '*a' and you don't want
alphas, then this version's no good. This also lets you distinguish
a beta that you might accept from an in-development snapshot of that
beta, which you wouldn't.
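In code, that check is just (sketch; the helper name and the use of
'b' to mean "at least beta" are mine):

def is_at_least(version, minimum):
    # 'minimum' is a pre-release letter such as 'b'; numeric parts are
    # zero-padded digits, which never compare below a '*'-prefixed tag.
    for part in parse_version(version):
        if part < '*' + minimum:
            return False
    return True

# is_at_least('2.4b1', 'b')          -> True
# is_at_least('2.4a1', 'b')          -> False  ('*a' sorts below '*b')
# is_at_least('2.4b1dev-r123', 'b')  -> False  ('dev' becomes '*@')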
>What's wrong with just running "setup.py install"? I'd rather continue
>existing standards / conventions. Of course, it won't work when
>setup.py requires setuptools;
Actually, it will, if the setup script uses the current ez_setup
bootstrapping method for setuptools.
However, I'd like to get *rid* of that bootstrapping method, and
replace it with this one. That's why I'd prefer that the bootstrap
approach use a different entry point for launching, and why I want
the module to expose an API, and why I don't really want the
bootstrapper to actually "install" anything.
For one thing, it means dealing with installation *options*. Your
prototype doesn't pass through any command-line options to the
script, so people would have to use a ~/.pydistutils.cfg file in
order to control the installation options, for example. (Which then
can break if the packager included a setup.cfg that was supposed to
be overridden on the command line...)
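(For reference, "use a config file" means putting something like this
in ~/.pydistutils.cfg instead of passing --prefix on the command line;
the path here is just an example:

[install]
prefix = /home/me/py

)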
Probably this seems a lot more messy to me, because I've had my face
directly planted in the mess for a number of years now, and I know
that, for example, people bitched and moaned excessively about not
being able to use --prefix with easy_install, the way they could with
'setup.py install'.
And maybe my experiences aren't all relevant here; I'm just not very
good at turning them off. My skepticism about the setup.py-based
approach is close to "new scheme for removing the GIL" level, because
I've gone through a lot of pain to get easy_install from the stage
where it looked a lot like your bootstrap prototype to something that
actually works, most of the time, for arbitrary distutils packages. :)
And unfortunately, some of the hurdles will require a few release
cycles to show up. And hey, if you're okay with that, cool. I just
think that as soon as it gets out in the field, people will use it
far outside anything we expect it to be used for, and if there's not
a bright line for the *packager* to cross, I think we'll have people
unhappy with the tool.
If you have to do a special step to make something bootstrappable,
then when the tool doesn't work, the user will ask the packager to
take the special step. However, if the tool allows the user to
*point* it at any package, and it randomly (from the user's POV)
fails, then the tool (and Python) will be blamed for the failure.
Because even though the bootstrap tool is "not a package manager", if
it's close enough to look like "a simpler easy_install", people will
try to use it as one, and blog about how bootstrap is broken and
should support installation options, etc.
(I suppose at this point easy_install is something of a
counter-example to this worry; people can and do now give packagers
patches to make their setup scripts more compatible with
easy_install, in cases where the package does extensive distutils
modification. OTOH, easy_install is a de facto standard, whereas
bootstrap will be de jure. What does that mean in practice? Heck if
I know. :) I guess people will hate on you instead of me, then, so
maybe I should view that as an improvement. :) (It also makes it
easier to understand your reluctance to be in any way associated with
eggs, but there's a big difference between eggs and easy_install, and
IMO your approach leans more towards the relative vices of
easy_install than the relative virtues of eggs. But oh well.))