[Distutils] [Python-Dev] PEP 365 (Adding the pkg_resources module)
Phillip J. Eby
pje at telecommunity.com
Wed Mar 19 20:54:37 CET 2008
At 10:48 AM 3/19/2008 -0700, Guido van Rossum wrote:
>I don't understand PyPI all that well; it seems poor design that the
>browsing via keywords is emphasized but there is no easy way to
>*search* for a keyword (the list of all packages is not emphasized
>enough on the main page -- it occurs in the side bar but not in the
>main text). I assume there's a programmatic API (XML-RPC?) but I
>haven't found it yet.
http://wiki.python.org/moin/CheeseShopXmlRpc
There's also a REST API that setuptools uses:
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
The API was originally designed for screen-scraping an older version
of PyPI, but the pages it scraped have since been replaced by a "lite"
version served from:
http://pypi.python.org/simple/
The "lite" version is intended for tools such as easy_install to
process, as it consists strictly of links and can be statically
cached. Zope Corp., for example, maintains a static mirror of this
API to guard itself against PyPI outages and slowdowns, since its
buildouts can involve huge numbers of eggs, both its own and external
dependencies.
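For reference, here's roughly what a tool has to do with the /simple
index; this is just an untested sketch, and the regex is a stand-in
for the real link extraction setuptools does:

import re, urllib2

def simple_index_links(project, index='http://pypi.python.org/simple/'):
    # Fetch the static, links-only page for a project and return the
    # href targets found on it (uploaded files plus external pages).
    page = urllib2.urlopen(index + project + '/').read()
    return re.findall(r'href="([^"]+)"', page)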
>I'd love it if you could write or point me to code that takes a
>package name and optional version and returns the URL for the source
>archive, and the type (in case it can't be guessed from the filename
>or the Content-type header).
You can probably do that with the XML-RPC API. There's a function to
get the versions of a package, given a (case-sensitive) name, and
there's a function to get information for uploaded archives, given a
name and a version. I originally intended to use it for the PEP 365
approach, but you can get the necessary information in just one
static roundtrip using the REST (/simple) HTML API, if you're willing
to parse the URLs for version information. (The catch of course
being that distutils source distributions don't have unambiguously
parseable filenames.)
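Something along these lines ought to do it via XML-RPC (untested
sketch; package_releases() and release_urls() are the methods listed
on the wiki page, and I'm assuming the 'url'/'packagetype' field names
documented there, with 'sdist' marking source archives):

import xmlrpclib

def sdist_url(name, version=None):
    # Return (url, packagetype) for a project's source archive on PyPI,
    # or None if nothing suitable was uploaded there.
    server = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
    if version is None:
        versions = server.package_releases(name)   # case-sensitive name
        if not versions:
            return None
        version = versions[0]   # assume the first listed release will do
    for info in server.release_urls(name, version):
        if info['packagetype'] == 'sdist':
            return info['url'], info['packagetype']
    return None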
>Hm. Why not just use the existing convention for running setup.py
>after unpacking? This works great in my experience, and has the
>advantage of having an easy fallback if you end up having to do this
>manually for whatever reason.
Because I want bootstrap-ees to be able to use the bootstrap
mechanism. For example, I expect at some point that setuptools will
use other, non-self-contained packages, and other package managers
such as zc.buildout et al also want to depend on setuptools without
bundling it.
> > * calling the bootstrap module 'bootstrap', as in 'python -m
> > bootstrap projectname optionalversion'. The module would expose an
> > API to allow it to be used programmatically as well as the command
> > line, so that bootstrapped packages can use the bootstrap process to
> > locate dependencies if they so desire. (Today's package management
> > tools, at least, are all based on setuptools, so if it's not present
> > they'll need to download that before beginning their own
> > bootstrapping process.)
>
>This sounds like going beyond bootstrapping. My vision is that you use
>the bootstrap module (with the command line you suggest above) once to
>install setuptools or the alternate package manager of your choice,
>and then you can use easy_install (or whatever alternative) to install
>the rest.
Well, I noticed that the other package managers were writing
bootstrap scripts that then download setuptools' bootstrap script and
run it as part of *their* bootstrap process... and then I got to
thinking that it sure would be nice for setuptools to not have to be
a giant monolithic download if I wanted to start using other packages
in it... and that it sure would be nice to get rid of all these
bootstrap scripts downloading other bootstrap scripts... and then I
wrote PEP 365. :)
One other thing PEP 365 does for these use cases that your approach
doesn't: pkg_resources could detect whether a usable version of the
desired package was *already* installed, and skip the download if so.
Dropping that means we've already scaled back the intended use cases
quite a bit, since people will have to write their own "is it already
there?" and "is it the right version?" checks.
> > Without one or the other, the bootstrap tool would have to grow a
> > version parsing scheme of some type, and play guessing games with
> > file extensions. (Which is one reason I limited PEP 365's scope to
> > downloading eggs actually *uploaded* to PyPI, rather than arbitrary
> > packages *linked* from PyPI.)
>
>There are two version parsers in distutils, referenced by PEP 345, the
>PyPI 1.2 metadata standard.
Yes, and StrictVersion doesn't parse release candidates. And neither
LooseVersion nor StrictVersion handles multiple pre/post-release tags
correctly (e.g. "1.1a1dev-r2753").
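To make that concrete (interpreter session, typed from memory):

>>> from distutils.version import StrictVersion, LooseVersion
>>> StrictVersion('1.0rc1')    # release candidates are rejected outright
Traceback (most recent call last):
  ...
ValueError: invalid version number '1.0rc1'
>>> # and LooseVersion sorts a dev pre-release *after* the final release:
>>> LooseVersion('1.1a1dev-r2753') > LooseVersion('1.1')
True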
> > So, if I had to propose something right now, I would be inclined
> to propose:
> >
> > * using setuptools' version parsing semantics for interpretation of
> > alpha/beta/dev/etc. releases
>
>Can you point me to the code for this? What is its advantage over
>distutils.version?
It implements version comparison semantics that are closer to
programmer expectations, and it has been far more widely used and
exposed to more feedback. distutils.version, as far as I know, is
really only used by the PEP 345 metadata standard -- which isn't used
by *any* automated tools, and I'm not sure how many packages even
bother declaring it.
In addition to alpha/beta/candidate/dev versions, it also supports
post-release (patch-level) tags such as SVN revisions or dates.
Here is the code; the docstring is actually longer than the bits that
do anything:
import re

def parse_version(s):
    """Convert a version string to a chronologically-sortable key

    This is a rough cross between distutils' StrictVersion and LooseVersion;
    if you give it versions that would work with StrictVersion, then it
    behaves the same; otherwise it acts like a slightly-smarter LooseVersion.
    It is *possible* to create pathological version coding schemes that will
    fool this parser, but they should be very rare in practice.

    The returned value will be a tuple of strings.  Numeric portions of the
    version are padded to 8 digits so they will compare numerically, but
    without relying on how numbers compare relative to strings.  Dots are
    dropped, but dashes are retained.  Trailing zeros between alpha segments
    or dashes are suppressed, so that e.g. "2.4.0" is considered the same as
    "2.4".  Alphanumeric parts are lower-cased.

    The algorithm assumes that strings like "-" and any alpha string that
    alphabetically follows "final" represent a "patch level".  So, "2.4-1"
    is assumed to be a branch or patch of "2.4", and therefore "2.4.1" is
    considered newer than "2.4-1", which in turn is newer than "2.4".

    Strings like "a", "b", "c", "alpha", "beta", "candidate" and so on (that
    come before "final" alphabetically) are assumed to be pre-release
    versions, so that the version "2.4" is considered newer than "2.4a1".

    Finally, to handle miscellaneous cases, the strings "pre", "preview", and
    "rc" are treated as if they were "c", i.e. as though they were release
    candidates, and therefore are not as new as a version string that does
    not contain them, and "dev" is replaced with an '@' so that it sorts
    lower than any other pre-release tag.
    """
    parts = []
    for part in _parse_version_parts(s.lower()):
        if part.startswith('*'):
            if part < '*final':   # remove '-' before a prerelease tag
                while parts and parts[-1] == '*final-':
                    parts.pop()
            # remove trailing zeros from each series of numeric parts
            while parts and parts[-1] == '00000000':
                parts.pop()
        parts.append(part)
    return tuple(parts)

component_re = re.compile(r'(\d+ | [a-z]+ | \.| -)', re.VERBOSE)
replace = {'pre': 'c', 'preview': 'c', '-': 'final-', 'rc': 'c', 'dev': '@'}.get

def _parse_version_parts(s):
    for part in component_re.split(s):
        part = replace(part, part)
        if not part or part == '.':
            continue
        if part[:1] in '0123456789':
            yield part.zfill(8)   # pad for numeric comparison
        else:
            yield '*' + part
    yield '*final'   # ensure that alpha/beta/candidate are before final
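For example, matching the docstring's claims:

>>> parse_version("2.4a1") < parse_version("2.4c1") < parse_version("2.4")
True
>>> parse_version("2.4") < parse_version("2.4-1") < parse_version("2.4.1")
True
>>> parse_version("2.4.0") == parse_version("2.4")
True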
To check a parse_version() value for stability, you can just loop
over it looking for any part < "*foo", where "foo" is the desired
minimum stability. That is, if you find a '*a' and you don't want
alphas, then this version's no good. This also lets you distinguish
a beta that you might accept from an in-development snapshot of that
beta, which you wouldn't.
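In code, that check is just (sketch; the helper name and the use of
'b' to mean "at least beta" are mine):

def is_at_least(version, minimum):
    # 'minimum' is a pre-release letter such as 'b'; numeric parts are
    # zero-padded digits, which never compare below a '*'-prefixed tag.
    for part in parse_version(version):
        if part < '*' + minimum:
            return False
    return True

# is_at_least('2.4b1', 'b')          -> True
# is_at_least('2.4a1', 'b')          -> False  ('*a' sorts below '*b')
# is_at_least('2.4b1dev-r123', 'b')  -> False  ('dev' becomes '*@')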
>What's wrong with just running "setup.py install"? I'd rather continue
>existing standards / conventions. Of course, it won't work when
>setup.py requires setuptools;
Actually, it will, if the setup script uses the current ez_setup
bootstrapping method for setuptools.
However, I'd like to get *rid* of that bootstrapping method, and
replace it with this one. That's why I'd prefer that the bootstrap
approach use a different entry point for launching, and why I want
the module to expose an API, and why I don't really want the
bootstrapper to actually "install" anything.
For one thing, it means dealing with installation *options*. Your
prototype doesn't pass through any command-line options to the
script, so people would have to use a ~/.pydistutils.cfg file in
order to control the installation options, for example. (Which then
can break if the packager included a setup.cfg that was supposed to
be overridden on the command line...)
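(For reference, "use a config file" means putting something like this
in ~/.pydistutils.cfg instead of passing --prefix on the command line;
the path here is just an example:

[install]
prefix = /home/me/py

)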
Probably this seems a lot more messy to me, because I've had my face
directly planted in the mess for a number of years now, and I know
that, for example, people bitched and moaned excessively about not
being able to use --prefix with easy_install, the way they could with
'setup.py install'.
And maybe my experiences aren't all relevant here; I'm just not very
good at turning them off. My skepticism about the setup.py-based
approach is close to "new scheme for removing the GIL" level, because
I've gone through a lot of pain to get easy_install from the stage
where it looked a lot like your bootstrap prototype to something that
actually works, most of the time, for arbitrary distutils packages. :)
And unfortunately, some of the hurdles will require a few release
cycles to show up. And hey, if you're okay with that, cool. I just
think that as soon as it gets out in the field, people will use it
far outside anything we expect it to be used for, and if there's not
a bright line for the *packager* to cross, I think we'll have people
unhappy with the tool.
If you have to do a special step to make something bootstrappable,
then when the tool doesn't work, the user will ask the packager to
take the special step. However, if the tool allows the user to
*point* it at any package, and it randomly (from the user's POV)
fails, then the tool (and Python) will be blamed for the failure.
Because even though the bootstrap tool is "not a package manager", if
it's close enough to look like "a simpler easy_install", people will
try to use it as one, and blog about how bootstrap is broken and
should support installation options, etc.
(I suppose at this point easy_install is something of a
counter-example to this worry; people can and do now give packagers
patches to make their setup scripts more compatible with
easy_install, in cases where the package does extensive distutils
modification. OTOH, easy_install is a de facto standard, whereas
bootstrap will be de jure. What does that mean in practice? Heck if
I know. :) I guess people will hate on you instead of me, then, so
maybe I should view that as an improvement. :) (It also makes it
easier to understand your reluctance to be in any way associated with
eggs, but there's a big difference between eggs and easy_install, and
IMO your approach leans more towards the relative vices of
easy_install than the relative virtues of eggs. But oh well.))