[Distutils] use of '_' in package name causing version parsing issue?

P.J. Eby pje at telecommunity.com
Thu Mar 11 16:43:02 CET 2010

At 12:38 PM 3/11/2010 +0530, Baiju M wrote:
>On Thu, Mar 11, 2010 at 11:05 AM, Baiju M <mbaiju at zeomega.com> wrote:
> > If "_" is a valid project_name identifier, why is it replaced with "-"?

To provide a canonical form of the name that can be escaped in
filenames, so that an egg's project and version can be identified
unambiguously.

Egg filenames use '-' as a separator between name, version, python 
version, and platform.  A '-' in any of these components is escaped 
as '_', so that the '-' remains a viable and unambiguous 
separator.  This means that '_' gets turned back into a '-' when 
unescaped, so the mapping between '_' and '-' is part of the 
safe_name canonical form.
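
As a rough illustration of the escaping described above (a sketch only,
not the actual pkg_resources code; the helper names escape_component,
egg_basename and split_egg_basename are made up for this example):

```python
def escape_component(text):
    # A '-' inside a component would collide with the separator, so it
    # is written as '_' in the filename.
    return text.replace('-', '_')


def egg_basename(name, version, py_version, platform=None):
    # Join the escaped components with '-' as the only separator.
    parts = [escape_component(name), escape_component(version),
             'py' + py_version]
    if platform is not None:
        parts.append(escape_component(platform))
    return '-'.join(parts)


def split_egg_basename(basename):
    # Every '-' is now a separator, so a plain split is unambiguous.
    parts = basename.split('-')
    name = parts[0].replace('_', '-')      # unescape: '_' becomes '-'
    version = parts[1].replace('_', '-')
    return name, version, parts[2:]
```

Under these assumptions, 'my-project' and 'my_project' both produce the
same basename, and both come back as 'my-project' when unescaped, which
is why the two spellings have to be treated as the same name.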

>There are nearly 300 packages in PyPI with "_" in the package name.
>For all the packages built using Setuptools, the "_" in the "Name"
>field of the PKG-INFO file is replaced with "-".
>I checked some of the packages built with "distutils.core" [1].
>Distutils is not replacing "_" in the "Name" field of the PKG-INFO
>file with "-".
>Why is Setuptools behaving differently from Distutils?

Because distutils wasn't built in a world where package names needed
to be uniquely and unambiguously machine-parseable from 
filenames.  The code that easy_install has for dealing with 
distutils-named source distributions has to guess at possible 
interpretations of those filenames, because distutils filenames don't 
distinguish between a '-' in a name or version, and a '-' *between* 
names and versions.
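
To make the ambiguity concrete, here is a simplified sketch of the kind
of guessing involved.  It is not easy_install's actual code, and
candidate_splits is a hypothetical helper name:

```python
def candidate_splits(basename):
    # Without escaping, any '-' could be the boundary between name and
    # version, so every split point is a possible interpretation.
    parts = basename.split('-')
    return [('-'.join(parts[:i]), '-'.join(parts[i:]))
            for i in range(1, len(parts))]


for name, version in candidate_splits('adns-python-1.1.0'):
    print(name, version)
```

For a basename like 'adns-python-1.1.0' this yields both
('adns', 'python-1.1.0') and ('adns-python', '1.1.0'), and nothing in
the filename itself says which one was intended.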

Ultimately, the simplest way to deal with this was to treat runs of
'_' (or any other non-alphanumeric, non-dot character) as being
identical to a single '-'.
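
The rule just described can be sketched with a single regular
expression.  This is an illustration of the rule as stated here, with a
hypothetical function name, and is not meant as a replacement for the
real pkg_resources API:

```python
import re


def canonical_name(name):
    # Collapse each run of characters that are neither alphanumeric nor
    # '.' into a single '-'.
    return re.sub(r'[^A-Za-z0-9.]+', '-', name)


print(canonical_name('my_package'))
print(canonical_name('my__odd  name'))
print(canonical_name('zope.interface'))
```

Dots are left alone, so dotted project names pass through unchanged.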

>Buildout has a functionality to "pin-down" ("lock down"/"nail down") versions
>of eggs (distribution?).  There is another functionality to enforce
>versions of all eggs used in a particular Buildout configuration.  If we
>use "_" as the package name (distribution name?), this functionality is not

Your comparisons should be based on the 'key' attribute of 
Distribution and Requirement objects, rather than relying on direct 
string operations of your own.  The 'key' attribute contains a form 
of project name suitable for equality/inequality comparisons.

In other words, you should not take unparsed data from your
configuration and compare it against pkg_resources attributes.  Use
constructors like Requirement.parse() and
Distribution.from_filename() to create objects with 'key' attributes,
then compare keys, or just use Requirement.__contains__.  For example:

     if someDistribution in Requirement.parse(projname+'=='+exactversion):
          # someDistribution is exactly version exactversion of projname

The pkg_resources API is there precisely so that you don't have to 
know all the low-level details like syntax rules and escaping.
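
A slightly fuller sketch of the same idea, assuming pkg_resources is
importable in your environment (the egg filename and the project name
below are invented for illustration):

```python
from pkg_resources import Distribution, Requirement

# Build both objects through the API instead of comparing raw strings.
dist = Distribution.from_filename('my_package-1.0-py2.6.egg')
req = Requirement.parse('My-Package==1.0')

# 'key' is the form intended for equality comparisons, so the '_' vs
# '-' spelling and the letter case no longer matter.
print(dist.key == req.key)

# Requirement.__contains__ checks the project and the version together.
print(dist in req)
```

The point of going through the constructors is that your own code never
has to decide how '_' and '-' relate; it only compares what the API
hands back.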

