[Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm

PJ Eby pje at telecommunity.com
Thu Mar 14 17:39:44 CET 2013


On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 12.03.2013 22:26, PJ Eby wrote:
>> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>>>> I've run into a weird issue with easy_install, that I'm trying to solve:
>>>>
>>>> If I place two files named
>>>>
>>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>>>
>>>> into the same directory and let easy_install running on Linux
>>>> scan this, it considers the second file for Windows as best
>>>> match.
>>>>
>>>> Is the algorithm used for determining the best match documented
>>>> somewhere ?
>>>>
>>>> I've had a look at the implementation, but this left me rather
>>>> clueless.
>>>>
>>>> I thought that setuptools would prefer the .egg file over
>>>> the prebuilt .zip file - binary files being easier to install
>>>> than "source" files.
>>>
>>> After some experiments, I found that the following change
>>> in filename (swapping platform and python version, in addition
>>> to using '-' instead of '.') works:
>>>
>>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>>>
>>> OTOH, this one doesn't (notice the difference ?):
>>>
>>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>>>
>>> The logic behind all this looks rather fragile to me.
>>
>> easy_install only guarantees sane version parsing for distribution
>> files built using setuptools' naming algorithms.  If you use
>> distutils, it can only make guesses, because the distutils does not
>> have a completely unambiguous file naming scheme.  And if you are
>> naming the files by hand, God help you.  ;-)
>
> The problem appears to be a bug in setuptools' package_index.py.
>
> The function interpret_distro_name() creates a set of possible
> separations of the found name into project name and version.
>
> It does find the right separation, but for some reason, the
> code using that function does not check the found project
> names against the project name the user is trying to install,
> but simply takes the last entry of the list returned by the
> above function.
>
> As a result, easy_install downloads and tries to install
> project files that don't match the project name in some
> cases.
>
> Here's another example where it fails (say you're on an x64 Linux box):
>
> # easy_install egenix-pyopenssl
>
> As an example, say it finds these distribution files:
>
>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
>     'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
>     'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',
>
> It then creates different interpretations of those names, puts
> them in a list and sorts them. Here's the end of that list:
>
> egenix-pyopenssl; 0.13.1.1.0.1.5 <<-- this would be the correct .egg file
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt
>
> It picks the last entry, which would be for a project called
> "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx" - not the one
> the user searched for.

Actually, that's not quite true.  It's picking:

egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt

Because it thinks that
'0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher
version than 0.13.1.1.0.1.5.
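
To make the comparison concrete, here is a rough sketch of a sort key
modeled on the legacy pkg_resources version parsing.  It is a
simplified reconstruction, not the shipped code, and the names
legacy_sort_key and VERSIONS are made up for this example.  The point
is that a '-' after the numeric part becomes a 'final-' marker, which
sorts after the plain 'final' marker that ends a bare release:

```python
import re

# Split a version string into digit runs, letter runs, dots and dashes.
_component_re = re.compile(r'(\d+|[a-z]+|\.|-)')
# Assumed tag rewrites: a '-' becomes 'final-', which sorts after 'final'.
_replace = {'pre': 'c', 'preview': 'c', '-': 'final-',
            'rc': 'c', 'dev': '@'}.get


def _parts(version):
    for part in _component_re.split(version):
        part = _replace(part, part)
        if not part or part == '.':
            continue
        if part[:1] in '0123456789':
            yield part.zfill(8)   # pad so numbers compare as strings
        else:
            yield '*' + part      # '*' sorts before any digit
    yield '*final'                # end marker for every version


def legacy_sort_key(version):
    """Hypothetical helper: build a tuple that sorts like the old scheme."""
    parts = []
    for part in _parts(version.lower()):
        if part.startswith('*'):
            if part < '*final':
                # A pre-release tag cancels a preceding '-'.
                while parts and parts[-1] == '*final-':
                    parts.pop()
            # Drop trailing zeros from each numeric series.
            while parts and parts[-1] == '00000000':
                parts.pop()
        parts.append(part)
    return tuple(parts)


VERSIONS = [
    '0.13.1.1.0.1.5',
    '0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt',
    '0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt',
    '0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt',
]

print(max(VERSIONS, key=legacy_sort_key))
```

Under this sketch the ucs4-macosx string comes out on top, because
everything after the first '-' is treated as a post-release suffix
rather than as platform information.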

It does also record the possibility you mentioned, but it doesn't pick
that one.  The project names actually *do* have to match.
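
As an illustration of the "record every reading, then match on name"
behaviour, here is a minimal sketch.  It is only loosely modeled on
interpret_distro_name(); the real function takes more arguments and
has special cases.  The helpers candidate_splits and
matching_versions, and the '_' to '-' name normalization, are
assumptions made for this example:

```python
def candidate_splits(basename):
    """Yield every (project, version) reading of a '-'-separated name."""
    parts = basename.split('-')
    for i in range(1, len(parts) + 1):
        yield '-'.join(parts[:i]), '-'.join(parts[i:])


def _normalize(name):
    # Assumed normalization: case-insensitive, '_' equivalent to '-'.
    return name.lower().replace('_', '-')


def matching_versions(basename, wanted_project):
    """Keep only the readings whose project name matches the request."""
    wanted = _normalize(wanted_project)
    return [version
            for project, version in candidate_splits(basename)
            if _normalize(project) == wanted]


name = ('egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-'
        'macosx-10.5-x86_64-prebuilt')

for project, version in candidate_splits(name):
    print(project, ';', version)

print(matching_versions(name, 'egenix-pyopenssl'))
```

The reading with project "egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx"
is generated, but it is filtered out once the requested project name
is applied, so only the long "version" string is left to compete with
the .egg.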

If you open a ticket on the setuptools tracker, I'll try to see if I
can get it to recognize that strings like py2.7, macosx, ucs, and the
like are terminators for a version number.  I don't know how
successful I'll be, though.  Basically, those zip files are (I assume)
bdist_dumb distributions being taken for source distributions, and
easy_install doesn't actually support bdist_dumb files at the moment.
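
For what it's worth, the terminator idea could look roughly like the
sketch below.  This is purely hypothetical and is not anything
setuptools does today; the helper name trim_version and the list of
tokens in the pattern are assumptions chosen to cover the file names
in this thread:

```python
import re

# Hypothetical list of tokens that end the version part of a name.
# A token only counts when it follows a '-', '.' or '_' separator.
_TERMINATOR = re.compile(
    r'[-._](?:py\d\.\d|ucs\d|macosx|linux|win32|prebuilt)',
    re.IGNORECASE,
)


def trim_version(version):
    """Cut a guessed version string at the first terminator token."""
    match = _TERMINATOR.search(version)
    return version[:match.start()] if match else version


for guess in (
    '0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt',
    '2.0.2.win32-py2.6.prebuilt',
    '0.13.1.1.0.1.5',
):
    print(guess, '->', trim_version(guess))
```

With a rule like this, the prebuilt archives and the .egg would all
report the same version, so the choice between them would no longer
hinge on the platform suffix.  A fixed token list will miss other
platforms, which is part of why I'm not sure how well it would work.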

