[Distutils] Optional C extensions in packages

Jim Fulton jim at zope.com
Fri Feb 2 14:11:20 CET 2007


Phillip J. Eby wrote:
> At 05:32 PM 2/1/2007 -0500, Jim Fulton wrote:
>> I'm still worried about the ambiguous case when there are both
>> platform-dependent and platform-independent eggs installed.
> 
> How would this happen?

At least in a couple of ways.

1. As I mentioned in my previous note, when a package has optional
    extensions, one will often want to disable the extensions for
    debugging purposes.  It is easier debugging Python code than
    C code, especially in combination with other Python code.  In the
    past, this was typically done by removing .so (or .pyd) files.
    This can still be done with eggs, but I think it will be attractive
    to do this by selecting different eggs.

2. Consider the following scenario: Someone has a mac without a development
    environment installed.  They install some eggs and get versions without
    extensions.  Later, they install the development tools that came on the CD
    with their mac.  How do they reinstall the eggs with extensions?  If they
    install in multi-version mode, won't they have a mix of eggs with and
    without extensions?

> I think you're trying to solve a broader problem than the one I'm trying 
> to solve, which is that I'd like to make it possible for people who 
> don't have working compilers (i.e. mostly Windows, with some Mac users 
> and some people in virtual hosting environments) to install packages 
> that contain C extensions.

I'm trying to avoid a problem I think you may create.  As soon as there
can be two eggs that satisfy the same requirements but with different
semantics, I think there is a problem.  I understand that in the use
case you are thinking of, this would normally not happen, but it still
can happen and, I suggest, will happen.

> In that scenario, you're going to *always* want to use this option to
> suppress optional extensions, because there isn't any way for you to
> build them.  But, you would presumably still want to know about packages
> that *require* their extensions to be built.

>> I think you were proposing an easy_install option.  This helps when
>> someone installs a distribution directly, but doesn't help when a
>> distribution is installed as a dependency.
> 
> This would be an option to suppress compiling *all* optional C 
> extensions, period.

So it would apply to dependencies as well. Yeah, that makes sense.

>>   It also doesn't help with
>> controlling selection of eggs after installation.  And I think it
>> doesn't make it easy to change one's mind.  For example, one might
>> install an egg with extensions and then install one without
>> extensions to debug a problem using the Python debugger.  Would the
>> option let them do that?
> 
> The idea was that it would be a build-time option.

Will it be possible to reinstall eggs with the same versions but with
different choices wrt optional extensions?  I guess even if it isn't
supported by easy_install, I could make it work with buildout.

>> Is it possible to control this as part of the requirement specification?
>> Perhaps this could be some kind of standard extra?
>>
>> I'd strongly prefer to be able to control this via the requirements
>> mechanism.  I'd like to be able to say that I want or don't want
>> extensions as part of a requirement string.
> 
> Yeah, I see the benefit of that, certainly.  The problem is that we're 
> trying to solve different problems.  I just want to make it *possible* 
> to suppress building extensions during easy_install.

I want that too, certainly.
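
To make the requirement-string idea concrete, here is a rough sketch of
how a tool might check a requirement for an extra that asks for
extensions.  The extra name "c-extensions" and both helper functions are
made up for illustration; nothing like this exists in setuptools, and
the parsing is deliberately much simpler than what pkg_resources does.

```python
import re

# Simplified requirement syntax: Name[extra1,extra2] version-spec
# (an illustrative pattern, not the real pkg_resources grammar).
REQ_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9_.-]+)\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*"
    r"(?P<spec>.*)$"
)


def split_requirement(req):
    """Split a requirement string into (name, extras, version spec)."""
    m = REQ_RE.match(req)
    if m is None:
        raise ValueError("cannot parse requirement: %r" % (req,))
    raw_extras = m.group("extras") or ""
    extras = [e.strip().lower() for e in raw_extras.split(",") if e.strip()]
    return m.group("name"), extras, m.group("spec").strip()


def wants_extensions(req, extra="c-extensions"):
    """True if the requirement lists the (hypothetical) extensions extra."""
    return extra in split_requirement(req)[1]
```

With something like this, a tool could treat "zope.interface[c-extensions]>=3.3"
and "zope.interface>=3.3" as requests for different eggs of the same version.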

> I'll give some more thought to what you're asking for. 

Super.

> I have an
> inkling of an idea, but the problems have to do with things like having 
> to actually check the egg's contents to see if it meets requirements, 
> and there are problems regarding the need to clean up the build/ 
> directory if you change what features you build something with.
> 
> You see, setuptools has an undocumented 'feature' mechanism (which is 
> still used by some PEAK projects) to control the inclusion of various 
> packages, extensions, etc.  The main reason this is undocumented is 
> because it turns out that it's fragile to specify what features to use 
> or not use on the command line alone, due to some distutils' commands 
> just taking whatever's in the build/ directory as gospel.
> 
> Anyway, that feature mechanism could probably be tied in to the 
> requirements system, as long as there was a way to wipe the build/ 
> directory whenever the features changed between runs of setup.py, and 
> there was a way to list the features in the .egg-info, and pkg_resources 
> was changed to query a distribution's "features" info when validating a 
> requirement that includes "extras".
> 
> I'm a little concerned that this will incur additional disk access under 
> various circumstances, unless there is some way to statically 
> distinguish between extras that denote "features" and ones that indicate 
> additional requirements.  Of course, matching a requirement against a 
> distribution when the requirement doesn't list any extras, will not 
> incur overhead.
>
> I guess we could do something like this for 0.7.  One thing that
> concerns me, however, is that it potentially *increases* the amount of
> conflicts and confusion possible regarding a single egg, unless there's
> a way to include the features in the filename.  You can't tell just by
> looking at it, if it meets your needs.

Yup. (I think this is related to the 2-byte/4-byte unicode issue.)

Are there so many of these potential features that we couldn't reflect them
in the file name?

In the specific case of the presence or absence of extensions, that
is already part of the file name.  Eggs with extensions will
have the platform reflected in the file name.  Eggs without won't,
so it should be easy to tell them apart.
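
As a rough illustration of telling them apart, assuming the usual
"name-version-pyX.Y[-platform].egg" naming, something like the following
would do.  The pattern is a simplified stand-in, the function name is
made up, and the file names in the comments are only examples.

```python
import re

# Simplified egg file name pattern: name-version-pyX.Y[-platform].egg
# Assumes the project name and version contain no "-" characters.
EGG_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def egg_platform(filename):
    """Return the platform tag of an egg file name, or None if it has none."""
    m = EGG_RE.match(filename)
    if m is None:
        raise ValueError("not an egg file name: %r" % (filename,))
    return m.group("platform")


# A platform-independent egg has no platform tag:
#   egg_platform("zope.interface-3.3.0-py2.4.egg") is None
# An egg built with extensions carries one:
#   egg_platform("zope.interface-3.3.0-py2.4-win32.egg") == "win32"
```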

> In contrast, the benefit of my current proposal is that it's intended 
> strictly for those circumstances where the eggs are *supposed to be* 
> interchangeable except for platform-specificity and performance, and you 
> should be able to at least tell from the filename which kind you have.  

As I mention above, you'll be able to easily tell platform-specific
and platform-independent eggs apart based on their file names.

> In the case where we allow other choices of features, you would need 
> some kind of tool to tell you what features the egg was built with.

In the general case, yes, unless you reflected the features in the
file name.  In the specific case of extensions, I don't think this
is a problem.

> Maybe another possibility is to have *subprojects* instead, where a 
> subproject is something built using the same setup.py, but has a 
> distinct project name, like "PyProtocols-CExtensions" or "Twisted-Foo".  
> By default, perhaps such a multi-project setup script would run each 
> subproject with its own build directory, and dump multiple eggs or 
> source distributions into the dist/ directory. 

Yup.  And this could easily be triggered automatically using minimal
meta-data in the setup file.

> This might take some 
> munging of EasyInstall to support picking up the distributions produced 
> when running the bdist_egg, but it might be doable.
>
> The principal downsides to this approach are the doubling up of eggs 
> involved, and the need to keep a precise match of versions between the 
> packages.  In particular, if someone installs a new version of a package 
> without its C extensions, and the C extensions still exist for an older 
> version, it will end up importing the wrong extensions -- and it will be 
> hard to tell what happened and why.  The package will just seem broken.

I see your point.  This arises from the way that easy_install incrementally
installs distributions.  This potentially wouldn't be a problem for buildout,
but I wouldn't want to break easy_install (or workingenv).

> Sigh.  I guess at this point I don't really see a way to do optional 
> extensions that doesn't turn into a crazy madhouse of support later.  It 
> seems to me that at least the problems with my approach would at most 
> boil down to, "how come this thing is so slow"?  :)

OK, so based on this discussion, I'm in favor of your original proposal
as a start.  I think there should be a way to cause building/installation
of a platform-dependent egg even if there is a platform-independent egg
with the same version installed already, and the other way around, to deal
with the use cases I described earlier.  Even in multiple-version mode,
this is not a problem, because the eggs will have different file names.
I'd really *like* to be able to reflect the selection of these somehow
in requirement specifications, but, if need be, this can be dealt with
at the tool (e.g. buildout) level.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

