Optional C extensions in packages
I'm thinking setuptools should allow you to specify that one or more C extensions in a project are optional. At minimum, there would then be a command-line option you could give to easy_install to say, "don't build any optional extensions", so if you don't have a compiler or need a cross-platform egg, you can skip those extensions. There would also need to be a way to specify this when using the regular build commands. This feature would address projects that include optional C versions of certain code, that can fall back to pure Python implementations. It would not address projects that have C extensions requiring external libraries that might or might not exist. (Such as Twisted, for instance, but Twisted falls into the category of projects requiring "extreme measures" to be supported anyway.) Comments, anyone?
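A minimal sketch of the kind of project this proposal targets: a module that uses a C version of some code when it was built and falls back to pure Python otherwise. The module name `_speedups` and the function `count_words` are hypothetical, chosen only for illustration.

```python
# "_speedups" is a hypothetical optional C extension; it is an assumption,
# not a module from any project named in this thread.
try:
    from _speedups import count_words  # C version, if it was built
    HAVE_SPEEDUPS = True
except ImportError:
    HAVE_SPEEDUPS = False

    def count_words(text):
        """Pure Python fallback with the same signature as the C version."""
        return len(text.split())


print(count_words("optional C extensions"), HAVE_SPEEDUPS)
```

With this layout, skipping the extension at build time changes performance but not behaviour, which is the case the proposal is meant to cover.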
Phillip J. Eby wrote:
I'm thinking setuptools should allow you to specify that one or more C extensions in a project are optional.
At minimum, there would then be a command-line option you could give to easy_install to say, "don't build any optional extensions", so if you don't have a compiler or need a cross-platform egg, you can skip those extensions.
There would also need to be a way to specify this when using the regular build commands.
This feature would address projects that include optional C versions of certain code, that can fall back to pure Python implementations.
It would not address projects that have C extensions requiring external libraries that might or might not exist. (Such as Twisted, for instance, but Twisted falls into the category of projects requiring "extreme measures" to be supported anyway.)
Comments, anyone?
I like the goal, since most of our extensions are in this category. However, I have a feeling that this use case might be better served by packaging the extensions as separate distributions.
One concern I have is that I think I'd want an egg with the extensions and the egg without the extensions to have different names, and perhaps different requirements. Of course, the names would be different because an egg with a C extension would be platform dependent and an egg without would be platform independent. What if both a platform-independent egg and a platform-dependent egg were available? Which would take precedence? I think they have the same precedence now.
I think I'd be +1 if platform-dependent eggs were preferred over platform-independent ones.
Jim
--
Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
At 07:11 AM 2/1/2007 -0500, Jim Fulton wrote:
I like the goal, since most of our extensions are in this category. However, I have a feeling that this use case might be better served by packaging the extensions as separate distributions.
True, but without automation, it's more work for the package author to "do the right thing". If your extension is in a package, you have to make it a namespace package. You have to create another setup.py and directory structure, and manage another set of releases. This is a lot more work than just marking the extension as "optional".
One concern I have is that I think I'd want an egg with the extensions and the egg without the extensions to have different names, and perhaps different requirements. Of course, the names would be different because an egg with a C extension would be platform dependent and an egg without would be platform independent. What if both a platform-independent egg and a platform-dependent egg were available? Which would take precedence? I think they have the same precedence now.
I think I'd be +1 if platform-dependent eggs were preferred over platform-independent ones.
...of the same version? That's easy enough to do by changing the Distribution.hashcmp property to put the 'platform' part earlier in the comparison. If they're different versions, however, it's a whole 'nother kettle of fish. The solution I was thinking of, however, has the additional benefit of working right now. If you're generating eggs for a package with optional extensions, you would simply *not* generate a cross-platform egg, but instead just ship an sdist tarball, plus platform-specific eggs for the platforms you care about. That way, anybody on an unsupported platform will end up building a local egg from the source, and either producing the extensions or skipping them locally. (And if you don't have access to a compiler, it seems unlikely that you will be then *distributing* the eggs you produced.) Notice that this setup is a natural side-effect of the way most people would produce and upload packages to the Cheeseshop anyway - you would have to make an intentional effort to suppress extension-building and upload a cross-platform egg; otherwise you will just end up with a source distribution and a platform-specific egg, which then works as intended.
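The precedence rule discussed here (same version: prefer the platform-specific egg) can be sketched as a sort key. This is not the actual `Distribution.hashcmp` implementation; the `Egg` tuple and `preference_key` function are hypothetical stand-ins, and versions are assumed to be plain dotted integers.

```python
from collections import namedtuple

# Hypothetical stand-in for an egg's metadata; not the pkg_resources class.
Egg = namedtuple("Egg", "name version platform")


def preference_key(egg):
    """Order by version first, then prefer eggs that name a platform."""
    version = tuple(int(part) for part in egg.version.split("."))
    return (version, egg.platform is not None)


candidates = [
    Egg("PyProtocols", "1.0", None),
    Egg("PyProtocols", "1.0", "win32"),
    Egg("PyProtocols", "0.9", "win32"),
]
best = max(candidates, key=preference_key)
print(best)
```

Because the version is compared first, a newer platform-independent egg still beats an older platform-specific one, which is the "different versions" case the message leaves open.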
On Feb 1, 2007, at 1:03 PM, Phillip J. Eby wrote:
At 07:11 AM 2/1/2007 -0500, Jim Fulton wrote:
I like the goal, since most of our extensions are in this category. However, I have a feeling that this use case might be better served by packaging the extensions as separate distributions.
True, but without automation, it's more work for the package author to "do the right thing".
Yes. It also makes it easier for the consumer to be explicit about what they want. Although maybe there are better ways.
If your extension is in a package, you have to make it a namespace package. You have to create another setup.py and directory structure, and manage another set of releases. This is a lot more work than just marking the extension as "optional".
Yes. See below...
One concern I have is that I think I'd want an egg with the extensions and the egg without the extensions to have different names, and perhaps different requirements. Of course, the names would be different because an egg with a C extension would be platform dependent and an egg without would be platform independent. What if both a platform-independent egg and a platform-dependent egg were available? Which would take precedence? I think they have the same precedence now.
I think I'd be +1 if platform-dependent eggs were preferred over platform-independent ones.
...of the same version?
Yes
That's easy enough to do by changing the Distribution.hashcmp property to put the 'platform' part earlier in the comparison. If they're different versions, however, it's a whole 'nother kettle of fish.
Yup. Of course, that points up that the ambiguity remains, on some level. So I retract my precedence idea.
The solution I was thinking of, however, has the additional benefit of working right now. If you're generating eggs for a package with optional extensions, you would simply *not* generate a cross-platform egg, but instead just ship an sdist tarball, plus platform-specific eggs for the platforms you care about. That way, anybody on an unsupported platform will end up building a local egg from the source, and either producing the extensions or skipping them locally. (And if you don't have access to a compiler, it seems unlikely that you will be then *distributing* the eggs you produced.)
Notice that this setup is a natural side-effect of the way most people would produce and upload packages to the Cheeseshop anyway - you would have to make an intentional effort to suppress extension-building and upload a cross-platform egg; otherwise you will just end up with a source distribution and a platform-specific egg, which then works as intended.
I'm still worried about the ambiguous case when there are both platform-dependent and platform-independent eggs installed.
I think you were proposing an easy_install option. This helps when someone installs a distribution directly, but doesn't help when a distribution is installed as a dependency.
It also doesn't help with controlling selection of eggs after installation. And I think it doesn't make it easy to change one's mind. For example, one might install an egg with extensions and then install one without extensions to debug a problem using the Python debugger. Would the option let them do that?
Is it possible to control this as part of the requirement specification? Perhaps this could be some kind of standard extra?
I'd strongly prefer to be able to control this via the requirements mechanism. I'd like to be able to say that I want or don't want extensions as part of a requirement string.
Jim
At 05:32 PM 2/1/2007 -0500, Jim Fulton wrote:
I'm still worried about the ambiguous case when there are both platform-dependent and platform-independent eggs installed.
How would this happen? I think you're trying to solve a broader problem than the one I'm trying to solve, which is that I'd like to make it possible for people who don't have working compilers (i.e. mostly Windows, with some Mac users and some people in virtual hosting environments) to install packages that contain C extensions. In that scenario, you're going to *always* want to use this option to suppress optional extensions, because there isn't any way for you to build them. But, you would presumably still want to know about packages that *require* their extensions to be built.
I think you were proposing an easy_install option. This helps when someone installs a distribution directly, but doesn't help when a distribution is installed as a dependency.
This would be an option to suppress compiling *all* optional C extensions, period.
It also doesn't help with controlling selection of eggs after installation. And I think it doesn't make it easy to change one's mind. For example, one might install an egg with extensions and then install one without extensions to debug a problem using the Python debugger. Would the option let them do that?
The idea was that it would be a build-time option.
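As a rough sketch of what a build-time "skip all optional extensions" switch amounts to: the build keeps every required extension and drops the ones marked optional. `ExtensionSpec`, `extensions_to_build`, and the extension names are hypothetical; this is not an existing setuptools or easy_install API.

```python
from dataclasses import dataclass


@dataclass
class ExtensionSpec:
    """Hypothetical description of one C extension in a project."""
    name: str
    optional: bool = False


def extensions_to_build(specs, skip_optional=False):
    """Return the extensions that a build should still compile."""
    if not skip_optional:
        return list(specs)
    return [spec for spec in specs if not spec.optional]


specs = [
    ExtensionSpec("protocols._speedups", optional=True),
    ExtensionSpec("protocols._required"),
]
for spec in extensions_to_build(specs, skip_optional=True):
    print("still needs a compiler:", spec.name)
```

Required extensions stay in the list, so a user without a compiler would still find out about packages that cannot be installed without one.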
Is it possible to control this as part of the requirement specification? Perhaps this could be some kind of standard extra?
I'd strongly prefer to be able to control this via the requirements mechanism. I'd like to be able to say that I want or don't want extensions as part of a requirement string.
Yeah, I see the benefit of that, certainly. The problem is that we're trying to solve different problems. I just want to make it *possible* to suppress building extensions during easy_install.
I'll give some more thought to what you're asking for. I have an inkling of an idea, but the problems have to do with things like having to actually check the egg's contents to see if it meets requirements, and there are problems regarding the need to clean up the build/ directory if you change what features you build something with.
You see, setuptools has an undocumented 'feature' mechanism (which is still used by some PEAK projects) to control the inclusion of various packages, extensions, etc. The main reason this is undocumented is because it turns out that it's fragile to specify what features to use or not use on the command line alone, due to some distutils' commands just taking whatever's in the build/ directory as gospel.
Anyway, that feature mechanism could probably be tied in to the requirements system, as long as there was a way to wipe the build/ directory whenever the features changed between runs of setup.py, and there was a way to list the features in the .egg-info, and pkg_resources was changed to query a distribution's "features" info when validating a requirement that includes "extras".
I'm a little concerned that this will incur additional disk access under various circumstances, unless there is some way to statically distinguish between extras that denote "features" and ones that indicate additional requirements. Of course, matching a requirement against a distribution when the requirement doesn't list any extras, will not incur overhead.
I guess we could do something like this for 0.7. One thing that concerns me, however, is that it potentially *increases* the amount of conflicts and confusion possible regarding a single egg, unless there's a way to include the features in the filename. You can't tell just by looking at it, if it meets your needs.
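One way to picture the "wipe the build/ directory whenever the features changed" requirement is a stamp file recording the feature set of the last run. This is only a sketch under assumptions: the stamp file name `features.json` and the function `prepare_build_dir` are hypothetical and are not part of setuptools or distutils.

```python
import json
import shutil
import tempfile
from pathlib import Path

STAMP_NAME = "features.json"  # hypothetical file name, not a setuptools convention


def prepare_build_dir(build_dir, features):
    """Wipe build_dir if it was last used with a different feature set.

    Returns True when a stale directory was removed. A directory with no
    stamp file is treated as stale, since its contents cannot be trusted.
    """
    build_dir = Path(build_dir)
    stamp = build_dir / STAMP_NAME
    wanted = sorted(features)
    wiped = False
    if build_dir.exists():
        previous = json.loads(stamp.read_text()) if stamp.exists() else None
        if previous != wanted:
            shutil.rmtree(build_dir)
            wiped = True
    build_dir.mkdir(parents=True, exist_ok=True)
    stamp.write_text(json.dumps(wanted))
    return wiped


with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    print(prepare_build_dir(build, {"speedups"}))  # first run
    print(prepare_build_dir(build, {"speedups"}))  # same features again
    print(prepare_build_dir(build, set()))         # features changed
```

The same recorded list is the sort of data that would also have to be written into the .egg-info for a requirement check to consult it.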
In contrast, the benefit of my current proposal is that it's intended strictly for those circumstances where the eggs are *supposed to be* interchangeable except for platform-specificity and performance, and you should be able to at least tell from the filename which kind you have.
In the case where we allow other choices of features, you would need some kind of tool to tell you what features the egg was built with.
Maybe another possibility is to have *subprojects* instead, where a subproject is something built using the same setup.py, but has a distinct project name, like "PyProtocols-CExtensions" or "Twisted-Foo". By default, perhaps such a multi-project setup script would run each subproject with its own build directory, and dump multiple eggs or source distributions into the dist/ directory.
This might take some munging of EasyInstall to support picking up the distributions produced when running the bdist_egg, but it might be doable.
The principal downsides to this approach are the doubling up of eggs involved, and the need to keep a precise match of versions between the packages. In particular, if someone installs a new version of a package without its C extensions, and the C extensions still exist for an older version, it will end up importing the wrong extensions -- and it will be hard to tell what happened and why. The package will just seem broken.
Sigh. I guess at this point I don't really see a way to do optional extensions that doesn't turn into a crazy madhouse of support later. It seems to me that at least the problems with my approach would at most boil down to, "how come this thing is so slow"? :)
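The version-mismatch hazard of the subproject approach can be illustrated with a guard that refuses a stale extension. This is a hypothetical sketch: it assumes the extension subproject records the version it was built for, and `pick_implementation` is an illustrative name, not anything setuptools provides.

```python
import warnings


def pick_implementation(package_version, extension_version):
    """Return "c" only when the extension was built for this exact version.

    extension_version is None when no extension egg is installed.
    """
    if extension_version is None:
        return "python"
    if extension_version != package_version:
        # A stale extension egg from an older release: refuse it loudly
        # instead of importing code that no longer matches the package.
        warnings.warn(
            "ignoring C extensions built for %s (package is %s)"
            % (extension_version, package_version)
        )
        return "python"
    return "c"


print(pick_implementation("1.1", "1.1"))
print(pick_implementation("1.1", None))
```

Without some check of this kind, the mismatch would surface only as a package that seems broken, which is the support problem described above.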
Phillip J. Eby wrote:
At 05:32 PM 2/1/2007 -0500, Jim Fulton wrote:
I'm still worried about the ambiguous case when there are both platform-dependent and platform-independent eggs installed.
How would this happen?
At least in a couple of ways.
1. As I mentioned in my previous note, when a package has optional extensions, one will often want to disable the extensions for debugging purposes. It is easier debugging Python code than C code, especially in combination with other Python code. In the past, this was typically done by removing .so (or .pyd) files. This can still be done with eggs, but I think it will be attractive to do this by selecting different eggs.
2. Consider the following scenario: Someone has a Mac without a development environment installed. They install some eggs and get versions without extensions. Later, they install the development tools that came on the CD with their Mac. How do they reinstall the eggs with extensions? If they install in multi-version mode, won't they have a mix of eggs with and without extensions?
I think you're trying to solve a broader problem than the one I'm trying to solve, which is that I'd like to make it possible for people who don't have working compilers (i.e. mostly Windows, with some Mac users and some people in virtual hosting environments) to install packages that contain C extensions.
I'm trying to avoid a problem I think you may create. As soon as there can be two eggs that satisfy the same requirements but with different semantics, I think there is a problem. I understand that in the use case you are thinking of, this would normally not happen, but it still can happen and I suggest will happen.
In that scenario, you're going to *always* want to use this option to suppress optional extensions, because there isn't any way for you to build them. But, you would presumably still want to know about packages that *require* their extensions to be built.
I think you were proposing an easy_install option. This helps when someone installs a distribution directly, but doesn't help when a distribution is installed as a dependency.
This would be an option to suppress compiling *all* optional C extensions, period.
So it would apply to dependencies as well. Yeah, that makes sense.
It also doesn't help with controlling selection of eggs after installation. And I think it doesn't make it easy to change one's mind. For example, one might install an egg with extensions and then install one without extensions to debug a problem using the Python debugger. Would the option let them do that?
The idea was that it would be a build-time option.
Will it be possible to reinstall eggs with the same versions but with different choices wrt optional extensions? I guess even if it isn't supported by easy_install, I could make it work with buildout.
Is it possible to control this as part of the requirement specification? Perhaps this could be some kind of standard extra?
I'd strongly prefer to be able to control this via the requirements mechanism. I'd like to be able to say that I want or don't want extensions as part of a requirement string.
Yeah, I see the benefit of that, certainly. The problem is that we're trying to solve different problems. I just want to make it *possible* to suppress building extensions during easy_install.
I want that too, certainly.
I'll give some more thought to what you're asking for.
Super.
I have an inkling of an idea, but the problems have to do with things like having to actually check the egg's contents to see if it meets requirements, and there are problems regarding the need to clean up the build/ directory if you change what features you build something with.
You see, setuptools has an undocumented 'feature' mechanism (which is still used by some PEAK projects) to control the inclusion of various packages, extensions, etc. The main reason this is undocumented is because it turns out that it's fragile to specify what features to use or not use on the command line alone, due to some distutils' commands just taking whatever's in the build/ directory as gospel.
Anyway, that feature mechanism could probably be tied in to the requirements system, as long as there was a way to wipe the build/ directory whenever the features changed between runs of setup.py, and there was a way to list the features in the .egg-info, and pkg_resources was changed to query a distribution's "features" info when validating a requirement that includes "extras".
I'm a little concerned that this will incur additional disk access under various circumstances, unless there is some way to statically distinguish between extras that denote "features" and ones that indicate additional requirements. Of course, matching a requirement against a distribution when the requirement doesn't list any extras, will not incur overhead.
I guess we could do something like this for 0.7. One thing that concerns me, however, is that it potentially *increases* the amount of conflicts and confusion possible regarding a single egg, unless there's a way to include the features in the filename. You can't tell just by looking at it, if it meets your needs.
Yup. (I think this is related to the 2-byte/4-byte unicode issue.) Are there so many of these potential features that we couldn't reflect them in the file name? In the specific case of the presence or absence of extensions, that is already part of the file name. Eggs with extensions will have the platform reflected in the file name. Eggs without won't, so it should be easy to tell them apart.
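The file name check described here can be sketched with a regular expression over the "name-version-pyX.Y[-platform].egg" layout. The pattern below is a simplified assumption written for this example, not the one pkg_resources uses, and `egg_platform` is a hypothetical helper.

```python
import re

# Simplified pattern for "name-version-pyX.Y[-platform].egg".
EGG_NAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def egg_platform(filename):
    """Return the platform tag in an egg file name, or None if there is none."""
    match = EGG_NAME.match(filename)
    if match is None:
        raise ValueError("not an egg file name: %r" % filename)
    return match.group("platform")


for filename in ("PyProtocols-1.0-py2.5.egg", "PyProtocols-1.0-py2.5-win32.egg"):
    kind = "platform-specific" if egg_platform(filename) else "platform-independent"
    print(filename, "is", kind)
```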
In contrast, the benefit of my current proposal is that it's intended strictly for those circumstances where the eggs are *supposed to be* interchangeable except for platform-specificity and performance, and you should be able to at least tell from the filename which kind you have.
As I mention above, you'll be able to easily tell platform-specific and platform-independent eggs apart based on their file names.
In the case where we allow other choices of features, you would need some kind of tool to tell you what features the egg was built with.
In the general case, yes, unless you reflected the features in the file name. In the specific case of extensions, I don't think this is a problem.
Maybe another possibility is to have *subprojects* instead, where a subproject is something built using the same setup.py, but has a distinct project name, like "PyProtocols-CExtensions" or "Twisted-Foo". By default, perhaps such a multi-project setup script would run each subproject with its own build directory, and dump multiple eggs or source distributions into the dist/ directory.
Yup. And this could be triggered automatically using minimal metadata in the setup file.
This might take some munging of EasyInstall to support picking up the distributions produced when running the bdist_egg, but it might be doable.
The principal downsides to this approach are the doubling up of eggs involved, and the need to keep a precise match of versions between the packages. In particular, if someone installs a new version of a package without its C extensions, and the C extensions still exist for an older version, it will end up importing the wrong extensions -- and it will be hard to tell what happened and why. The package will just seem broken.
I see your point. This arises from the way that easy_install incrementally installs distributions. This potentially wouldn't be a problem for buildout, but I wouldn't want to break easy_install (or workingenv).
Sigh. I guess at this point I don't really see a way to do optional extensions that doesn't turn into a crazy madhouse of support later. It seems to me that at least the problems with my approach would at most boil down to, "how come this thing is so slow"? :)
OK, so based on this discussion, I'm in favor of your original proposal as a start. I think there should be a way to cause building/installation of a platform-dependent egg even if there is a platform-independent egg with the same version installed already, and the other way around, to deal with the use cases I described earlier. Even in multiple-version mode, this is not a problem, because the eggs will have different file names.
I'd really *like* to be able to reflect the selection of these somehow in requirement specifications, but, if need be, this can be dealt with at the tool (e.g. buildout) level.
Jim
At 08:11 AM 2/2/2007 -0500, Jim Fulton wrote:
Phillip J. Eby wrote:
At 05:32 PM 2/1/2007 -0500, Jim Fulton wrote:
I'm still worried about the ambiguous case when there are both platform-dependent and platform-independent eggs installed. How would this happen?
At least in a couple of ways.
1. As I mentioned in my previous note, when a package has optional extensions, one will often want to disable the extensions for debugging purposes. It is easier debugging Python code than C code, especially in combination with other Python code. In the past, this was typically done by removing .so (or .pyd) files. This can still be done with eggs, but I think it will be attractive to do this by selecting different eggs.
2. Consider the following scenario: Someone has a Mac without a development environment installed. They install some eggs and get versions without extensions. Later, they install the development tools that came on the CD with their Mac. How do they reinstall the eggs with extensions? If they install in multi-version mode, won't they have a mix of eggs with and without extensions?
Well, they can always "rm -rf *.egg" and reinstall. :) Otherwise, they'll get them by attrition when packages are upgraded to newer versions. In fact, using -U might be sufficient, although I think EasyInstall actually has some quirks with respect to determining whether -U will end up in a reinstall or not.
In the specific case of the presence or absence of extensions, that is already part of the file name. Eggs with extensions will have the platform reflected in the file name. Eggs without won't, so it should be easy to tell them apart.
Yep.
I see your point. This arises from the way that easy_install incrementally installs distributions. This potentially wouldn't be a problem for buildout, but I wouldn't want to break easy_install (or workingenv).
Yes, so these features would have to wait until 0.7, and a possible redesign of EasyInstall to be based on buildout (or something like it, anyway), instead of the other way around. ;)
Sigh. I guess at this point I don't really see a way to do optional extensions that doesn't turn into a crazy madhouse of support later. It seems to me that at least the problems with my approach would at most boil down to, "how come this thing is so slow"? :)
OK, so based on this discussion, I'm in favor of your original proposal as a start. I think there should be a way to cause building/installation of a platform-dependent egg even if there is a platform-independent egg with the same version installed already, and the other way around, to deal with the use cases I described earlier. Even in multiple-version mode, this is not a problem, because the eggs will have different file names. I'd really *like* to be able to reflect the selection of these somehow in requirement specifications, but, if need be, this can be dealt with at the tool (e.g. buildout) level.
EasyInstall probably just needs to grow an option to force reinstallation of a package, as that's a generally useful feature. I.e., sort of a "don't allow the requirement to be satisfied with an egg that's already on sys.path" option.
Phillip J. Eby wrote:
At 08:11 AM 2/2/2007 -0500, Jim Fulton wrote:
Phillip J. Eby wrote:
At 05:32 PM 2/1/2007 -0500, Jim Fulton wrote:
I'm still worried about the ambiguous case when there are both platform-dependent and platform-independent eggs installed. How would this happen?
At least in a couple of ways.
1. As I mentioned in my previous note, when a package has optional extensions, one will often want to disable the extensions for debugging purposes. It is easier debugging Python code than C code, especially in combination with other Python code. In the past, this was typically done by removing .so (or .pyd) files. This can still be done with eggs, but I think it will be attractive to do this by selecting different eggs.
2. Consider the following scenario: Someone has a Mac without a development environment installed. They install some eggs and get versions without extensions. Later, they install the development tools that came on the CD with their Mac. How do they reinstall the eggs with extensions? If they install in multi-version mode, won't they have a mix of eggs with and without extensions?
Well, they can always "rm -rf *.egg" and reinstall. :) Otherwise, they'll get them by attrition when packages are upgraded to newer versions. In fact, using -U might be sufficient, although I think EasyInstall actually has some quirks with respect to determining whether -U will end up in a reinstall or not.
I'd be surprised if upgrade would reinstall if there weren't a later version available. ...
I see your point. This arises from the way that easy_install incrementally installs distributions. This potentially wouldn't be a problem for buildout, but I wouldn't want to break easy_install (or workingenv).
Yes, so these features would have to wait until 0.7, and a possible redesign of EasyInstall to be based on buildout (or something like it, anyway), instead of the other way around. ;)
I didn't mean to imply that buildout was better than easy_install, merely noting that they were different.
Sigh. I guess at this point I don't really see a way to do optional extensions that doesn't turn into a crazy madhouse of support later. It seems to me that at least the problems with my approach would at most boil down to, "how come this thing is so slow"? :)
OK, so based on this discussion, I'm in favor of your original proposal as a start. I think there should be a way to cause building/installation of a platform-dependent egg even if there is a platform-independent egg with the same version installed already, and the other way around, to deal with the use cases I described earlier. Even in multiple-version mode, this is not a problem, because the eggs will have different file names. I'd really *like* to be able to reflect the selection of these somehow in requirement specifications, but, if need be, this can be dealt with at the tool (e.g. buildout) level.
EasyInstall probably just needs to grow an option to force reinstallation of a package, as that's a generally useful feature. I.e., sort of a "don't allow the requirement to be satisfied with an egg that's already on sys.path" option.
That seems like a rather big stick and a round-about way to do it.
Jim
At 01:26 PM 2/2/2007 -0500, Jim Fulton wrote:
Phillip J. Eby wrote:
Yes, so these features would have to wait until 0.7, and a possible redesign of EasyInstall to be based on buildout (or something like it, anyway), instead of the other way around. ;)
I didn't mean to imply that buildout was better than easy_install, merely noting that they were different.
Well, I did mean to imply it's better, because it is. Take the compliment and go quietly. ;)
Seriously, I do intend for the "nest" tool to be "more like buildout", in the sense that it will target the management of individual nests (analogous to individual buildouts), that it will likely be more transactional, and better able to support plugins (analogous to recipes). So, I think that the basic ideas of buildout are good and should be emulated in "nest"; whether any actual code sharing or other similarity-in-detail will exist, I don't yet know.
EasyInstall probably just needs to grow an option to force reinstallation of a package, as that's a generally useful feature. I.e., sort of a "don't allow the requirement to be satisfied with an egg that's already on sys.path" option.
That seems like a rather big stick and a round-about way to do it.
True, but that statement applies to EasyInstall as a whole already, doesn't it? :) More seriously, it's the only thing I can reasonably see doing in the 0.6 timeframe unless somebody else can contribute good patches. I really want to put 0.6 to bed so that serious work on 0.7 can start -- something that's now almost a year overdue, compared to my druthers.
Phillip J. Eby wrote:
I'm thinking setuptools should allow you to specify that one or more C extensions in a project are optional.
At minimum, there would then be a command-line option you could give to easy_install to say, "don't build any optional extensions", so if you don't have a compiler or need a cross-platform egg, you can skip those extensions.
There would also need to be a way to specify this when using the regular build commands.
This feature would address projects that include optional C versions of certain code, that can fall back to pure Python implementations.
I don't have anything I maintain that falls under this, but I've had a hard time supporting dependencies which do have this problem (Cheetah in particular). It would be quite handy to have this. And actually my experience with Cheetah would make me reluctant to introduce a zope.interface requirement for all the same reasons -- even if I'm willing to get degraded performance (which in many cases is quite acceptable, as it would be for the way I use Cheetah), that's not an option I have.
--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
participants (3)
- Ian Bicking
- Jim Fulton
- Phillip J. Eby