Re: [Distutils] Deployment with setuptools: a basket-of-eggs approach
On 11 Apr 2006 10:40:35 +0200, Iwan Vosloo XXXXX wrote:
Hi Maris,
what do you mean by 'deploying' an egg to the SharedDirectory? Will you install it there, or merely put it there so that installs elsewhere can fetch the egg from the repository in SharedDirectory?
I also wondered what platforms are involved in your environment?
-i
Hi Iwan. According to the documentation for the pkg_resources API, eggs need to be placed on sys.path somehow for the 'require' function's automatic resource discovery to work. The easiest way to do this is to add the SharedDirectory to the PYTHONPATH. As for deployment, I believe that simply copying the .egg to the SharedDirectory should be enough. This is similar to the instructions in the easy_install documentation under 'Installing on Un-networked Machines'. Thus, developers would execute the following:

% easy_install -f /opt/eggs -zmaxd /opt/eggs mypackage

This will build and copy all eggs to the SharedDirectory, along with all third-party eggs and dependencies.

As an aside, this raises some interesting questions: should third-party eggs be treated as run-time data and live with the application, or should they be treated the same as all of our internal eggs? What is the most efficient way to upgrade versions? The latter approach also leaves some important questions about open-ended versioning. For example, suppose I have a chain of dependencies like so:

MyPackage-1.0 -> eggA-1.0 -> eggB-1.0

What happens if I upgrade to eggB-2.0, which has an API that is incompatible with eggB-1.0? If I originally stated the dependency as 'require(eggB)', I now have to go back and force eggA-1.0 to use the earlier version of eggB somehow. If it is a third-party package, I am in trouble, as I may not be able to force a specific version. I would then have to test all combinations of dependencies across all applications and versions... So, I have just re-discovered a form of DLL Hell.

I could implement a policy of freezing the dependencies to a specific major and minor version when an application or package states its requirements. So a developer _must_ state the requirement as 'require(eggB==1.0)'. The onus then falls on the developer to expand the range of supported versions of dependencies as they are published (which is what a good unit test suite will help with).

Anyway, as far as platforms are concerned, we have a mix of Windows and Linux i386 systems, no 64-bit systems. Directory sharing is implemented through CIFS (Samba). I should also mention that we are not working with modules that contain C extensions.

I hope this answers your questions,

Maris

P.S. I CC'd this to distutils-sig, as I feel there is some information here that other people may find valuable.
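To show the pinning policy I have in mind, here is a minimal sketch. The egg names and versions (eggB, 1.0) are purely illustrative, and it assumes the egg sits in the SharedDirectory that has been added to PYTHONPATH:

import pkg_resources

# Open-ended requirement: activates whatever eggB is newest on
# sys.path, which could be an incompatible future 2.0 release.
# pkg_resources.require('eggB')

# Frozen, as the proposed policy requires.  An exact pin works, or a
# major.minor range if bugfix releases should still be picked up:
pkg_resources.require('eggB>=1.0,<1.1')

import eggB  # now resolves to the activated 1.0.x egg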
Hi Maris,

Ok, I see... You can thus assume in your environment that the network will always be there.

I was wondering whether you've ever looked at something like Debian's apt. (Mentioned here just to learn from it, not to advocate its use.) Apt is a wonderful tool for keeping repositories and installing packages. It does not solve all problems - and has the drawback that it only allows one version of something on a system (but you can trick it by having different package names...).

The hell you're talking about is something that Debian (and, I suppose, other distros) has a lot of experience in managing. And, for Debian, apt is the tool. (I don't know the others.) Of course there are also a number of conventions and policies that come into play to make it work.

I find it odd that you call upon unit testing. Is the issue not actually integration testing?

I think that the only way to deal with the possible complexities of many packages and dependencies is to impose restrictions on when and how things are released. For example, all the packages in Debian release X are tested to work well together (this is integration testing). So, in Debian, you don't only have packages, you also have a set of versioned packages (itself versioned) which is the release of the entire distro. Any new version of a package, or new package that should work with that distro, would need to be tested with all the rest of the packages in that release of the distro.

I suppose disallowing more than one version of a package on a machine (like they have done) is one way of simplifying things. And the standard workaround for special packages that need more versions is to include part of the version in the name. For example, "gcc-3.4" (version 3.4.2) can be installed alongside "gcc-4.0" (version 4.0.3).

With your scheme, your repository of eggs is also like a single, shared installation of eggs. And it may be argued that there is a difference between putting something in a shared repository (which means "it is now officially released") versus installing a package on a machine where it is used. When you install it, you care about other localised things too that are not versioned, such as local config or user data. And things like apt include ways and means for you to upgrade user data and deal with config sensibly.

It may be that the simplification of making "install" and "release" one thing is useful in an environment, I guess. But in some environments the simplification may introduce other hellish issues.

(Sorry, I'm just thinking aloud, because I am also faced with such problems and, by habit, always build things around apt... So it's interesting to see other thoughts.)

(And, I've no clue as to what platforms apt is available on...)

-i
On 11 Apr 2006 16:31:44 +0200, Iwan Vosloo XXXXX wrote:
Hi Maris,
Ok, I see...
You can thus assume in your environment that the network will always be there.
I was wondering whether you've ever looked at something like Debian's apt. (Mentioned here just to learn from it, not to advocate its use.) Apt is a wonderful tool for keeping repositories and installing packages. It does not solve all problems - and has the drawback that it only allows one version of something on a system (but you can trick it by having different package names...).
I am not completely familiar with the way that Debian handles package releases, but I do have experience with managing apt, ebuilds, RPMs, and source installs. I do not believe these systems are an option, since some of our end-users are on Windows and we do not have an in-house administrator to handle application upgrades. Because end-users will be handling upgrades themselves, and because of their widely varying technical skills, we also have to keep the upgrade process trivial (which easy_install can do, with a bit of help for runtime data and future-proofing). Since our servers are all Fedora Linux, we could try installing Cygwin on the Windows machines, add facilities to automatically build two RPMs (one for Fedora, one for Cygwin), and install those. But it would be very hard to justify that extra complexity in a shop our size. One very nice thing about using setuptools and easy_install is that it keeps the application lifecycle within the Python world.
The hell you're talking about is something that Debian (and, I suppose, other distros) has a lot of experience in managing. And, for Debian, apt is the tool. (I don't know the others.) Of course there are also a number of conventions and policies that come into play to make it work.
I find it odd that you call upon unit testing. Is the issue not actually integration testing?
You are right, it would be integration testing. With a technical staff as small as ours (3 people), I was thinking that coding unit tests for the boundaries between your package and its dependencies would help ease the work load of upgrading. I was hoping to ease adoption of the new system by combining integration testing with a framework that people are already familiar with.
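To illustrate the kind of boundary test I mean (the package name eggA and its parse_report function are made up purely for illustration): a tiny test case that exercises only the API we rely on from a dependency, so an incompatible upgrade of that egg fails fast.

import unittest

class EggABoundaryTest(unittest.TestCase):
    # Exercise only the eggA API surface our code actually uses.
    def test_parse_report_contract(self):
        from eggA import parse_report  # hypothetical dependency API
        result = parse_report("name=value")
        self.assertEqual(result["name"], "value")

if __name__ == '__main__':
    unittest.main()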
I think that the only way to deal with the possible complexities of many packages and dependencies is to impose restrictions on when and how things are released. For example, all the packages in Debian release X are tested to work well together (this is integration testing). So, in Debian, you don't only have packages, you also have a set of versioned packages (itself versioned) which is the release of the entire distro. Any new version of a package, or new package that should work with that distro, would need to be tested with all the rest of the packages in that release of the distro.
That level of testing would require the roles of either Project Librarian or Build Master, neither of which we could justify at a company our size, especially given the rate at which new projects are written.
I suppose disallowing more than one version of a package on a machine (like they have done) is one way of simplifying things. And the standard workaround for special packages that need more versions, is to include part of the version in the name. For example "gcc-3.4" (version 3.4.2) can be installed alongside "gcc-4.0" (version 4.0.3).
Alas, that is not an option with our legacy applications, which mix library versions onto a single system, with the additional requirement to be able to apply bug fixes to the systems easily (without a sysadmin).
With your scheme, your repository of eggs is also like a single, shared installation of eggs. And it may be argued that there is a difference between putting something in a shared repository (which means "it is now officially released") versus installing a package on a machine where it is used. When you install it, you care about other localised things too that are not versioned, such as local config or user data. And things like apt include ways and means for you to upgrade user data and deal with config sensibly.
It may be that the simplification of making "install" and "release" one thing is useful in an environment, I guess. But in some environments the simplification may introduce other hellish issues.
Those other hellish issues are influenced by the vague, situation-specific part I am attempting to grasp the full shape of. It seems to hinge on a handful of critical requirements, such as "two applications, one system, different versions of libraries", "fast/easy deployment to all systems", "fast/easy upgrades to all installed applications", or "mixed operating systems".

One side effect of the 'require() only a single version' policy: why wouldn't I just bundle everything anyway? Since I want only a single version of the library, because that is all I can guarantee to be stable, why not enforce the restriction physically? Then we would have the situation you speak of, making the distinction between 'installing on the local machine' and 'deploying to the shared directory'. I would have to install most applications as a multi-install in order to meet the requirement of "two applications, one system, different versions of libraries". So we get:

# deploy
% easy_install -f /opt/eggs -zmaxd /opt/eggs mypackage

# install
% easy_install -f /opt/eggs -zmad /some/where mypackage

We now have the Java-style install I outlined in my first email. (This is the part where I wish that easy_install had an option to install *everything* in the target directory, instead of trying to use /usr/bin or whatever for scripts, etc. We can get around this with a custom setup.cfg and the "$base" distutils variable.) This satisfies the additional requirements of "multiple operating systems" and "easy/fast installation".

That leaves us with the requirement for "easy/fast upgrades". Under the bundled setup we still have to visit every server in order to push out a bugfix version of a library. What is more, we have to visit every individual application that was installed using the multi-install option. One way around this would be to specify a subset of a version number to which easy_install could be applied. Something like this:

% easy_install --upgrade mypackage==X.X.Y

We could achieve this by using pkg_resources to get the application's dependency tree, as well as all installed versions of all dependencies. Then it is a simple matter of parsing the version string for each package and splitting out the 'X.X' part. Then, for each package in the tree, we re-run easy_install with this information:

os.system('easy_install -f /opt/eggs --upgrade mypackage==X.X')

easy_install should then grab and install any bugfix versions as required. As far as pushing or pulling application bugfixes is concerned, this could be handled by a cron script or scheduled task. One per application, which means more work to install any given app, because setuptools can't handle this at the moment. But it would work!
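To make that last step concrete, here is a minimal sketch of the upgrade helper, assuming the application is already installed and /opt/eggs is the shared directory ('mypackage' is a placeholder name). It is only an outline: the exact requirement specifier passed to easy_install (an exact 'X.X' pin versus a '>=X.X,<X.Y' range) would need testing against real bugfix version numbers.

import os
from pkg_resources import working_set, Requirement

def upgrade_tree(project, find_links='/opt/eggs'):
    # Resolve the full dependency tree of the installed application.
    dists = working_set.resolve([Requirement.parse(project)])
    for dist in dists:
        # Keep only the major.minor part of the installed version.
        prefix = '.'.join(dist.version.split('.')[:2])
        # Re-run easy_install, constrained to that major.minor line.
        cmd = 'easy_install -f %s --upgrade "%s==%s"' % (
            find_links, dist.project_name, prefix)
        os.system(cmd)

upgrade_tree('mypackage')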
(Sorry, I'm just thinking aloud, because I am also faced with such problems and, by habit, always build things around apt... So it's interesting to see other thoughts.)
No problem there, thinking aloud is often why discussing ideas with people helps so much! Maris
Hi Maris,

I'm not suggesting using apt - just thinking about the difference in models. I suppose the pure Python equivalent of apt and its repositories would be to have your own private PyPI (the repository), and to use easy_install to install things from there onto individual machines.

This does not necessitate more work: installing and updating packages on individual machines _from_ this central repository can also be automated, for example by running a script in cron every night or so. This gives you a set of _released_ packages (those in the repository), and different machines where instances of packages are _installed_.

I suppose there is a correlation between what Philip calls an environment in this sense, and a machine in the debian/apt world. An environment is just a more abstract way of looking at it, allowing more flexibility.

Also, I tend to package everything as an egg - applications too. To me, eggs are just a way to package software, not only libraries.

-i
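As a rough sketch of how the release/install split could look in practice (the URL and schedule here are purely illustrative assumptions, not part of anyone's actual setup): eggs are released by copying them into the shared index, and each machine pulls upgrades on a schedule using easy_install's --find-links option.

# release: build and copy the eggs to the private index
# (a plain directory served over HTTP would do)
% easy_install -f /opt/eggs -zmaxd /opt/eggs mypackage

# install/update on each machine, e.g. from a nightly cron job
% easy_install -f http://eggs.example.com/ --upgrade mypackage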
Iwan Vosloo <iv@lantic.net> writes:
I suppose there is a correlation between what Philip calls an environment in this sense, and a machine in the debian/apt world. An environment is just a more abstract way of looking at it allowing more flexibility.
Sorry, that was meant to be Ian, not Philip... -i