[Distutils] Deployment with setuptools: a basket-of-eggs approach
Ian Bicking
ianb at colorstudy.com
Wed Apr 12 19:04:37 CEST 2006
Hi there; a bunch of ideas here, more than I can digest all at once I
suppose.
Mars wrote:
> Hello all,
>
> I was reading through my backlog of Daily Python URL's and saw that
> the topics of deployment and configuration management seem to be
> getting some attention lately. We have been having issues similar to
> this at my company, and I was hoping for some feedback on a solution I
> devised for these problems.
>
> In our situation:
> - we have only a handful of developers, maintaining a very large
> legacy code base.
> - different applications use different versions of the same library
> on the same machine
> - developers still want to push bug fixes out without visiting every
> application
> - developers want to write new (non-compatible) versions of their libraries
> - developers want other developers to use their new libraries instead
> of the old ones
>
> The plan I have borrows a bit from the Java and Python development
> communities, and relies heavily on setuptools, policy, and some
> developer discipline (we are professionals, after all).
>
> *Outline*
> - Developers outline Families of modules for common enterprise-wide tasks
> - Developers build Packages (python .egg files) from Families
> - Developers build Applications which are composed of Packages
>
> *Packages*
> - Packages are deployed to a SharedDirectory
> - Package versions have three number parts, Major.Minor.Bugfix
> - New package features, or *any* change in the package functionality,
> changes _at least_ the Minor version number.
Given some other things you mention here, I think metadata updates will
be common, which I assume would be grouped in with bugfixes in the version.
> - Bugfixes should not change any functionality, and thus they only
> update the Bugfix version number
> - Developers write ChangeLogs for Packages before deployment
>
> *SharedDirectory*
> - Servers mount the packages SharedDirectory
>
> *ChangeLogs*
> - ChangeLogs outline any new features or Bugfixes in a Package
> - ChangeLogs are sent to all other developers after deployment
>
> *Applications*
> - Applications find their dependencies in the SharedDirectory
> - Applications have startup scripts that 'require' the
> developer-specified packages (for example, setuptools entry_points and
> automatic script generation do this)
> - Applications 'require' only the first two numbers of the package
> version, Major.Minor
Yes, requiring a fixed version doesn't work in my experience. Requiring
a range should, e.g., >=1.5,<1.6. After testing you may change that
requirement, and then you'll have to release. E.g., you add a new
feature to one library, test that a library depending on it still
works, widen that dependent library's requirement to >=1.5,<1.7, and
release the dependent library as a bugfix release. This can work for
applications as well as libraries.
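To make the range idea concrete, here's a minimal sketch of how such a
requirement could be checked. The names (parse_version, satisfies) are
hypothetical, and real setuptools version parsing handles far more
(letters, dev tags, etc.):

```python
# Minimal sketch of range-style requirement checking, assuming plain
# dotted-integer versions (real setuptools version parsing is richer).
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le,
       "==": operator.eq, ">": operator.gt, "<": operator.lt}

def parse_version(s):
    """Turn '1.5.2' into the comparable tuple (1, 5, 2)."""
    return tuple(int(part) for part in s.split("."))

def satisfies(version, spec):
    """Check a version string against a spec like '>=1.5,<1.6'.

    Each comma-separated clause must hold (they are ANDed).
    """
    v = parse_version(version)
    for clause in spec.split(","):
        m = re.match(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*$", clause)
        op, wanted = m.group(1), m.group(2)
        if not OPS[op](v, parse_version(wanted)):
            return False
    return True
```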
I was talking to someone about .NET and how it handles versioning, and
he mentioned that you can configure overrides. So if you have a package
A that works for B>=1.5,<1.6, and you confirm that at least in a
specific context B==1.7.3 works, then you can add something to that
*context* (not to either package) that overrides the requirement with
B>=1.5,<1.6,>=1.7.3. This would decrease the number of
metadata-updating releases you would have to do.
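A context-level override could be as simple as a lookup consulted before
the package's own metadata. This is a hypothetical sketch of the idea,
not anything setuptools provides; all names and data are illustrative:

```python
# Hypothetical per-context requirement overrides: base requirements
# live with the packages themselves; a deployment context carries a
# dict of overrides that is consulted first.

BASE_REQUIREMENTS = {"A": {"B": ">=1.5,<1.6"}}

def effective_requirement(package, dependency, context_overrides):
    """Return the spec for dependency, preferring the context's override."""
    override = context_overrides.get((package, dependency))
    if override is not None:
        return override
    return BASE_REQUIREMENTS[package][dependency]
```

The point of the design is that neither package's metadata changes when
a combination is confirmed to work; only the context gains an entry.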
I do everything as a "library", in that they are all setuptools
packages. This works well for me. However, there's some
as-yet-undefined entity above that. An installation, or an environment.
For me that thing is a "website", though even that isn't strictly true
-- sometimes a website has more than one environment (e.g., if there's a
conflict I don't care to resolve), or more than one website has the same
environment (if I want to run more than one website in the same
process). So no concrete term really works. But the idea of an
environment seems like an important abstraction to work in. I think it
is similar to what you are thinking of as an application.
A feature I'd really like is the ability to do an svn commit that also
updates some metadata in concert. So, let's say I have packages A and
B, A requires B, and I'm updating A but have to make a change in B to
go with it. So I might say:
fancysvn commit -m "changes" --update-requirements=A A/ B/
Then it would commit the changes in B, get the new version (which in
development would be based on the svn revision), update A's stored
requirements (in this model setup.py wouldn't hold that data), and
then commit A. I end up doing this a lot, but without this command I
seldom update the requirements like I should.
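The metadata-updating half of that hypothetical command might look
something like this; the svn plumbing (commit, revision lookup) is
omitted and the requirements file format is simplified:

```python
# Sketch of the requirement-rewriting step of the hypothetical
# `fancysvn` command: pin package A's requirement on B to the version
# the paired commit of B just produced.
import re

def update_requirement(metadata_text, dependency, new_version):
    """Rewrite a 'B>=...' requirement line to require new_version."""
    pattern = re.compile(r"^(%s)>=[\w.]+" % re.escape(dependency), re.M)
    return pattern.sub(r"\g<1>>=%s" % new_version, metadata_text)
```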
> Some observations for deployers:
> - All applications should automatically upgrade if there is a bugfix version.
For this you need a central database of application installations.
Given that, it seems like a simple script could do the update.
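With the Major.Minor.Bugfix policy described above, the eligibility
check at the heart of that script is straightforward; a sketch (helper
name hypothetical, versions assumed to be three dotted integers):

```python
def needs_bugfix_upgrade(installed, released):
    """True when released differs from installed only in the Bugfix part.

    Under the Major.Minor.Bugfix policy, only such releases are safe
    to push to deployed applications automatically.
    """
    i, r = installed.split("."), released.split(".")
    return i[:2] == r[:2] and tuple(map(int, i)) < tuple(map(int, r))
```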
> - Changes in functionality are isolated from legacy applications.
> - Because applications are self-contained in setuptools .egg files,
> we can specify the shared directory as a global installation source
> when using easy_install.
I just use an internal web directory for all the packages, though local
access would be faster. I've come to find actual .tar.gz or .egg
creation a bit tedious, and have been considering using svn checkouts
for everything. Then the central index would be a series of links to
svn repository locations. That would require a link for every tag,
e.g., svn://repos/Package/tags/0.3#egg=Package-0.3
Potentially such a page could be automatically generated from an svn
hook, since the links are obvious if you stick to a conventional svn layout.
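Such a generated index page is mostly string assembly. A sketch, with
the repository URL and tags layout as assumptions:

```python
# Build one easy_install-style link per tag of a package, following
# the conventional svn layout (Package/tags/<version>).

def index_links(repo_url, package, tags):
    """Return svn://...#egg=Package-version links for each tag."""
    return ["%s/%s/tags/%s#egg=%s-%s" % (repo_url, package, tag, package, tag)
            for tag in tags]
```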
> - Applications can be installed locally in a fashion similar to Java
> applications, which are often a collection of .jar files and data
> thrown into a common directory. We can do the same with "easy_install
> -f /opt/eggs -zmad /opt/myapp".
> - The application script itself could be shared from the central
> drive, if we write a custom script that does not use the shebang line.
> This has the side-effect of automatically upgrading *everyone's*
> version, which can be a blessing or a curse when you take versioning
> of run-time data into account.
Yeah, that scares me a bit. I'd rather keep track automatically, then
upgrade individually. Also easier to roll back just one piece if necessary.
If you stick with the idea of an isolated environment -- instead of a
big pool -- it also means you don't have to lean too heavily on the
versioning features of setuptools. The advantage there being that the
system is more translatable to other environments and languages.
> Some observations for developers:
> - The version numbering policy is flexible, but one *must* end up
> with a deployment policy where business-critical applications do not
> have even the most minor of functionality changes foisted upon them
> (for one solution see Java-style installations, outlined above).
I think the versioning will be pretty important. For instance, the
business needs of versioning have to be kept separate -- you'll have
fairly formal requirements for how versions are assigned.
> - Developers know what is happening to a package at a grain higher
> than that presented by Subversion, thanks to a published ChangeLog.
Do you have any particular thoughts on how the ChangeLog is
generated/presented? I'm not very good at keeping logs at different
levels of granularity (right now I typically only keep per-commit logs
of work).
> - Developers are free to choose any version of the available packages,
> because they know that they are all available at install-time. They
> can upgrade and downgrade versions at will.
Though it improves over time, as some packages reach maturity, in my
experience work typically involves poking at more than one package. So
during development I want everything to be editable. This strongly
favors working against the trunk (or at least a branch), since editing
an older version of a package means you have to be quite careful about
actually deploying.
Some of the mini-branch techniques people talk about might help here.
Mostly people talk about this in the context of distributed systems, but
I believe the Divmod guys are doing mini branches in subversion.
> - Developers are still responsible for upgrading legacy applications
> with more functional library versions, but that is what unit tests are
> for (you /do/ write unit tests, don't you?)
Buildbot would probably be a great boon here as well. While you don't
want to accidentally "upgrade" a working deployed app, having tests run
on all combinations of the libraries and apps will give you some idea of
how much breakage has occurred, and when.
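Enumerating the combinations to hand a buildbot is a one-liner with
itertools; the version data here is purely illustrative:

```python
# Yield every library/app version combination a buildbot could test.
import itertools

def test_matrix(versions_by_package):
    """Yield one {package: version} dict per combination to test."""
    names = sorted(versions_by_package)
    for combo in itertools.product(*(versions_by_package[n] for n in names)):
        yield dict(zip(names, combo))
```

The matrix grows multiplicatively, of course, so in practice you'd
likely restrict it to the version ranges the packages actually declare.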
> - It helps to use a setuptools alias to easy_install when building
> packages for deployment. You can then type "python setup.py deploy"
> in the source directory as a quick and easy shortcut for building the
> required .egg.
I've certainly found minor conveniences like these to be strong
incentives to actually do the right thing each time.
> There are a few concerns that cross into release handling, and code
> library maintenance and care:
> - Regular library reviews and a comfortable package end-of-life
> schedule should be used to help prune the "supported packages" tree.
> - A relaxed package release schedule, with more features per-release,
> should help slow things down to a comfortable balance between time
> spent on upgrades, bugfixes, and features.
> - If the version numbers are climbing too fast, ask whether there
> should be more features per-release, or if the package should be put
> into 'alpha' or 'beta' status until it stabilizes.
What is the problem you are solving here? Version numbers are just
numbers, after all. I guess one advantage of dated versions, or
integer-only versions, is that you avoid some emotional resistance to
increasing versions since the versions are relatively meaningless.
> - Ask if anybody else is using the package you are maintaining. If
> not, ask yourself if it has been prematurely extracted from an
> application (remember 'You Ain't Gonna Need It').
> - Ask if a quickly-rising package version indicates a family of
> modules and functionality that should be re-arranged. This could
> happen if someone is developing a new application and dumping too much
> application-specific functionality into an existing family. Can the
> family be split along functional lines? Should the functionality be
> kept in the application? (YAGNI again).
That's definitely been an issue I've seen. There are some internal
packages which need splitting, for instance, since large portions have
been highly stable, and updates elsewhere in the package shouldn't
affect the release schedule of that stable core. Also, if I want to
know what a change will affect, I'll know that better if the
granularity of that change is clearer.
> Well, that's about it. You almost need to resurrect the role of
> 'Project Librarian' to keep track of it all! ;)
I think more tool and reporting support is definitely called for. In
many cases, instead of relying too heavily on setuptools loading the
right package, I think it would be better to think in terms of an
environment builder with static analysis of the result. E.g., you have
a deployment that includes apps X, Y, and Z; you start adding version
requirements there, see what that brings in, report on any conflicts,
and have the ability to do all of that speculatively. That is, where
setuptools primarily checks these things at runtime now, based on a
specific installation, this would be a little more look-before-you-leap.
All the functions are in pkg_resources already, but there's no
frontend to really look at this stuff.
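As a sketch of what such a frontend would compute -- not the
pkg_resources API itself, just a deliberately simplified model (sets of
acceptable versions instead of spec strings, illustrative data):

```python
# Look-before-you-leap sketch: given each package's requirements and a
# proposed environment, walk the dependency closure and report
# conflicts before installing anything.

def check_environment(roots, requirements, available):
    """Return (package, dependency) pairs whose requirement isn't met.

    requirements maps package -> {dependency: set of ok versions};
    available maps package -> the version the environment would get.
    """
    conflicts = []
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, ok_versions in sorted(requirements.get(pkg, {}).items()):
            if available.get(dep) not in ok_versions:
                conflicts.append((pkg, dep))
            stack.append(dep)
    return conflicts
```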
> I would love to hear people's feedback on this idea, as I am sure that
> I am not the first to tread this path.
I'm definitely interested in this stuff too; but at least in the context
of Python there's not a whole lot of experience in these things. Or at
best, people have set up their own ad hoc systems.
--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org