Hi there; a bunch of ideas here, more than I can digest all at once I suppose. Mars wrote:
Hello all,
I was reading through my backlog of Daily Python-URL posts and saw that the topics of deployment and configuration management seem to be getting some attention lately. We have been having issues similar to this at my company, and I was hoping for some feedback on a solution I devised for these problems.
In our situation:
- we have only a handful of developers, maintaining a very large legacy code base
- different applications use different versions of the same library on the same machine
- developers still want to push bug fixes out without visiting every application
- developers want to write new (non-compatible) versions of their libraries
- developers want other developers to use their new libraries instead of the old ones
The plan I have borrows a bit from the Java and Python development communities, and relies heavily on setuptools, policy, and some developer discipline (we are professionals, after all).
*Outline*
- Developers outline Families of modules for common enterprise-wide tasks
- Developers build Packages (Python .egg files) from Families
- Developers build Applications which are composed of Packages
*Packages*
- Packages are deployed to a SharedDirectory
- Package versions have three number parts: Major.Minor.Bugfix
- New package features, or *any* change in package functionality, change _at least_ the Minor version number.
Given some other things you mention here, I think metadata updates will be common, which I assume would be grouped in with bugfixes in the version.
- Bugfixes should not change any functionality, and thus they only update the Bugfix version number
- Developers write ChangeLogs for Packages before deployment
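To make the policy above concrete, here is a minimal sketch of the numbering rules as code. The helper name is mine, not part of setuptools or any tool Mars describes:

```python
# Sketch of the Major.Minor.Bugfix policy: bugfixes bump only the third
# part; any functional change bumps at least the Minor part.
# The function name "bump" is invented for illustration.

def bump(version, change):
    """Return the next version string for a 'bugfix' or 'feature' change."""
    major, minor, bugfix = (int(p) for p in version.split("."))
    if change == "bugfix":
        # No functionality change: only the Bugfix number moves.
        return "%d.%d.%d" % (major, minor, bugfix + 1)
    elif change == "feature":
        # New or changed functionality: bump Minor, reset Bugfix.
        return "%d.%d.0" % (major, minor + 1)
    raise ValueError("unknown change type: %r" % change)

print(bump("1.5.2", "bugfix"))
print(bump("1.5.2", "feature"))
```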
*SharedDirectory*
- Servers mount the packages SharedDirectory
*ChangeLogs*
- ChangeLogs outline any new features or Bugfixes in a Package
- ChangeLogs are sent to all other developers after deployment
*Applications*
- Applications find their dependencies in the SharedDirectory
- Applications have startup scripts that 'require' the developer-specified packages (for example, setuptools entry_points and automatic script generation do this)
- Applications 'require' only the first two numbers of the package version, Major.Minor
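Requiring only Major.Minor is naturally expressed as a version range in setuptools, which any bugfix release in that series satisfies. A sketch using pkg_resources (the package name MyLib is invented):

```python
import pkg_resources  # ships with setuptools

# In a real startup script you would write something like:
#     pkg_resources.require("MyLib>=1.5,<1.6")   # MyLib is a made-up name
# which activates whatever 1.5.x egg is installed. The range itself
# behaves like this:
req = pkg_resources.Requirement.parse("MyLib>=1.5,<1.6")
print("1.5.3" in req)  # bugfix releases in the 1.5 series match
print("1.6.0" in req)  # a Minor bump does not
```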
Yes, requiring a fixed version doesn't work in my experience. Requiring a range should, i.e., >=1.5,<1.6. After testing you may change that requirement, and so you'll have to release. E.g., you add a new feature, you test that it works with a newer version of another library, and you change that requirement to >=1.5,<1.7 and release that other library as a bugfix release. This can work for applications as well as libraries.

I was talking to someone about .NET and how it handles versioning, and he mentioned that you can configure overrides. So if you have a package A that works for B>=1.5,<1.6, and you confirm that at least in a specific context B==1.7.3 works, then you can add something to that *context* (not to either package) that overrides the requirement to also accept B==1.7.3. This would decrease the number of metadata-updating releases you would have to do.

I do everything as a "library", in that they are all setuptools packages. This works well for me. However, there's some as-yet-undefined entity above that. An installation, or an environment. For me that thing is a "website", though even that isn't strictly true -- sometimes a website has more than one environment (e.g., if there's a conflict I don't care to resolve), or more than one website has the same environment (if I want to run more than one website in the same process). So no concrete term really works. But the idea of an environment seems like an important abstraction to work in. I think it is similar to what you are thinking of as an application.

A feature I'd really like is the ability to do an svn commit that also updates some metadata in concert. So, let's say I have packages A and B, A requires B, and I'm updating A but have to make a change in B to go with it. So I might say:

  fancysvn commit -m "changes" --update-requirements=A A/ B/

Then it would commit the changes in B, get the new version (which in development would be based on the svn revision), update A's entry_points.txt (in this model setup.py wouldn't hold that data), and then commit A. I end up doing this a lot, but without such a command I seldom update the requirements like I should.
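The metadata-rewriting step at the heart of that hypothetical command could reduce to a textual substitution on the stored requirements. A sketch, with the requirements format and function name invented for illustration:

```python
import re

# Hypothetical core of a "fancysvn --update-requirements" wrapper: after
# committing B and learning its new revision-based version, rewrite A's
# stored requirement for B before committing A. The one-requirement-
# per-line format assumed here is an illustration, not a real layout.

def update_requirement(text, package, new_version):
    """Replace the version spec for `package` in a requirements listing."""
    pattern = re.compile(r"^%s[=<>!].*$" % re.escape(package), re.MULTILINE)
    return pattern.sub("%s==%s" % (package, new_version), text)

before = "A==1.2\nB==0.5\n"
print(update_requirement(before, "B", "0.6.r1234"))
```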
Some observations for deployers:
- All applications should automatically upgrade if there is a bugfix version.
For this you need a central database of application installations. Otherwise it seems like a simple script to do the update.
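A sketch of the eligibility check such a script would make against that database of installations -- upgrade automatically only when the Major.Minor series matches and just the Bugfix number is higher (the data shapes here are invented):

```python
# Given a recorded installation and a newly deployed release, decide
# whether the installation qualifies for an automatic bugfix upgrade
# under the Major.Minor.Bugfix policy described above.

def needs_bugfix_upgrade(installed_version, released_version):
    inst = [int(p) for p in installed_version.split(".")]
    rel = [int(p) for p in released_version.split(".")]
    # Same Major.Minor series, strictly newer Bugfix number.
    return inst[:2] == rel[:2] and rel[2] > inst[2]

print(needs_bugfix_upgrade("1.5.2", "1.5.3"))  # bugfix: upgrade
print(needs_bugfix_upgrade("1.5.2", "1.6.0"))  # feature: leave alone
```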
- Changes in functionality are isolated from legacy applications.
- Because applications are self-contained in setuptools .egg files, we can specify the shared directory as a global installation source when using easy_install.
I just use an internal web directory for all the packages, though local access would be faster. I've come to find actual .tar.gz or .egg creation a bit tedious, and have been considering using svn checkouts for everything. Then the central index would be a series of links to svn repository locations. That would require a link for every tag, e.g., svn://repos/Package/tags/0.3#egg=Package-0.3 Potentially such a page could be automatically generated from an svn hook, since the links are obvious if you stick to a conventional svn layout.
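Generating that index page really is mostly string assembly over the tags/ listing. A sketch, with the repository URL and package name invented:

```python
# Sketch of an auto-generated package index: one link per svn tag, in
# the form svn://repos/Package/tags/0.3#egg=Package-0.3, assuming the
# conventional trunk/tags/branches layout. An svn post-commit hook
# could regenerate the page from the tags/ directory listing.

def index_links(repo, package, tags):
    return ["%s/%s/tags/%s#egg=%s-%s" % (repo, package, tag, package, tag)
            for tag in tags]

for link in index_links("svn://repos", "Package", ["0.2", "0.3"]):
    print(link)
```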
- Applications can be installed locally in a fashion similar to Java applications, which are often a collection of .jar files and data thrown into a common directory. We can do the same with "easy_install -f /opt/eggs -zmad /opt/myapp".
- The application script itself could be shared from the central drive, if we write a custom script that does not use the shebang line. This has the side-effect of automatically upgrading *everyone's* version, which can be a blessing or a curse when you take versioning of run-time data into account.
Yeah, that scares me a bit. I'd rather keep track automatically, then upgrade individually. It's also easier to roll back just one piece if necessary. If you stick with the idea of an isolated environment -- instead of a big pool -- it also means you don't have to lean too heavily on the versioning features of setuptools. The advantage there being that the system is more translatable to other environments and languages.
Some observations for developers:
- The version numbering policy is flexible, but one *must* end up with a deployment policy where business-critical applications do not have even the most minor of functionality changes foisted upon them (for one solution see Java-style installations, outlined above).
I think the versioning will be pretty important. For instance, the business needs of versioning have to be kept separate -- you'll have fairly formal requirements for how versions are assigned.
- Developers know what is happening to a package at a grain higher than that presented by Subversion, thanks to a published ChangeLog.
Do you have any particular thoughts on how the ChangeLog is generated/presented? I'm not very good at keeping logs at different levels of granularity (right now I typically only keep per-commit logs of work).
- Developers are free to choose any version of the available packages, because they know that they are all available at install-time. They can upgrade and downgrade versions at will.
Though it improves over time, as some packages reach maturity, in my experience work typically involves poking at more than one package. So during development I want everything to be editable. This strongly favors working against the trunk (or at least a branch), since editing an older version of a package means you have to be quite careful about actually deploying. Some of the mini-branch techniques people talk about might help here. Mostly people talk about this in the context of distributed systems, but I believe the Divmod guys are doing mini branches in subversion.
- Developers are still responsible for upgrading legacy applications with more functional library versions, but that is what unit tests are for (you /do/ write unit tests, don't you?)
Buildbot would probably be a great boon here as well. While you don't want to accidentally "upgrade" a working deployed app, having tests run on all combinations of the libraries and apps will give you some idea of how much breakage has occurred, and when.
- It helps to use a setuptools alias to easy_install when building packages for deployment. You can then type "python setup.py deploy" in the source directory as a quick and easy shortcut for building the required .egg.
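Command aliases are a real setuptools feature: setup.cfg can define them in an [aliases] section. A minimal sketch, assuming the SharedDirectory is mounted at /opt/eggs (the path borrowed from the easy_install example above):

```ini
[aliases]
# "python setup.py deploy" builds the egg straight into the shared directory
deploy = bdist_egg --dist-dir /opt/eggs
```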
I've certainly found minor conveniences like these to be strong incentives to actually doing the right thing each time.
There are a few concerns that cross into release handling, and code library maintenance and care:
- Regular library reviews and a comfortable package end-of-life schedule should be used to help prune the "supported packages" tree.
- A relaxed package release schedule, with more features per release, should help slow things down to a comfortable balance between time spent on upgrades, bugfixes, and features.
- If the version numbers are climbing too fast, ask whether there should be more features per release, or whether the package should be put into 'alpha' or 'beta' status until it stabilizes.
What is the problem you are solving here? Version numbers are just numbers, after all. I guess one advantage of dated versions, or integer-only versions, is that you avoid some emotional resistance to increasing versions since the versions are relatively meaningless.
- Ask if anybody else is using the package you are maintaining. If not, ask yourself if it has been prematurely extracted from an application (remember 'You Ain't Gonna Need It').
- Ask if a quickly-rising package version indicates a family of modules and functionality that should be rearranged. This could happen if someone is developing a new application and dumping too much application-specific functionality into an existing family. Can the family be split along functional lines? Should the functionality be kept in the application? (YAGNI again).
That's definitely been an issue I've seen. There are some internal packages that need splitting, for instance, since large portions have been highly stable, and updates elsewhere in the package shouldn't affect the release schedule of that stable core. Also, if I want to know what a change will affect, I'll know that better if the granularity of that change is clearer.
Well, that's about it. You almost need to resurrect the role of 'Project Librarian' to keep track of it all! ;)
I think more tool and reporting support is definitely called for. In many cases, instead of relying too heavily on setuptools loading the right package, I think it would be better to think in terms of an environment builder with static analysis of the result. E.g., you have a deployment that includes apps X, Y, and Z; then you start adding version requirements there, see what that brings in, report on any conflicts, and have the ability to do that speculatively. That is, where setuptools primarily checks these things at runtime now, based on a specific installation, this would be a little more look-before-you-leap. All the functions are in pkg_resources already, but there's no frontend to really look at this stuff.
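pkg_resources can already answer the narrow question "does this pinned set of versions satisfy these requirements?" without installing anything. A sketch of such a speculative check, with the package names, pins, and function name invented for illustration:

```python
import pkg_resources  # ships with setuptools

# Look-before-you-leap conflict report: given a hypothetical environment
# (name -> pinned version) and the requirement strings the apps declare,
# list the requirements that the pinned set would violate. No packages
# need to be installed for this check.

def find_conflicts(pins, requirements):
    conflicts = []
    for spec in requirements:
        req = pkg_resources.Requirement.parse(spec)
        pinned = pins.get(req.project_name)
        if pinned is not None and pinned not in req:
            conflicts.append(spec)
    return conflicts

print(find_conflicts({"B": "1.7.3"}, ["B>=1.5,<1.6"]))  # violated
print(find_conflicts({"B": "1.5.2"}, ["B>=1.5,<1.6"]))  # satisfied
```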
I would love to hear people's feedback on this idea, as I am sure that I am not the first to tread this path.
I'm definitely interested in this stuff too; but at least in the context of Python there's not a whole lot of experience in these things. Or at best, people have set up their own ad hoc systems. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org