Hi there; a bunch of ideas here, more than I can digest all at once I suppose. Mars wrote:
Hello all,
I was reading through my backlog of Daily Python-URL posts and saw that the topics of deployment and configuration management seem to be getting some attention lately. We have been having issues similar to this at my company, and I was hoping for some feedback on a solution I devised for these problems.
In our situation:
- we have only a handful of developers, maintaining a very large legacy code base
- different applications use different versions of the same library on the same machine
- developers still want to push bug fixes out without visiting every application
- developers want to write new (non-compatible) versions of their libraries
- developers want other developers to use their new libraries instead of the old ones
The plan I have borrows a bit from the Java and Python development communities, and relies heavily on setuptools, policy, and some developer discipline (we are professionals, after all).
*Outline*
- Developers outline Families of modules for common enterprise-wide tasks
- Developers build Packages (Python .egg files) from Families
- Developers build Applications which are composed of Packages
*Packages*
- Packages are deployed to a SharedDirectory
- Package versions have three number parts: Major.Minor.Bugfix
- New package features, or *any* change in package functionality, change _at least_ the Minor version number.
Given some other things you mention here, I think metadata updates will be common, which I assume would be grouped in with bugfixes in the version.
- Bugfixes should not change any functionality, and thus they only update the Bugfix version number
- Developers write ChangeLogs for Packages before deployment
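To make the policy above concrete, here is a minimal sketch of the numbering rules as code. The helper name is mine, not part of setuptools or any tool Mars describes:

```python
# Sketch of the Major.Minor.Bugfix policy: bugfixes bump only the third
# part; any functional change bumps at least the Minor part.
# The function name "bump" is invented for illustration.

def bump(version, change):
    """Return the next version string for a 'bugfix' or 'feature' change."""
    major, minor, bugfix = (int(p) for p in version.split("."))
    if change == "bugfix":
        # No functionality change: only the Bugfix number moves.
        return "%d.%d.%d" % (major, minor, bugfix + 1)
    elif change == "feature":
        # New or changed functionality: bump Minor, reset Bugfix.
        return "%d.%d.0" % (major, minor + 1)
    raise ValueError("unknown change type: %r" % change)

print(bump("1.5.2", "bugfix"))
print(bump("1.5.2", "feature"))
```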
*SharedDirectory*
- Servers mount the packages SharedDirectory
*ChangeLogs*
- ChangeLogs outline any new features or Bugfixes in a Package
- ChangeLogs are sent to all other developers after deployment
*Applications*
- Applications find their dependencies in the SharedDirectory
- Applications have startup scripts that 'require' the developer-specified packages (for example, setuptools entry_points and automatic script generation do this)
- Applications 'require' only the first two numbers of the package version, Major.Minor
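Requiring only Major.Minor is naturally expressed as a version range in setuptools, which any bugfix release in that series satisfies. A sketch using pkg_resources (the package name MyLib is invented):

```python
import pkg_resources  # ships with setuptools

# In a real startup script you would write something like:
#     pkg_resources.require("MyLib>=1.5,<1.6")   # MyLib is a made-up name
# which activates whatever 1.5.x egg is installed. The range itself
# behaves like this:
req = pkg_resources.Requirement.parse("MyLib>=1.5,<1.6")
print("1.5.3" in req)  # bugfix releases in the 1.5 series match
print("1.6.0" in req)  # a Minor bump does not
```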
Yes, requiring a fixed version doesn't work in my experience. Requiring a range should, i.e., >=1.5,<1.6. After testing you may change that requirement, and so you'll have to release. E.g., you add a new feature, you test that it works with a newer version of another library, and you change that requirement to >=1.5,<1.7 and release that other library as a bugfix release. This can work for applications as well as libraries.

I was talking to someone about .NET and how it handles versioning, and he mentioned that you can configure overrides. So if you have a package A that works for B>=1.5,<1.6, and you confirm that at least in a specific context B==1.7.3 works, then you can add something to that *context* (not to either package) that overrides the requirement to also accept B==1.7.3. This would decrease the number of metadata-updating releases you would have to do.

I do everything as a "library", in that they are all setuptools packages. This works well for me. However, there's some as-yet-undefined entity above that. An installation, or an environment. For me that thing is a "website", though even that isn't strictly true -- sometimes a website has more than one environment (e.g., if there's a conflict I don't care to resolve), or more than one website has the same environment (if I want to run more than one website in the same process). So no concrete term really works. But the idea of an environment seems like an important abstraction to work in. I think it is similar to what you are thinking of as an application.

A feature I'd really like is the ability to do an svn commit that also updates some metadata in concert. So, let's say I have packages A and B, A requires B, and I'm updating A but have to make a change in B to go with it. So I might say:

  fancysvn commit -m "changes" --update-requirements=A A/ B/

Then it would commit the changes in B, get the new version (which in development would be based on the svn revision), update A's entry_points.txt (in this model setup.py wouldn't hold that data), and then commit A. I end up doing this a lot, but without such a command I seldom update the requirements like I should.
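The metadata-rewriting step at the heart of that hypothetical command could reduce to a textual substitution on the stored requirements. A sketch, with the requirements format and function name invented for illustration:

```python
import re

# Hypothetical core of a "fancysvn --update-requirements" wrapper: after
# committing B and learning its new revision-based version, rewrite A's
# stored requirement for B before committing A. The one-requirement-
# per-line format assumed here is an illustration, not a real layout.

def update_requirement(text, package, new_version):
    """Replace the version spec for `package` in a requirements listing."""
    pattern = re.compile(r"^%s[=<>!].*$" % re.escape(package), re.MULTILINE)
    return pattern.sub("%s==%s" % (package, new_version), text)

before = "A==1.2\nB==0.5\n"
print(update_requirement(before, "B", "0.6.r1234"))
```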
Some observations for deployers:
- All applications should automatically upgrade if there is a bugfix version.
For this you need a central database of application installations. Otherwise it seems like a simple script to do the update.
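A sketch of the eligibility check such a script would make against that database of installations -- upgrade automatically only when the Major.Minor series matches and just the Bugfix number is higher (the data shapes here are invented):

```python
# Given a recorded installation and a newly deployed release, decide
# whether the installation qualifies for an automatic bugfix upgrade
# under the Major.Minor.Bugfix policy described above.

def needs_bugfix_upgrade(installed_version, released_version):
    inst = [int(p) for p in installed_version.split(".")]
    rel = [int(p) for p in released_version.split(".")]
    # Same Major.Minor series, strictly newer Bugfix number.
    return inst[:2] == rel[:2] and rel[2] > inst[2]

print(needs_bugfix_upgrade("1.5.2", "1.5.3"))  # bugfix: upgrade
print(needs_bugfix_upgrade("1.5.2", "1.6.0"))  # feature: leave alone
```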
- Changes in functionality are isolated from legacy applications.
- Because applications are self-contained in setuptools .egg files, we can specify the shared directory as a global installation source when using easy_install.
I just use an internal web directory for all the packages, though local access would be faster. I've come to find actual .tar.gz or .egg creation a bit tedious, and have been considering using svn checkouts for everything. Then the central index would be a series of links to svn repository locations. That would require a link for every tag, e.g., svn://repos/Package/tags/0.3#egg=Package-0.3 Potentially such a page could be automatically generated from an svn hook, since the links are obvious if you stick to a conventional svn layout.
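Generating that index page really is mostly string assembly over the tags/ listing. A sketch, with the repository URL and package name invented:

```python
# Sketch of an auto-generated package index: one link per svn tag, in
# the form svn://repos/Package/tags/0.3#egg=Package-0.3, assuming the
# conventional trunk/tags/branches layout. An svn post-commit hook
# could regenerate the page from the tags/ directory listing.

def index_links(repo, package, tags):
    return ["%s/%s/tags/%s#egg=%s-%s" % (repo, package, tag, package, tag)
            for tag in tags]

for link in index_links("svn://repos", "Package", ["0.2", "0.3"]):
    print(link)
```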
- Applications can be installed locally in a fashion similar to Java applications, which are often a collection of .jar files and data thrown into a common directory. We can do the same with "easy_install -f /opt/eggs -zmad /opt/myapp".
- The application script itself could be shared from the central drive, if we write a custom script that does not use the shebang line. This has the side-effect of automatically upgrading *everyone's* version, which can be a blessing or a curse when you take versioning of run-time data into account.
Yeah, that scares me a bit. I'd rather keep track automatically, then upgrade individually. It's also easier to roll back just one piece if necessary. If you stick with the idea of an isolated environment -- instead of a big pool -- it also means you don't have to lean too heavily on the versioning features of setuptools. The advantage there being that the system is more translatable to other environments and languages.
Some observations for developers:
- The version numbering policy is flexible, but one *must* end up with a deployment policy where business-critical applications do not have even the most minor of functionality changes foisted upon them (for one solution see Java-style installations, outlined above).
I think the versioning will be pretty important. For instance, the business needs of versioning have to be kept separate -- you'll have fairly formal requirements for how versions are assigned.
- Developers know what is happening to a package at a grain higher than that presented by Subversion, thanks to a published ChangeLog.
Do you have any particular thoughts on how the ChangeLog is generated/presented? I'm not very good at keeping logs at different levels of granularity (right now I typically only keep per-commit logs of work).
- Developers are free to choose any version of the available packages, because they know that they are all available at install-time. They can upgrade and downgrade versions at will.
Though it improves over time, as some packages reach maturity, in my experience work typically involves poking at more than one package. So during development I want everything to be editable. This strongly favors working against the trunk (or at least a branch), since editing an older version of a package means you have to be quite careful about actually deploying. Some of the mini-branch techniques people talk about might help here. Mostly people talk about this in the context of distributed systems, but I believe the Divmod guys are doing mini branches in subversion.
- Developers are still responsible for upgrading legacy applications with more functional library versions, but that is what unit tests are for (you /do/ write unit tests, don't you?)
Buildbot would probably be a great boon here as well. While you don't want to accidentally "upgrade" a working deployed app, having tests run on all combinations of the libraries and apps will give you some idea of how much breakage has occurred, and when.
- It helps to use a setuptools alias to easy_install when building packages for deployment. You can then type "python setup.py deploy" in the source directory as a quick and easy shortcut for building the required .egg.
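Command aliases are a real setuptools feature: setup.cfg can define them in an [aliases] section. A minimal sketch, assuming the SharedDirectory is mounted at /opt/eggs (the path borrowed from the easy_install example above):

```ini
[aliases]
# "python setup.py deploy" builds the egg straight into the shared directory
deploy = bdist_egg --dist-dir /opt/eggs
```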
I've certainly found minor conveniences like these to be strong incentives to actually doing the right thing each time.
There are a few concerns that cross into release handling, and code library maintenance and care:
- Regular library reviews and a comfortable package end-of-life schedule should be used to help prune the "supported packages" tree.
- A relaxed package release schedule, with more features per release, should help slow things down to a comfortable balance between time spent on upgrades, bugfixes, and features.
- If the version numbers are climbing too fast, ask whether there should be more features per release, or whether the package should be put into 'alpha' or 'beta' status until it stabilizes.
What is the problem you are solving here? Version numbers are just numbers, after all. I guess one advantage of dated versions, or integer-only versions, is that you avoid some emotional resistance to increasing versions since the versions are relatively meaningless.
- Ask if anybody else is using the package you are maintaining. If not, ask yourself if it has been prematurely extracted from an application (remember 'You Ain't Gonna Need It').
- Ask if a quickly-rising package version indicates a family of modules and functionality that should be rearranged. This could happen if someone is developing a new application and dumping too much application-specific functionality into an existing family. Can the family be split along functional lines? Should the functionality be kept in the application? (YAGNI again).
That's definitely been an issue I've seen. There are some internal packages that need splitting, for instance, since large portions have been highly stable, and updates elsewhere in the package shouldn't affect the release schedule of that stable core. Also, if I want to know what a change will affect, I'll know that better if the granularity of that change is clearer.
Well, that's about it. You almost need to resurrect the role of 'Project Librarian' to keep track of it all! ;)
I think more tool and reporting support is definitely called for. In many cases, instead of relying too heavily on setuptools loading the right package, I think it would be better to think in terms of an environment builder with static analysis of the result. E.g., you have a deployment that includes apps X, Y, and Z; then you start adding version requirements there, see what that brings in, report on any conflicts, and have the ability to do that speculatively. That is, where setuptools primarily checks these things at runtime now, based on a specific installation, this would be a little more look-before-you-leap. All the functions are in pkg_resources already, but there's no frontend to really look at this stuff.
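pkg_resources can already answer the narrow question "does this pinned set of versions satisfy these requirements?" without installing anything. A sketch of such a speculative check, with the package names, pins, and function name invented for illustration:

```python
import pkg_resources  # ships with setuptools

# Look-before-you-leap conflict report: given a hypothetical environment
# (name -> pinned version) and the requirement strings the apps declare,
# list the requirements that the pinned set would violate. No packages
# need to be installed for this check.

def find_conflicts(pins, requirements):
    conflicts = []
    for spec in requirements:
        req = pkg_resources.Requirement.parse(spec)
        pinned = pins.get(req.project_name)
        if pinned is not None and pinned not in req:
            conflicts.append(spec)
    return conflicts

print(find_conflicts({"B": "1.7.3"}, ["B>=1.5,<1.6"]))  # violated
print(find_conflicts({"B": "1.5.2"}, ["B>=1.5,<1.6"]))  # satisfied
```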
I would love to hear people's feedback on this idea, as I am sure that I am not the first to tread this path.
I'm definitely interested in this stuff too; but at least in the context of Python there's not a whole lot of experience in these things. Or at best, people have set up their own ad hoc systems. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org