Hello,
I'm looking for case studies or other examples of management of a development/test/build/QA/release process involving lots of Python packages with dependencies.
At work currently, we use Hudson for running our tests, and are using it to produce eggs, sdists, and PIP requirements files. It's shaping up to be a nice way to build our packages, but it's about to get a lot bigger with a lot more packages and complex dependencies. We're working on defining processes for our developers and testers, and we're inventing as we go.
I have a good idea where we need to go with this, but our management wants to hear how others in the community handle this kind of challenge with releasing multiple dependent packages together, especially in light of workflows made possible by DVCS. (We're using Git)
Does anyone have any stories to tell on this front?
Without knowing the size of the company (small, medium or large), the platforms (Windows, Linux, Mac) and the packages (external or internal), it is hard to give any sensible advice.
For what it is worth, keep a central repository for dependencies (either external or internally produced) and enforce it with an iron fist, or things will go out of control pretty quickly (project A depends on project B and project C on project B', but project B and project B' cannot be used at the same time).
You probably need a plan for enterprise deployment on one site or across sites, so installer integration is a deal breaker (especially if many sites/countries are involved in the deployment stage, where communication can be difficult).
In my experience, I have avoided anything that relies on setuptools or any magic/clever stuff that replaces a native installation system (rpm, msi, dpkg or pkg).
For legal and traceability reasons, anything that attempts to download or "dynamically" do things is not an option.
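To illustrate the native-installer route, here is a minimal sketch using plain distutils; the package name and layout are invented, but bdist_rpm and bdist_msi are standard distutils commands:

    # setup.py -- minimal sketch using plain distutils (no setuptools);
    # "acme-core"/"acme_core" and the layout are hypothetical.
    from distutils.core import setup

    setup(
        name="acme-core",
        version="1.2.0",
        packages=["acme_core"],
    )

    # Build native installers with the stock distutils commands:
    #   python setup.py bdist_rpm    # .rpm for rpm-based Linux
    #   python setup.py bdist_msi    # .msi for Windows
    # The result can then be tracked with the system tools,
    # e.g. "rpm -qi acme-core".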
I hope this helps, Regards, Antonio
On Thu, May 6, 2010 at 5:51 PM, Antonio Cavallo a.cavallo@cavallinux.eu wrote:
Without knowing the size of the company (small, medium or large), the platforms (Windows, Linux, Mac) and the packages (external or internal), it is hard to give any sensible advice.
Well, I am looking for stories here from other organizations as a source of lessons. However, maybe it would help if I spent more time describing what we're trying to accomplish.
We're a medium size software product company (over 100 employees), the supported platforms are Linux and Windows servers, and most of our package dependencies are internal. However, we also have third party open source dependencies.
For what it is worth, keep a central repository for dependencies (either external or internally produced) and enforce it with an iron fist, or things will go out of control pretty quickly (project A depends on project B and project C on project B', but project B and project B' cannot be used at the same time).
With a DVCS it makes sense to have multiple repositories (a repo for each package, er, I mean 'module distribution'), though we do have a centralized workflow with a central server containing all the repos.
The approach we're moving toward is to have the 'master' branch of each repo associated with the most current stable release, and a 'development' branch for the most current development work. For old releases of a given package under maintenance, there would be a development and a release branch for each major + minor version number. For example maintenance version 2.7 would have the branches dev_2.7 and rel_2.7.
Each development and release branch of each package would have a separate Hudson job running the tests, and building eggs, sdists, and pip requirements files. The script for building the eggs creates a version consisting of the version in setup.py + the tag_build from setup.cfg + the Hudson build number. For release branches, the build script takes the extra step of creating a Git tag of the version (including the Hudson build number), so each build can be linked back to a commit id in the repo.
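As a rough sketch of what such a build script could look like (the file names, version scheme and hard-coded base version below are assumptions, not the actual script; Hudson does export the build number as the BUILD_NUMBER environment variable):

    # build_version.py -- rough sketch of composing the egg version from
    # setup.py + setup.cfg + the Hudson build number; the base version is
    # hard-coded here only to keep the example short.
    import os
    import subprocess
    from ConfigParser import ConfigParser   # Python 2, as in use at the time

    BASE_VERSION = "2.7.0"                  # normally taken from setup.py

    cfg = ConfigParser()
    cfg.read("setup.cfg")
    tag_build = cfg.get("egg_info", "tag_build")        # e.g. ".dev"
    build_number = os.environ.get("BUILD_NUMBER", "0")  # set by Hudson

    version = "%s%s-%s" % (BASE_VERSION, tag_build, build_number)

    # On a release branch, also tag the commit so the build can be traced
    # back to a repository state.
    subprocess.check_call(["git", "tag", "-a", "v" + version,
                           "-m", "Hudson build %s" % build_number])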
In my experience, I have avoided anything that relies on setuptools or any magic/clever stuff that replaces a native installation system (rpm, msi, dpkg or pkg).
Well, we're pretty much relying on setuptools/buildout, and I don't see anything wrong with that, as long as we have a reasonable migration path to distribute/distutils2 in the future. Up till now we've built the eggs manually and put them on our package server, and delivered buildouts to our customers using zc.sourcerelease. Now that we've automated our build process using Hudson, the natural next step will be to get buildout to make use of the pip requirements file created by the Hudson jobs.
The actual release to the customer usually involves a selection of 'top level' packages, and all of them need to rely on the same version of the core library dependencies. In the past, we hand-coded a version.cfg file for use by buildout, and that file was put under version control and tagged for each release. In the future, we're considering hand-coding a pip requirements file to contain the desired version of each package needed by a customer, and setting up a Hudson job to run the tests for that particular set of versions. That would define a 'KGS' (Known Good Set) of top level packages and their dependencies, and we would keep that KGS as a release configuration under version control.
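For example, a hand-coded KGS requirements file for one customer release might look like this (package names and version numbers are invented for illustration):

    # kgs-acme-2.7.txt -- hypothetical Known Good Set for one customer release
    acme-core==2.7.0.dev-142
    acme-reports==1.4.2-87
    acme-webui==3.0.1-203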
For legal and traceability reasons, anything that attempts to download or "dynamically" do things is not an option.
I have no clue what you mean by that. What do you have against 'downloading'?
With a DVCS it makes sense to have multiple repositories (a repo for
each package, er, I mean 'module distribution'), though we do have a centralized workflow with a central server containing all the repos.
There's nothing specific to the source code control tool: I assume developers will in any case rely on a "frozen" module/package set (or release, or tag, or KGS as you call it later on).
...a development and a release branch for each major + minor version number. For example, maintenance version 2.7 would have the branches dev_2.7 and rel_2.7.
Regarding the version (the one exposed through __version__): it is worth keeping it as X.Y.Z, or it may break the bdist_msi installer, especially if you are multi-platform. I wish they had standardised __version__ this way in the language, so that intra-package dependencies could be handled in a reasonable way.
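For example, with a plain X.Y.Z version string, an intra-package dependency check can be done with the stock distutils classes (a minimal sketch; "foo" is a made-up module):

    # Minimal sketch: a plain X.Y.Z __version__ can be compared with the
    # stock distutils classes; "foo" is a made-up dependency exposing
    # __version__ = "1.2.3".
    from distutils.version import StrictVersion

    import foo

    if StrictVersion(foo.__version__) < StrictVersion("1.2.0"):
        raise ImportError("foo >= 1.2.0 required, found %s" % foo.__version__)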
Well, we're pretty much relying on setuptools/buildout, and I don't see anything wrong with that, as long as we have a reasonable migration path to distribute/distutils2
In the past (it might still be the case now) setuptools tried to replace Python distribution files (like site.py, if I'm not wrong; it was a long while ago), so at each Python upgrade/patch release (like security patches) you needed to reinstall it, breaking dependencies.
Besides the technical reasons, from a network administration point of view it is a bad idea to replace the native way of installing things. Plenty of companies have developers without administrator privileges on their machines.
All of a sudden, if you are a sysadmin and must know what is installed on a remote machine (possibly on another continent), you can no longer use the standard system tools (rpm -qi on Linux); you need to be logged in as the developer and figure out how he/she installed things.
Again, I suggest staying away from anything that relies on setuptools; that is my professional experience, and I haven't seen any (good) reason to suggest the contrary. Your needs might be different.
The actual release to the customer usually involves a selection of 'top level' packages, and all of them need to rely on the same version of the core library dependencies.
That is what I meant by the KGS at the beginning of the message; I think I hadn't expressed myself clearly :( If you use Hudson (or anything like that) you should have a plan to avoid hand-fiddling with files anyway.
For legal and traceability reasons, anything that attempts to download or "dynamically" do things is not an option.
I have no clue what you mean by that. What do you have against 'downloading'?
Traceability means: I give you the "product" (in business lingo, the "deliverable"); now the questions are:
- What does it depend upon? Python + module foo version X + module bar version Y, and so on.
- How do I rebuild it if your company goes bankrupt (I hope not, but it is an eventuality)?
- Is there any hidden backdoor one of your employees has put into one of the many components?
- How do we get the name of the offender if that happens?
Now you can probably see why I have a LOT against wild downloading. Besides this, there are licensing problems, like using GPL code in commercial software, although I must admit the Python world is quite liberal on that point.
I hope this helps. Best regards, Antonio
On Fri, May 7, 2010 at 2:43 AM, Antonio Cavallo a.cavallo@cavallinux.eu wrote:
Besides the technical reasons, from a network administration point of view it is a bad idea to replace the native way of installing things. Plenty of companies have developers without administrator privileges on their machines.
All of a sudden, if you are a sysadmin and must know what is installed on a remote machine (possibly on another continent), you can no longer use the standard system tools (rpm -qi on Linux); you need to be logged in as the developer and figure out how he/she installed things.
Again, I suggest staying away from anything that relies on setuptools; that is my professional experience, and I haven't seen any (good) reason to suggest the contrary. Your needs might be different.
I agree that using native operating system tools to install packages is a good thing. There's one big problem with that, though: it becomes nearly impossible to sandbox packages into their own virtualenvs. This is important to us because we have multiple packages that may rely on different versions of their dependencies.
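A rough sketch of that per-product sandbox approach, assuming a Linux layout and a pinned requirements file per product (the paths, product names and file names are invented):

    # Rough sketch: one virtualenv per product, each with its own pinned
    # dependency set.  Paths, product names and the requirements file are
    # invented; assumes a Linux layout with pip available in the env.
    import subprocess

    def build_sandbox(env_dir, requirements_file):
        # Create an isolated environment...
        subprocess.check_call(["virtualenv", "--no-site-packages", env_dir])
        # ...and install the pinned dependency set into it.
        subprocess.check_call([env_dir + "/bin/pip", "install",
                               "-r", requirements_file])

    build_sandbox("/opt/acme/envs/reports-2.7", "kgs-acme-2.7.txt")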
Traceability means: I give you the "product" (in business lingo, the "deliverable"); now the questions are:
- What does it depend upon? Python + module foo version X + module bar version Y, and so on.
- How do I rebuild it if your company goes bankrupt (I hope not, but it is an eventuality)?
- Is there any hidden backdoor one of your employees has put into one of the many components?
- How do we get the name of the offender if that happens?
I suppose I can see the benefit of having dependencies in version control. I'm not sure I like the idea of ruling things with an iron fist though. I could see managing dependencies and managing the build process turning into a full time job. I suppose there may be benefits to that, but whether or not we can dedicate a person to it is a decision that is well above me. :-)
On Fri, May 7, 2010 at 5:42 AM, Brad Allen bradallen137@gmail.com wrote:
With a DVCS it makes sense to have multiple repositories (a repo for each package, er, I mean 'module distribution'), though we do have a centralized workflow with a central server containing all the repos.
You may be interested to take a look at how Chrome manages its dependencies with gclient - "A script for managing a workspace with modular dependencies that are each checked out independently from different repositories". http://dev.chromium.org/developers/how-tos/depottools
Tweaked a little, it can be used to track newer commits in source repositories, giving you a hint when a checkpoint in your config file can be moved forward. Starting from this, you could further mark packages as auto-updating, etc.
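A minimal sketch of that "track newer commits" idea (not gclient itself; the pinned dictionary, repository URLs and commit ids are made up for illustration):

    # Sketch of polling dependency repositories for commits newer than the
    # pinned ones; names, URLs and commit ids are placeholders.
    import subprocess

    PINNED = {
        "acme-core":    ("git://example.com/acme-core.git",    "pinned-sha-1"),
        "acme-reports": ("git://example.com/acme-reports.git", "pinned-sha-2"),
    }

    for name, (url, pinned_commit) in PINNED.items():
        # Ask the remote which commit is at the tip of master.
        p = subprocess.Popen(["git", "ls-remote", url, "refs/heads/master"],
                             stdout=subprocess.PIPE)
        output, _ = p.communicate()
        tip = output.split()[0]
        if tip != pinned_commit:
            print "%s: pinned at %s, remote tip is now %s" % (
                name, pinned_commit, tip)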
On 06/05/2010 23:53, Brad Allen wrote:
Hello,
I'm looking for case studies or other examples of management of a development/test/build/QA/release process involving lots of Python packages with dependencies.
Our context is research teams working together, integrating different packages written in Python as well as C, C++ and Fortran.
We use buildbot for the continuous integration. All the stages (build, installation, test, QA, upload) are done using setuptools. Each package is:
- built (C, C++ and Fortran) using scons integrated with setuptools;
- installed, tested and run through QA;
- turned into an egg carrying the version (X.Y.Z) and the svn revision. Each egg depends on the exact revision number of each of its dependencies, for binary compatibility.
Finally the eggs are uploaded to a central repository. This is quite inflexible if you only have pure Python packages, but it is needed when you have binary packages with cross dependencies.
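A minimal sketch of what such exact pinning could look like in a setup.py (package names, versions and the ".rNNNN" revision suffix are invented for illustration):

    # setup.py -- minimal sketch of exact pinning for binary compatibility;
    # names, versions and the ".rNNNN" revision suffix are invented.
    from setuptools import setup

    setup(
        name="simulation-core",
        version="1.4.2.r5120",              # X.Y.Z plus the svn revision
        packages=["simulation_core"],
        install_requires=[
            "mesh-tools==0.9.1.r5087",      # exact pins, matching the eggs
            "fortran-solvers==2.1.0.r5102", # on the central repository
        ],
    )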
cheers Christophe