RFC: Egg cache for self-contained buildout

Problem
=======

For (stage and production) deployment purposes, we, ZC, use RPMs. It's considered good hygiene to produce source RPMs as well as binary RPMs. This led me to create zc.sourcerelease, which automates creation of self-contained source tarballs that, among other benefits, provide input for making source RPMs, which feed into a process for creating binary RPMs.

We're moving toward a continuous deployment pipeline, where binaries are produced early in the development cycle and tested in a controlled environment that matches production. It no longer (never did, actually) makes sense to produce source RPMs that could be deployed in alternate (untested) environments.

In general, our existing build process is grotesquely slow:

- we run a buildout to produce a source release.
- we run it again to build a source (and binary) RPM from the source release.

Both of these run in such a way that all of the eggs have to be rebuilt. (But sources don't have to be downloaded.)

We'd like to move toward a model where we construct a build environment for each controlled deployment environment. In this build environment, we never want to build a given distribution more than once. We need to produce application binaries that are self-contained.

Buildout allows you to use a shared eggs directory. This can greatly speed buildouts, because already-built distributions can be found and used locally. However, buildouts that use shared eggs directories aren't self-contained. They depend on the shared eggs directory. I'd like to be able to reuse previously-built eggs, but have eggs installed in my local buildout, so it's self-contained.

Proposal: egg-cache
===================

If egg-cache is set to a directory, then when buildout builds an egg, it will copy it to the egg cache. When looking for distributions, it will look in the egg cache and, if it finds a matching egg there, it will copy the egg to the buildout eggs directory.

The end result will be that an egg cache will have the same economy as the current shared eggs directory, as far as building is concerned, but it won't have the disk-space saving of a shared eggs directory. It will lead to buildouts that are self-contained (at least wrt eggs) and that can be copied to a deployment environment directly.

Thoughts?

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
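The lookup order described in the proposal can be sketched roughly as follows. This is only an illustration, not buildout code: `install_egg`, `_copy_egg` and the `build` callable are hypothetical names, and matching an egg by its exact file name is a simplifying assumption (real matching would be done against requirement specs).

```python
import shutil
from pathlib import Path


def _copy_egg(src: Path, dest_dir: Path) -> Path:
    """Copy an egg (file or directory) into dest_dir and return the new path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if dest.exists():
        return dest
    if src.is_dir():
        shutil.copytree(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest


def install_egg(egg_name, eggs_dir, egg_cache, build):
    """Return the path of egg_name inside eggs_dir, building only on a cache miss.

    `build` is a hypothetical callable (egg_name, eggs_dir) -> Path that stands
    in for whatever buildout does to build a distribution.
    """
    eggs_dir, egg_cache = Path(eggs_dir), Path(egg_cache)

    # 1. Already present in the local (self-contained) eggs directory.
    local = eggs_dir / egg_name
    if local.exists():
        return local

    # 2. Present in the cache: copy it in rather than referencing it in place.
    cached = egg_cache / egg_name
    if cached.exists():
        return _copy_egg(cached, eggs_dir)

    # 3. Cache miss: build locally, then publish a copy to the cache.
    built = build(egg_name, eggs_dir)
    _copy_egg(built, egg_cache)
    return built
```

The point of copying in step 2, instead of adding the cache to the path as a shared eggs directory would, is that the resulting buildout has no references outside its own tree.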

Sounds like a nice feature. I like having copies of everything (and nothing extra) for each deployment.

On Mon, May 13, 2013 at 3:08 PM, Jim Fulton <jim@zope.com> wrote:
> [...]

_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig

+1

This is potentially useful for Heroku or Google Apps, or anything else that requires all your files to exist inside the build area after build time, since that is where it creates the blob that is actually pushed to all the slaves.

On Mon, May 13, 2013 at 4:26 PM, Daniel Holth <dholth@gmail.com> wrote:
> Sounds like a nice feature. I like having copies of everything (and nothing extra) for each deployment.

+1

For deployment, I have a script that creates an svn tag from trunk, runs buildout, and creates a tarball of the resulting buildout. In production, another script unpacks the tarball and reruns buildout, but this time with -N -o. Since the eggs are already in the tarball, it's very quick; it just regenerates the scripts and config files to reflect the actual production environment.

The problem was in creating the release: since it's a fresh checkout from svn, the eggs dir is not populated yet and, due to issue [1], the buildout was very slow. While I could do something like copy the eggs dir from my current working directory into the tag dir before running buildout, that would make the eggs not "pristine" anymore, since I might have tampered with them while debugging; it's my working directory after all.

[1]: https://github.com/buildout/buildout/issues/116

On Tue, May 14, 2013 at 3:08 AM, Jim Fulton <jim@zope.com> wrote:
> [...]
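The production-side step described in the reply above (unpack the release tarball, then rerun buildout with -N -o) might look something like the sketch below. The function names are hypothetical, and it assumes the tarball unpacks directly into the buildout directory with a `bin/buildout` script inside; the poster's actual script is not shown in the thread.

```python
import subprocess
import tarfile
from pathlib import Path


def buildout_command(release_dir):
    """Command line for rerunning buildout against eggs already on disk.

    -N: don't look for newer distributions; -o: offline mode.
    The bin/buildout location is an assumption about the release layout.
    """
    return [str(Path(release_dir) / "bin" / "buildout"), "-N", "-o"]


def deploy(tarball, target_dir):
    """Unpack a release tarball and regenerate scripts and config in place."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Only unpack tarballs you produced yourself; extractall trusts member paths.
    with tarfile.open(tarball) as archive:
        archive.extractall(target_dir)
    # With the eggs shipped in the tarball, this run only rewrites the
    # generated scripts and config files for the production environment.
    subprocess.run(buildout_command(target_dir), cwd=target_dir, check=True)
```

A call such as `deploy("release.tar.gz", "/srv/app")` would then be the whole production step; both paths here are placeholders.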
participants (4)

- Daniel Holth
- Jim Fulton
- Leonardo Rochael Almeida
- Mohd Kamal Bin Mustafa