buildout download cache

Hi,

We use buildout extensively to build ERP5 [1]:

- https://svn.erp5.org/repos/public/erp5/trunk/buildout/

We are planning to extend the buildout download API to provide a distributed, automatic packaging and caching system, so that even if the original source web site is down, the buildout process can still run. This could also be useful for building software in secured networks.

Do you know of any similar project before we start?

[1] www.erp5.com

Regards,
Rafael Monnerat

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

On Sun, Jan 09, 2011 at 10:59:18PM +0100, rafael@nexedi.com wrote:
We are planning to extend the buildout download API to provide a distributed, automatic packaging and caching system, so that even if the original source web site is down, the buildout process can still run. This could also be useful for building software in secured networks.
Do you know of any similar project before we start?
http://pypi.python.org/pypi/collective.eggproxy

(I haven't used it myself yet. When PyPI or original mirrors go down, I can always find the necessary package in my development machine's ~/.buildout/cache/ and copy it over to where I need it -- because I've set a global download-cache in ~/.buildout/default.cfg.)

Marius Gedminas
--
Computers are not intelligent. They only think they are.

Hi,

Let me explain better with a use case. We would like a solution not only for egg downloads but for ALL kinds of downloads that buildout performs. For example, the haproxy setup [1]:

    [haproxy]
    recipe = hexagonit.recipe.cmmi
    url = http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.9.tar.gz
    md5sum = 2cbcc95b54c0d803edaa13e7b4aeec25

This recipe uses zc.buildout.download (the standard buildout API) and requires that this URL be online when you run buildout for the first time. Since we get files from multiple sources, we increase the chances that some service is offline, which breaks the buildout setup.

A good approach for all downloads would be:

1st: Try the local cache (the buildout download API already does this), which is not populated on the first run.
2nd: Try the original URL (the buildout download API already does this).
3rd: Try a cache server or "network cache" (this is what we want to build, or to learn whether someone has already built).

This would guarantee that buildout can still run even if the original source is offline. It is also preferable that ALL recipes can use this network cache, and that it be unified with (or available for) egg downloads too.

[1] https://svn.erp5.org/repos/public/erp5/trunk/buildout/software-profiles/hapr...

I believe eggproxy is only useful for egg downloads, right? Using a global download-cache helps to share downloads between your builds, but it does not help first-time users.

Regards,
Rafael Monnerat

Quoting "Marius Gedminas" <marius@pov.lt>:
On Sun, Jan 09, 2011 at 10:59:18PM +0100, rafael@nexedi.com wrote:
We are planning to extend the buildout download API to provide a distributed, automatic packaging and caching system, so that even if the original source web site is down, the buildout process can still run. This could also be useful for building software in secured networks.
Do you know of any similar project before we start?
http://pypi.python.org/pypi/collective.eggproxy
(I haven't used it myself yet. When PyPI or original mirrors go down, I can always find the necessary package in my development machine's ~/.buildout/cache/ and copy it over to where I need it -- because I've set a global download-cache in ~/.buildout/default.cfg.)
Marius Gedminas -- Computers are not intelligent. They only think they are.
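
The three-step lookup order proposed in the message above (local cache, then original URL, then network cache) can be sketched as a generic fallback chain. This is only an illustration under stated assumptions: `fetch_with_fallback` and the source callables are hypothetical names, not part of zc.buildout.download, and every candidate is checked against the md5sum given in the part definition.

```python
import hashlib


def fetch_with_fallback(md5sum, sources):
    """Try each (name, fetch) pair in order; return the first verified result.

    `sources` is a list of (name, callable) pairs. Each callable returns the
    file content as bytes or raises OSError when that source is unavailable.
    Hypothetical helper for illustration; not a zc.buildout API.
    """
    errors = []
    for name, fetch in sources:
        try:
            data = fetch()
        except OSError as exc:
            # Source unavailable (not cached yet, site offline, ...): try next.
            errors.append("%s: %s" % (name, exc))
            continue
        if hashlib.md5(data).hexdigest() == md5sum:
            return name, data
        # Content did not match the expected checksum: try next source.
        errors.append("%s: md5 mismatch" % name)
    raise OSError("all sources failed: " + "; ".join(errors))
```

A caller would pass the sources in the order listed in the message, for example `[("local cache", read_local), ("original URL", read_origin), ("network cache", read_network)]`, where the three callables are placeholders for whatever actually performs each lookup.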

rafael wrote:
We are planning to extend the buildout download API to provide a distributed, automatic packaging and caching system, so that even if the original source web site is down, the buildout process can still run. This could also be useful for building software in secured networks.
I'd strongly suggest keeping this logic out of the download API. It sounds like something that may potentially grow a lot more complex than a simple "download this URL, with or without using a cache" gesture.

In my opinion, a distributed packaging system is application logic from the perspective of a generic framework such as zc.buildout. It might be implemented by a recipe, some library on top of the download API or some other mechanism altogether, but it should neither complicate the semantics of the existing download API nor add a new one to the zc.buildout code base.

--
Thomas

Hi,

I made a patch to zc.buildout.download which introduces a very simple "network cache" (a cache on the network). I tried to keep it as simple and generic as buildout itself. Basically, with the patch, the download of a file follows this order:

1. Try the local cache (no change from the original)
2. Try the network cache (explained below)
3. Try the original URL
4. Post the file data to the network cache (pure HTTP)

The network cache is just a URL where files are placed (it can be any simple HTTP server), identified by the file's MD5, like this:

    GET http://my.company.shared.cache/md5_provided_for_the_file

The cache is updated by a simple POST to the same address:

    POST http://my.company.shared.cache/md5_provided_for_the_file < data

As I'm familiar with ERP5, I implemented a very simple way to handle this cache, but I can also contribute a simpler solution, like what eggproxy does for eggs, if you think my patch is useful and OK. As I don't have svn access to make a branch, I'm attaching the patch.

With a network cache, I think people can share downloads within private networks and prevent their builds from breaking when some source is unavailable.

If you consider this behaviour inappropriate for the core of buildout, but appropriate for a buildout extension, let me know.

Regards,
Rafael Monnerat

On 11-01-2011 06:03, Thomas Lotze wrote:
rafael wrote:
We are planning to extend the buildout download API to provide a distributed, automatic packaging and caching system, so that even if the original source web site is down, the buildout process can still run. This could also be useful for building software in secured networks.
I'd strongly suggest keeping this logic out of the download API. It sounds like something that may potentially grow a lot more complex than a simple "download this URL, with or without using a cache" gesture.
In my opinion, a distributed packaging system is application logic from the perspective of a generic framework such as zc.buildout. It might be implemented by a recipe, some library on top of the download API or some other mechanism altogether, but it should neither complicate the semantics of the existing download API nor add a new one to the zc.buildout code base.
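
The four-step order and the MD5-keyed GET/POST scheme described in the patch message can be sketched roughly as follows. This is not the attached patch, which is not reproduced in this thread. All function names are hypothetical, the local cache is simplified to a dict keyed by MD5 (buildout's real download cache is a directory), and the HTTP helpers assume a cache server that accepts a plain POST of the file bytes.

```python
import hashlib
import urllib.request


def cache_url(cache_base, md5sum):
    """Network-cache address of a file: the base URL plus the file's MD5."""
    return cache_base.rstrip("/") + "/" + md5sum


def http_get(url, timeout=30):
    """Return the body at `url`, or None if the request fails with OSError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:  # urllib.error.URLError and HTTPError are OSError subclasses
        return None


def http_post(url, data, timeout=30):
    """POST raw file bytes to `url`; return True if the server accepted them."""
    request = urllib.request.Request(url, data=data, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except OSError:
        return False


def _matches(data, md5sum):
    return data is not None and hashlib.md5(data).hexdigest() == md5sum


def download(url, md5sum, local_cache, cache_base, get=http_get, post=http_post):
    """Local cache, then network cache, then origin, then upload to the cache."""
    if md5sum in local_cache:                       # 1. local cache
        return local_cache[md5sum]
    data = get(cache_url(cache_base, md5sum))       # 2. network cache
    from_origin = False
    if not _matches(data, md5sum):
        data = get(url)                             # 3. original URL
        from_origin = True
    if not _matches(data, md5sum):
        raise OSError("could not obtain %s with md5 %s" % (url, md5sum))
    if from_origin:
        post(cache_url(cache_base, md5sum), data)   # 4. feed the network cache
    local_cache[md5sum] = data
    return data
```

The `get` and `post` parameters exist only so the ordering can be exercised without a real server. A failed upload in step 4 is deliberately ignored here, on the assumption that a missing cache entry should not break the build.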

participants (4)
- Marius Gedminas
- Rafael Monnerat
- rafael@nexedi.com
- Thomas Lotze