[Baypiggies] Web Crawler/Backend Engineer - San Francisco, CA

K. Richard Pixley rich at noir.com
Thu Feb 4 19:28:24 CET 2010


Jeff Kunce wrote:
> Actually, I'm wondering about the scope of the original question about
> an event-based framework:
>  - Just wanting objects to communicate with events? Something simple
> like circuits is good.
>  - async IO? asyncore, medusa et al are tried and true
>  - A comprehensive framework with everything included (even peanut
> butter and jelly)? Twisted
>
> The scope really matters.  If you just want to send simple events
> between objects in your app, you could write your own circuits-like
> framework before you could even figure out how to read the
> documentation for Twisted.

I'm building a moderately complex automated builder based primarily on
incremental builds.  Multiple servers, each parallel in its own right,
build multiple code branches against multiple target machines, so I've
got three or four distinct levels of parallelism going on in the builds
already.  There are also several types of checking going on: pre-commit
speculative testing, post-commit "continuous builds", a form of partial
building used for dependency checking, and a sort of pre-pre-commit
checking (essentially branched recurrences of the pre-commit checker).
The builder monitors a source code repository, building when necessary,
and also takes speculative build requests from users and builds those.

Most of this parallelism doesn't happen in Python.  The Python pieces of
the builder coordinate the various builders, partition and aggregate
very high level parallelism, and manage working directories and alerting
mechanisms.
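
A minimal sketch of that coordinating role, assuming each build is an
external command whose exit code is all the coordinator needs.  The
name run_builds and the stand-in commands are hypothetical, not part of
the builder described here.  It starts every child at once and polls
for completion, so the parallelism lives in the child processes and no
threads are involved:

```python
import subprocess
import sys
import time


def run_builds(commands, poll_interval=0.05):
    """Start every command at once and collect exit codes by polling."""
    running = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    results = {}
    while running:
        for name, proc in list(running.items()):
            code = proc.poll()  # None while the child is still running
            if code is not None:
                results[name] = code
                del running[name]
        if running:
            time.sleep(poll_interval)
    return results


# Stand-in "builds": tiny child interpreters that just exit with a code.
commands = {
    "branch-a": [sys.executable, "-c", "raise SystemExit(0)"],
    "branch-b": [sys.executable, "-c", "raise SystemExit(3)"],
}
results = run_builds(commands)
```

A real coordinator would presumably also capture logs, enforce
timeouts, and feed the results into the alerting side.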

My plan is a distributed, dynamic, asynchronous, fault-tolerant
architecture based on virtual synchrony
(http://en.wikipedia.org/wiki/Virtual_synchrony), probably using the
spread toolkit (http://spread.org).

There's also a set of status and input web pages (JavaScript, based on
the ext-js grid widget).  My plan is to produce their data using Pylons,
or perhaps tg2, pulling it out of the virtual synchrony "cloud",
probably via spread, but potentially via traditional socket calls, maybe
even jsonrpc.

So the "daemon"/cloud part needs to be capable of managing distributed, 
shared state, including the work queue, through spread, as well as 
managing a few subtasks like builds.  I'm not big on threads, and they 
don't really help much in the virtual synchrony paradigm anyway.
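
To illustrate the shared work queue idea, here is a minimal
single-process sketch.  It assumes only the property virtual synchrony
is meant to provide: every member sees the same messages in the same
order.  The Sequencer class is a toy stand-in for the group
communication layer and is not the spread API; the message format and
all names are hypothetical.  Because each replica applies the same
ordered stream, all copies of the queue agree without locks or threads:

```python
from collections import deque


class Sequencer:
    """Toy stand-in for a group communication layer (not the spread API).

    Assumption: every member receives every message in one global order.
    """

    def __init__(self):
        self.members = []

    def join(self, member):
        self.members.append(member)

    def multicast(self, message):
        # Deliver the message to all members before accepting the next one.
        for member in self.members:
            member.deliver(message)


class QueueReplica:
    """One daemon's copy of the shared work queue."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.claimed = {}  # job -> name of the member that claimed it

    def deliver(self, message):
        kind, sender, job = message
        if kind == "submit":
            if job not in self.pending and job not in self.claimed:
                self.pending.append(job)
        elif kind == "claim":
            # The first claim in the total order wins; later ones are ignored.
            if job in self.pending:
                self.pending.remove(job)
                self.claimed[job] = sender
        elif kind == "done":
            self.claimed.pop(job, None)


bus = Sequencer()
replicas = [QueueReplica(name) for name in ("alpha", "beta")]
for replica in replicas:
    bus.join(replica)

bus.multicast(("submit", "alpha", "branch-a"))
bus.multicast(("submit", "beta", "branch-b"))
bus.multicast(("claim", "alpha", "branch-a"))
bus.multicast(("claim", "beta", "branch-a"))  # loses the race everywhere
```

After these four messages both replicas hold the same state: branch-a
is claimed by alpha and branch-b is still pending.  Membership changes
and failure handling, which are the hard part, are left out entirely.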

It's not a /huge/ project, but I'm expecting it to take me at least a 
few months to implement.

--rich