[Baypiggies] I need some help architecting the big picture

Andy Wiggin andywiggin at gmail.com
Tue Apr 29 00:23:38 CEST 2008


I can't speak to too many of the points, but...

I recently worked on a project that had some similarities to this.
Instead of UNIX pipelines (grep/sed/awk/...), I implemented a plugin
mechanism to gather data. The main class allows plugins to be
registered; each plugin is then responsible for parsing its particular
format and calling back into the main class to store the normalized
data (roughly speaking). In your case, you could name each plugin
after the customer and give the plugin name on the command line of the
main program, or detect it from the process cwd, or whatever is
convenient (and reliable). I liked the way the plugin system worked
out, in no small measure because it was then natural to write the
whole system in Python. The uniformity might make the testing and
error detection/reporting a little easier, too.
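
To make that concrete, here is a minimal sketch of the registration
idea. All of the names (Importer, register, AcmeCorpPlugin, the field
layout) are invented for illustration; the real project's API differed
in the details:

    # importer.py -- sketch of a plugin-style importer (illustrative only)

    class Importer(object):
        """Main class: plugins register with it and feed it normalized rows."""

        def __init__(self):
            self.plugins = {}
            self.rows = []

        def register(self, name, plugin):
            self.plugins[name] = plugin

        def store(self, record):
            """Called by a plugin once a record is in the common format."""
            self.rows.append(record)

        def run(self, name, filename):
            """Look up the named plugin and let it parse the raw file."""
            self.plugins[name].parse(filename, self)


    class AcmeCorpPlugin(object):
        """Customer-specific parsing: knows this one customer's raw format."""

        def parse(self, filename, importer):
            for line in open(filename):
                # Pretend this customer sends comma-separated id,value pairs.
                fields = line.rstrip('\n').split(',')
                importer.store({'id': fields[0], 'value': fields[1]})


    if __name__ == '__main__':
        import sys
        importer = Importer()
        importer.register('acmecorp', AcmeCorpPlugin())
        # e.g.  python importer.py acmecorp acmecorp_batch.csv
        importer.run(sys.argv[1], sys.argv[2])

Naming the plugin after the customer is what makes the "give it on the
command line or detect it from cwd" part cheap.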

Regards, Andy


On Mon, Apr 28, 2008 at 1:46 PM, Shannon -jj Behrens <jjinux at gmail.com> wrote:
> Hi,
>
>  I need some help architecting the big picture on my current project.
>  I'm usually a Web guy, and that's a world I understand very well.
>  However, my current project is more batch-oriented.  Here are the
>  details:
>
>  * I have a bunch of customers.
>
>  * These customers give me batches of data.  One day, there might be
>  cron jobs for collecting this data from them.  One day I might have a
>  Web service that listens to updates from them and creates batches.
>  However, right now, they're manually giving me big chunks of data.
>
>  * I've built the system in a very UNIXy way right now.  That means
>  heavy use of cut, sort, awk, small standalone Python scripts, sh, and
>  pipes.  I've followed the advice of "do one thing well" and "tools,
>  not policy".
>
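A small stdin-to-stdout Python filter drops into that kind of pipeline
the same way awk does. A trivial sketch (the script name, field
position, and date format are all made up):

    #!/usr/bin/env python
    # normalize_dates.py -- pipeline filter sketch: read TSV on stdin,
    # rewrite one column, write TSV on stdout.
    import sys

    for line in sys.stdin:
        fields = line.rstrip('\n').split('\t')
        # Pretend the third column holds MM/DD/YYYY; emit YYYY-MM-DD.
        month, day, year = fields[2].split('/')
        fields[2] = '%s-%s-%s' % (year, month, day)
        sys.stdout.write('\t'.join(fields) + '\n')

so something like "cut -f1,3,5 raw.tsv | normalize_dates.py | sort"
still reads as one pipeline.
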
>  * The data that my customers give me is not uniform.  Different
>  customers give me the data in different ways.  Hence, I need some
>  customer-specific logic to transform the data into a common format
>  before I do the rest of the data pipeline.
>
>  * After a bunch of data crunching, I end up with a bunch of different
>  TSV (tab-separated values) files containing different things, which I
>  end up loading into a database.
>
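(Side note on the TSV step: the csv module will write tab-separated
output directly, which keeps quoting consistent if a field ever
contains something odd. A tiny sketch with invented columns:

    import csv
    import sys

    rows = [('widget', 42, '2008-04-28'), ('gadget', 7, '2008-04-29')]
    writer = csv.writer(sys.stdout, delimiter='\t', lineterminator='\n')
    writer.writerow(('product', 'count', 'date'))
    writer.writerows(rows)

Nothing deep, but it beats hand-joining fields with '\t'.)
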
>  * There's a separate database for each customer.
>
>  * The database is used to implement a Web service.  This part makes sense to me.
>
>  * I'm making heavy use of testing using nose.
>
>  Anyway, so I have all this customer-specific logic, and all these data
>  pipelines.  How do I pull it together into something an operator would
>  want to use?  Is the idea of an operator appropriate?  I'm pretty sure
>  this is an "operations" problem.
>
>  My current course of action is to:
>
>  * Create a global Makefile that knows how to do system-wide tasks.
>
>  * Create a customer-specific Makefile for each customer.
>
>  * The customer-specific Makefiles all "include" a shared Makefile.  I
>  modeled this after FreeBSD's ports system.
>
>  Hence, the customer-specific Makefiles have some customer-specific
>  logic in them, but they can share code via the shared Makefile that
>  they all include.
>
>  * Currently, all of the Python scripts take all their settings on the
>  command line.  I'm thinking that the settings belong in an included
>  Makefile that just contains settings.  By keeping the Python dumb, I'm
>  attempting to follow the "tools, not policy" idea.
>
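One way to keep the scripts honestly "dumb" is for each one to accept
every setting it needs as an explicit flag and nothing else, so the
Makefile supplies all the policy. A sketch (the script and flag names
are invented here):

    #!/usr/bin/env python
    # load_tsv.py -- sketch of a "dumb" loader: all settings arrive as flags.
    import optparse

    def main():
        parser = optparse.OptionParser(usage='%prog [options] FILE.tsv')
        parser.add_option('--db-host', default='localhost')
        parser.add_option('--db-name')
        parser.add_option('--db-user')
        parser.add_option('--db-password')
        options, args = parser.parse_args()
        if not args:
            parser.error('need a TSV file to load')
        # ... connect with options.db_host etc. and load args[0] here ...
        print 'would load %s into %s on %s' % (
            args[0], options.db_name, options.db_host)

    if __name__ == '__main__':
        main()

The Makefile that includes the per-customer settings file is then the
only place those values live.
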
>  I'm having a little bit of a problem with testing.  I don't have a way
>  of testing any Python code that talks to a database because the Python
>  scripts are all dumb about how to connect to the database.  I'm
>  thinking I might need to set up a "pretend" customer with a test
>  database to test all of that logic.
>
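For the "pretend customer", one pattern is a test module whose setup()
builds a throwaway database and whose tests exercise the loading logic
against it; nose picks up module-level setup/teardown and test_*
functions automatically. A sketch, with sqlite3 standing in for
whatever database the real system uses and an invented schema:

    # test_loading.py -- "pretend customer" database fixture for nose.
    import sqlite3

    conn = None

    def setup():
        """nose runs this once before the tests in this module."""
        global conn
        conn = sqlite3.connect(':memory:')
        conn.execute('CREATE TABLE orders (product TEXT, count INTEGER)')

    def teardown():
        conn.close()

    def test_load_one_row():
        conn.execute("INSERT INTO orders VALUES ('widget', 42)")
        count, = conn.execute('SELECT count FROM orders').fetchone()
        assert count == 42

The same connection settings your scripts take on the command line can
point them at this test database instead of a real customer's.
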
>  Does the idea of driving everything from Makefiles make sense?
>
>  Is there an easier way to share data like database connection
>  information between the Makefile and Python other than passing it in
>  explicitly via command line arguments?
>
>  Is there anything that makes more sense than a bunch of
>  customer-specific Makefiles that include a global Makefile?
>
>  How do I get new batches of data into the system?  Do I just put the
>  files in the right place and let the Makefiles take it from there?
>
>  Am I completely smoking, or am I on the right track?
>
>  Thanks,
>  -jj
>
>  --
>  I, for one, welcome our new Facebook overlords!
>  http://jjinux.blogspot.com/
>  _______________________________________________
>  Baypiggies mailing list
>  Baypiggies at python.org
>  To change your subscription options or unsubscribe:
>  http://mail.python.org/mailman/listinfo/baypiggies
>

