[Baypiggies] I need some help architecting the big picture

jim jim at well.com
Tue Apr 29 02:03:39 CEST 2008

On Mon, 2008-04-28 at 13:46 -0700, Shannon -jj Behrens wrote:

jstockford munged to summarize: 
> * I have a bunch of customers manually giving me big chunks 
> of data that is not uniform.  Different
> customers give me the data in different ways. 
> * After a bunch of data crunching, I end up with a bunch of different
> TSV files (tab-separated format) containing different things, which I
> end up loading into separate databases for each customer.
> * The database is used to implement a Web service.  
> Anyway, so I have all this customer-specific logic to transform the 
> data into a common format, and all these data
> pipelines.  

   What does "manually" mean? can this include sneakernet 
with removable media, ftp, email attachments, etc., with 
none of these means mattering to the problem? 
   i'm guessing that the commonality is that all files are 
ASCII TSV; i.e. there is no necessary correlation of fields 
from one customer's file set to others, yes? 
   further guess: the customer-specific logic is entirely 
a matter of getting each customer's data into ASCII TSV 
format so as to load it into a customer-specific database, yes? 
   some web-thing will pull from each customer database and 
present reports upon demand, yes? 

> How do I pull it together into something an operator would
> want to use?  Is the idea of an operator appropriate?  I'm pretty sure
> this is an "operations" problem. 
   my understanding of an operator is someone who monitors 
and manages the system without respect to the particulars 
of data coming in or going out. 
   but the only way i make sense of your use of the term is 
as someone who sits by a phone or email client and derives 
reports as humans request. i must be confused, yes? 

> My current course of action is to:
> * Create a global Makefile that knows how to do system-wide tasks.
> * Create a customer-specific Makefile for each customer.
> * The customer-specific Makefiles all "include" a shared Makefile.  I
> modeled this after FreeBSD's ports system.
> Hence, the customer-specific Makefiles have some customer-specific
> logic in them, but they can share code via the shared Makefile that
> they all include.
   this sounds like a classic driver structure: create a dumb 
little top-end driver, make a basic stub, make customer-
specific stubs that inherit from the basic stub, tweak 
and debug to taste, yes? 
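
   a rough sketch of that driver-and-stubs shape in python, 
just to make the idea concrete. every name here (BaseStub, 
AcmeStub, BetaStub, drive) is made up for illustration, and 
the two customer formats are assumptions, not anything from 
the original description: 

```python
import csv
import io


class BaseStub:
    """Basic stub: shared behavior that customer stubs inherit (hypothetical)."""

    delimiter = ","  # assumed default input delimiter

    def parse(self, raw_text):
        # Read delimited text into a list of rows, skipping blank lines.
        reader = csv.reader(io.StringIO(raw_text), delimiter=self.delimiter)
        return [row for row in reader if row]

    def to_tsv(self, raw_text):
        # Common output step: every customer ends up as tab-separated text.
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(self.parse(raw_text))
        return out.getvalue()


class AcmeStub(BaseStub):
    # Assumed quirk: this pretend customer sends pipe-delimited files.
    delimiter = "|"


class BetaStub(BaseStub):
    # Assumed quirk: this pretend customer sends (id, name) but we want (name, id).
    def parse(self, raw_text):
        return [[name, ident] for ident, name in super().parse(raw_text)]


# The dumb little top-end driver: look up the stub and run it.
STUBS = {"acme": AcmeStub, "beta": BetaStub}


def drive(customer, raw_text):
    return STUBS[customer]().to_tsv(raw_text)
```

   the driver stays ignorant of each customer's quirks; adding 
a customer means adding one small subclass and one entry in 
the lookup table. 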

> * Currently, all of the Python scripts take all their settings on the
> command line.  
   GASP!!! and the operator is going to type these command lines? 
or these are hard-coded into the master driver? or...? 

> I'm thinking that the settings belong in an included 
> Makefile that just contains settings.  By keeping the Python 
> dumb, I'm attempting to follow the "tools, not policy" idea.
   ...or the makefile. well, software is software, so given 
a reasonable interface and no performance problems, why not? 

> I'm having a little bit of a problem with testing.  I don't have a way
> of testing any Python code that talks to a database because the Python
> scripts are all dumb about how to connect to the database.  I'm
> thinking I might need to set up a "pretend" customer with a test
> database to test all of that logic.

   i stubbornly remain confused: customer-specific data is 
coming in, so customer-specific logic is required to get 
it into a customer-specific database that will serve 
customer-specific requests for reports or some other 
database-backed service. 
   sounds like separate software, one set for each customer, 
maybe based on some base classes that allow leveraging 
common functionality. 
   but even if the python remains ignorant of the data 
source, aren't there interface programs that work between 
the python and the database? 
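
   to illustrate what i mean by an interface layer: if the 
loading code takes an already-open connection instead of 
opening one itself, a "pretend" customer can be nothing more 
than an in-memory sqlite database. this is a sketch only; the 
table name, the two columns, and the function names are 
assumptions, and the real database may well not be sqlite: 

```python
import csv
import io
import sqlite3


def load_rows(conn, tsv_text):
    """Load two-column TSV text into an assumed 'items' table.

    The caller supplies the connection, so this code never needs
    to know how (or where) to connect.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    conn.execute("CREATE TABLE IF NOT EXISTS items (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def count_items(conn):
    # Small query helper so a test can check what was loaded.
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


def pretend_customer():
    # A throwaway in-memory database standing in for a test customer.
    return sqlite3.connect(":memory:")
```

   a test would call pretend_customer(), hand the connection 
to load_rows(), and check count_items() afterward, all without 
touching a real customer database. 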

> Does the idea of driving everything from Makefiles make sense?
   software is software 

> Is there an easier way to share data like database connection
> information between the Makefile and Python other than passing it in
> explicitly via command line arguments?
   the difference between command line calls and calls between 
functions in the same process space is mainly just load time. 
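
   one middle path, sketched under assumptions: GNU make can 
export variables into the environment of the commands it 
runs, and the python side can read them from os.environ 
rather than from argv. the variable names (DB_HOST, DB_NAME, 
DB_USER) and the function name are hypothetical: 

```python
import os


def db_settings(environ=None):
    """Collect database settings from environment variables.

    The variable names are assumptions for illustration. Passing a
    plain dict instead of os.environ makes this easy to test.
    """
    env = os.environ if environ is None else environ
    return {
        "host": env.get("DB_HOST", "localhost"),  # assumed default
        "name": env["DB_NAME"],                   # required: KeyError if unset
        "user": env.get("DB_USER", ""),
    }
```

   the makefile still owns the settings; the scripts just 
stop needing a long command line to receive them. 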

> Is there anything that makes more sense than a bunch of
> customer-specific Makefiles that include a global Makefile?
   drivers and stubs make sense. software is software. beware 
of the interfaces. 

> How do I get new batches of data into the system?  Do I just put the
> files in the right place and let the Makefiles take it from there?
   isn't this answered by whatever "manually" means? manually 
has gotta end up "here" somehow, and the data input software 
should know where "here" is. 
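
   as a sketch of the input software knowing where "here" 
is: scan one agreed-upon directory and treat any file without 
a marker beside it as a new batch. the ".done" marker 
convention and the function names are assumptions for 
illustration, not something from the original setup: 

```python
from pathlib import Path

DONE_SUFFIX = ".done"  # hypothetical marker convention


def pending_batches(incoming_dir):
    """Return files in incoming_dir that have no '<name>.done' marker."""
    pending = []
    for path in sorted(Path(incoming_dir).iterdir()):
        if not path.is_file() or path.name.endswith(DONE_SUFFIX):
            continue  # skip subdirectories and the markers themselves
        marker = path.with_name(path.name + DONE_SUFFIX)
        if not marker.exists():
            pending.append(path)
    return pending


def mark_done(path):
    # Drop an empty marker file next to a batch once it has been processed.
    Path(path).with_name(Path(path).name + DONE_SUFFIX).touch()
```

   however the files arrive, the last manual step is then 
always the same: put them in that one directory. 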

> Am I completely smoking, or am I on the right track?
   it's an interesting exercise, though not particularly 
pythonic, yes? aren't there sufficient python modules to 
build this strictly as a python app without significantly 
more work or risk? 
