[Baypiggies] I need some help architecting the big picture
jim at well.com
Tue Apr 29 02:03:39 CEST 2008
On Mon, 2008-04-28 at 13:46 -0700, Shannon -jj Behrens wrote:
jstockford munged to summarize:
> * I have a bunch of customers manually giving me big chunks
> of data that is not uniform. Different
> customers give me the data in different ways.
> * After a bunch of data crunching, I end up with a bunch of different
> TSV files (tab-separated format) containing different things, which I
> end up loading into separate databases for each customer.
> * The database is used to implement a Web service.
> Anyway, so I have all this customer-specific logic to transform the
> data into a common format, and all these data
What does "manually" mean? can this include sneakernet
with removable media, ftp, email attachments, etc.,
with none of these means mattering to the problem?
i'm guessing that the commonality is that all files are
ASCII TSV; i.e., there is no necessary correlation of fields
from one customer's file set to another's, yes?
further guess: the customer-specific logic is entirely
a matter of getting each customer's data into ASCII TSV
format so as to load it into a customer-specific database, yes?
some web-thing will pull from each customer database and
present reports upon demand, yes?
> How do I pull it together into something an operator would
> want to use? Is the idea of an operator appropriate? I'm pretty sure
> this is an "operations" problem.
my understanding of an operator is someone who monitors
and manages the system without respect to the particulars
of data coming in or going out.
but the only way i make sense of your use of the term is
as someone who sits by a phone or email client and derives
reports as humans request. i must be confused, yes?
> My current course of action is to:
> * Create a global Makefile that knows how to do system-wide tasks.
> * Create a customer-specific Makefile for each customer.
> * The customer-specific Makefiles all "include" a shared Makefile. I
> modeled this after FreeBSD's ports system.
> Hence, the customer-specific Makefiles have some customer-specific
> logic in them, but they can share code via the shared Makefile that
> they all include.
this sounds like a classic driver structure: create a dumb
little top-end driver, make a basic stub, make customer-
specific stubs that inherit from the basic stub, tweak
and debug to taste, yes?
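a minimal Python sketch of that driver/stub idea, with an
invented base class and an invented per-customer subclass (none
of these names are from the original post):

```python
# Hypothetical driver/stub sketch: a base stub holds the shared
# pipeline, and per-customer stubs override only what differs.

class BaseStub:
    """Shared logic every customer pipeline inherits."""

    def load(self, raw):
        # default assumption: the data is already tab-separated
        return [line.split("\t") for line in raw.splitlines()]

    def run(self, raw):
        rows = self.load(raw)
        return len(rows)  # stand-in for "load rows into the database"


class AcmeStub(BaseStub):
    """Customer-specific tweak: this customer sends comma-separated data."""

    def load(self, raw):
        return [line.split(",") for line in raw.splitlines()]


# the dumb little top-end driver just picks the right stub and runs it
STUBS = {"acme": AcmeStub, "default": BaseStub}

def drive(customer, raw):
    stub = STUBS.get(customer, BaseStub)()
    return stub.run(raw)
```

tweak and debug to taste: each new customer is one more subclass
plus one more entry in the driver's table.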
> * Currently, all of the Python scripts take all their settings on the
> command line.
GASP!!! and the operator is going to type these command lines?
or are these hard-coded into the master driver? or...?
> I'm thinking that the settings belong in an included
> Makefile that just contains settings. By keeping the Python
> dumb, I'm attempting to follow the "tools, not policy" idea.
...or the makefile. well, software is software, so given
a reasonable interface and no performance problems, why not?
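one common way to get those settings from the makefile into the
python without command-line plumbing: have the makefile export
them as environment variables and let the dumb script read them
with fallbacks. a sketch, with invented variable names:

```python
# Assumed Makefile side (settings.mk), illustrative only:
#
#     DB_HOST = db.example.com
#     export DB_HOST
#
# The Python side stays dumb: it just reads the environment.
import os

def db_settings(environ=os.environ):
    """Collect connection settings from the environment, with fallbacks."""
    return {
        "host": environ.get("DB_HOST", "localhost"),
        "name": environ.get("DB_NAME", "test"),
    }
```

the make recipe that invokes the script doesn't change at all;
the settings ride along in the process environment.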
> I'm having a little bit of a problem with testing. I don't have a way
> of testing any Python code that talks to a database because the Python
> scripts are all dumb about how to connect to the database. I'm
> thinking I might need to setup a "pretend" customer with a test
> database to test all of that logic.
i stubbornly remain confused: customer-specific data is
coming in, so customer-specific logic is required to get
it into a customer-specific database that will serve out
to customer-specific requests for reports or some other output.
sounds like separate software, one set for each customer,
maybe based on some base classes that allow leveraging shared code.
but even if the python remains ignorant of the data
source, aren't there interface programs that sit between
the python and the database?
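on the "pretend customer" idea: one cheap way to test the
database-facing logic is to point it at an in-memory SQLite
database instead of a real customer database. a sketch, with an
invented table layout:

```python
# A throwaway "pretend customer" database for tests: the loading
# code takes any DB-API connection, so tests can hand it SQLite.
import sqlite3

def load_tsv(conn, tsv_text):
    """Load two-column TSV text into a (hypothetical) rows table."""
    conn.execute("CREATE TABLE IF NOT EXISTS rows (a TEXT, b TEXT)")
    for line in tsv_text.splitlines():
        conn.execute("INSERT INTO rows VALUES (?, ?)",
                     tuple(line.split("\t")))
    conn.commit()

def test_load():
    conn = sqlite3.connect(":memory:")  # pretend customer database
    load_tsv(conn, "x\t1\ny\t2")
    count, = conn.execute("SELECT COUNT(*) FROM rows").fetchone()
    assert count == 2
```

the real scripts would get a real connection from whatever driver
they use; the test just swaps in the in-memory one.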
> Does the idea of driving everything from Makefiles make sense?
software is software
> Is there an easier way to share data like database connection
> information between the Makefile and Python other than passing it in
> explicitly via command line arguments?
the difference between command-line calls and calls between
functions in the same process space is mainly just load time.
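to make that concrete, a toy illustration: the same logic reached
in-process and via a command line, differing only in process
startup (the helper function is invented):

```python
# Same logic, two routes: a direct function call versus spawning
# a child interpreter that does the equivalent work on stdin.
import subprocess
import sys

def crunch(text):
    return text.upper()

# in-process call
direct = crunch("tsv")

# equivalent out-of-process call
proc = subprocess.run(
    [sys.executable, "-c", "print(input().upper())"],
    input="tsv", capture_output=True, text=True,
)
assert proc.stdout.strip() == direct
```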
> Is there anything that makes more sense than a bunch of
> customer-specific Makefiles that include a global Makefile?
drivers and stubs make sense. software is software. beware
of the interfaces.
> How do I get new batches of data into the system? Do I just put the
> files in the right place and let the Makefiles take it from there?
isn't this answered by whatever "manually" means? manually
has gotta end up "here" somehow, and the data input software
should know where "here" is.
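a sketch of "here": an incoming drop directory per customer, with
the input software treating anything new there as a batch to
process (the directory convention and filter are assumptions):

```python
# Hypothetical drop-directory scheme: whatever "manually" means,
# files land in a per-customer incoming directory, and the input
# software picks up the TSV batches waiting there.
import os

def tsv_files(names):
    """Filter a list of filenames down to sorted .tsv batches."""
    return sorted(n for n in names if n.endswith(".tsv"))

def pending_batches(incoming_dir):
    """Return the data files waiting in a customer's drop directory."""
    return tsv_files(os.listdir(incoming_dir))
```

the makefiles can then take it from there: a rule per batch file,
with make's own timestamp logic deciding what still needs crunching.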
> Am I completely smoking, or am I on the right track?
it's an interesting exercise, though not particularly
pythonic, yes? aren't there sufficient python modules to
build this strictly as a python app without significantly
more work or risk?