[Baypiggies] I need some help architecting the big picture

Tue Apr 29 07:30:51 CEST 2008

>    What does it mean, "manually"? can this include sneaker
>  net with removeable media, ftp, email attachments, etc.
>  and none of these means matters to the problem?

Currently, yes.

>    i'm guessing that the commonality is that all files are
>  ASCII TSV;

I wish ;)  Some are log files.  Some are database dumps.

> i.e. there is no necessary correlation of fields
>  from one customer's file set to others, yes?

I'm looking for how users interact with things.  E.g. user A bought soap.

>    further guess: the customer-specific logic is entirely
>  a matter of getting each customer's data into ASCII TSV
>  format so's to load into a customer-specific database, yes?

Yes.

>    some web-thing will pull from each customer database and
>  present reports upon demand, yes?

No.  I don't need to provide reports so much as answer queries about their data.

>    my understanding of an operator is someone who monitors
>  and manages the system without respect to the particulars
>  of data coming in or going out.

That's correct.  My definition of an operator is someone who either
pulls the batches of data when requested, or sets up a cron job to
do it and fixes it if it breaks.

>    this sounds like a classic driver structure: create a dumb
>  little top-end driver, make a basic stub, make customer-
>  specific stubs that inherit from the basic stub, tweak
>  and debug to taste, yes?

Yep.
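
A minimal sketch of that driver structure, assuming the common format
is a three-column TSV of (user, action, item).  The class names, the
"acme" customer, and its log layout are all hypothetical; they are
only here to show the shape of the thing.

```python
import csv
import io


class BaseImporter:
    """Hypothetical base stub: turn one customer's raw input into common TSV."""

    def parse(self, raw_text):
        # Customer-specific stubs override this and yield (user, action, item).
        raise NotImplementedError

    def to_tsv(self, raw_text):
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in self.parse(raw_text):
            writer.writerow(row)
        return out.getvalue()


class AcmeLogImporter(BaseImporter):
    """Hypothetical customer whose data arrives as 'user action item' log lines."""

    def parse(self, raw_text):
        for line in raw_text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines in the log
            user, action, item = line.split(None, 2)
            yield user, action, item


# The dumb top-end driver only needs to know which stub belongs to whom.
IMPORTERS = {"acme": AcmeLogImporter}


def run(customer, raw_text):
    """Look up the customer's importer and convert the raw text to TSV."""
    return IMPORTERS[customer]().to_tsv(raw_text)
```

Adding a customer then means writing one new subclass and adding one
entry to IMPORTERS; everything downstream of the TSV stays the same.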

>  > * Currently, all of the Python scripts take all their settings on the
>  > command line.
>  >
>    GASP!!! and the operator is going to type these command lines?
>  or these are hard-coded into the master driver? or...?

They're driven via the Makefiles.  If you have a customer-specific
Makefile with some customer-specific settings, then the main Makefile
knows which settings each little Python script needs.
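
As a rough sketch of what one of those little scripts looks like from
the Python side: the option names (--db-host, --db-name, --input) are
made up for illustration, and a Makefile rule would supply them from
its own variables.

```python
import argparse


def parse_settings(argv=None):
    """Parse the settings a Makefile rule would pass on the command line.

    All option names here are hypothetical.  Passing argv explicitly
    makes the parsing easy to exercise from a test; argv=None falls
    back to sys.argv as usual.
    """
    parser = argparse.ArgumentParser(description="Load one batch of TSV data.")
    parser.add_argument("--db-host", default="localhost")
    parser.add_argument("--db-name", required=True)
    parser.add_argument("--input", required=True, help="path to the TSV batch")
    return parser.parse_args(argv)
```

Because the script owns no settings of its own, the same code runs
unchanged for every customer; only the Makefile variables differ.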

>  > I'm having a little bit of a problem with testing.  I don't have a way
>  > of testing any Python code that talks to a database because the Python
>  > scripts are all dumb about how to connect to the database.  I'm
>  > thinking I might need to setup a "pretend" customer with a test
>  > database to test all of that logic.
>
>    i stubbornly remain confused: customer-specific data is
>  coming in, so customer-specific logic is required to get
>  it into a customer-specific database that will serve out
>  to customer-specific requests for reports or some other
>  databased service.

Yep.

>    sounds like separate software, one set for each customer,

The data format that customers give me varies, but what I want to do
with the data and the service I provide is always the same.

>  maybe based on some base classes that allow leveraging
>  common functionality.
>    but even if the python remains ignorant of the data
>  source, aren't there interface programs that work between
>  the python and the database?

Once I get the data into a common format, getting it into the database is easy.
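
Here is a sketch of that last step, using sqlite3 from the standard
library purely as a stand-in for whatever database is really in use.
The events table and its columns are assumptions.  Since the loader
takes a connection rather than opening one, a "pretend" customer with
an in-memory database is enough to test it.

```python
import csv
import io
import sqlite3


def load_events(conn, tsv_file):
    """Load (user, action, item) rows from a TSV file object into `events`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(user_id TEXT, action TEXT, item TEXT)"
    )
    rows = csv.reader(tsv_file, delimiter="\t")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def items_for_user(conn, user_id, action):
    """Answer a query such as 'what did user A buy?'."""
    cur = conn.execute(
        "SELECT item FROM events WHERE user_id = ? AND action = ? ORDER BY item",
        (user_id, action),
    )
    return [row[0] for row in cur]


def demo():
    """Pretend customer: in-memory database, hand-written batch."""
    conn = sqlite3.connect(":memory:")
    batch = io.StringIO("A\tbought\tsoap\nA\tbought\tbread\nB\tviewed\tsoap\n")
    load_events(conn, batch)
    return items_for_user(conn, "A", "bought")
```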

>  > Does the idea of driving everything from Makefiles make sense?
>  >
>    software is software

Ah, interesting perspective ;)

>  > Is there an easier way to share data like database connection
>  > information between the Makefile and Python other than passing it in
>  > explicitly via command line arguments?
>  >
>    the difference between command line calls and calls between
>  functions in the same process space is just load time, mainly,
>  yes?

It's really all about following the UNIX way of "do one thing well"
rather than building a huge C++ monolithic binary.
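
One alternative to explicit arguments, sketched here as an idea rather
than something I'm doing today: let the Makefile export its settings
into the environment (GNU Make has an export directive for this) and
have each script read them from there.  The variable names DB_HOST,
DB_NAME, and DB_USER are invented for the example.

```python
import os


def db_settings(environ=None):
    """Read database connection settings from environment variables.

    The variable names are hypothetical.  Taking `environ` as a
    parameter lets a test pass in a plain dict instead of touching
    the real environment.
    """
    environ = os.environ if environ is None else environ
    return {
        "host": environ.get("DB_HOST", "localhost"),
        "name": environ["DB_NAME"],  # required: raises KeyError if unset
        "user": environ.get("DB_USER", ""),
    }
```

The tradeoff is that the settings become implicit: a script can no
longer be understood from its command line alone.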

>  > How do I get new batches of data into the system?  Do I just put the
>  > files in the right place and let the Makefiles take it from there?
>  >
>    isn't this answered by whatever "manually" means? manually
>  has gotta end up "here" somehow, and the data input software
>  should know where "here" is.

Yeah.  That's why I was asking.  I've never had this situation before,
and I was wondering if some of you old timers had been in this
situation and had some advice about the right way to approach it ;)  I
guess I'm confused about workflow and process as much as anything
else.

>  > Am I completely smoking, or am I on the right track?
>  >
>    it's an interesting exercise, though not particularly
>  pythonic, yes?

Everything complex is written in Python, but I'm leaning heavily on
the UNIX philosophy and tools.

> aren't there sufficient python modules to
>  build this strictly as a python app without significantly
>  more work or risk?

In a lot of situations, I can get from one step in the pipeline to
another using only UNIX tools like cut, sort, and one-line awk
scripts.  That's really nice.  It's easier to implement the whole
thing via sh tying together small tools than to write one gigantic
program that does everything.  Small tools are easy to understand,
test, and debug.
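
When a step does need Python, it can still behave like one of those
small tools: read TSV on stdin, write TSV on stdout.  A sketch of such
a filter follows; the function names and the column layout are
assumptions for illustration.

```python
import sys


def filter_rows(lines, column, value):
    """Yield only the TSV lines whose zero-based `column` equals `value`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column and fields[column] == value:
            yield line


def main(argv, stdin=None, stdout=None):
    """Usage sketch: main(["1", "bought"]) keeps rows whose 2nd field is 'bought'."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    column, value = int(argv[0]), argv[1]
    stdout.writelines(filter_rows(stdin, column, value))
```

The filtering logic lives in a plain function over lines, so it can be
tested without a shell, and the script still drops into a pipeline
next to cut and sort.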

-jj

-- 
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/