custom data warehouse in python vs. out-of-the-box ETL tool

Sean DiZazzo half.italian at gmail.com
Tue Sep 22 17:59:29 EDT 2009


On Sep 22, 1:00 pm, snfctech <tschm... at sacfoodcoop.com> wrote:
> Does anyone have experience building a data warehouse in python?  Any
> thoughts on custom vs using an out-of-the-box product like Talend or
> Informatica?
>
> I have an integrated system Dashboard project that I was going to
> build using cross-vendor joins on existing DBs, but I keep hearing
> that a data warehouse is the way to go.  e.g. I want to create orders
> and order_items with relations to members (MS Access DB), products
> (flat file) and employees (MySQL).
>
> Thanks in advance for any tips.

I have done some small- to medium-sized projects using SQLAlchemy,
TurboGears, and Flex.  I have never used a commercial product, but I
imagine getting it set up to work with your data is the hardest part
of the job anyway, and the solution you end up with will most likely
limit you to whatever their API lets you do with your data.  If you
build it yourself, you have complete control, and you know exactly
where to go when you have a problem or want to add a feature.

I'm no expert, but I think I would try to find a way to consolidate
the data into one data source.  We handle the giant amount of data we
are collecting by preprocessing it into another DB anyway, so I
imagine you could consolidate and preprocess in the same step.
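
As a rough illustration of consolidating into one data source, here is
a minimal sketch using only the standard library (sqlite3 and csv).
Everything in it is an assumption made for the example: the table
layouts, the column names, the sample rows, and the helper names
build_warehouse, add_order and order_report.  The members and employees
rows are hard-coded stand-ins for whatever you would actually pull out
of Access and MySQL; only the products flat file is parsed for real.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the three real sources.
PRODUCTS_FLAT_FILE = "sku,name,price\nP1,Granola,4.50\nP2,Olive Oil,9.25\n"
MEMBERS = [(1, "Ada"), (2, "Grace")]   # would be read from the MS Access DB
EMPLOYEES = [(10, "Linus")]            # would be read from MySQL


def build_warehouse(conn):
    """Create the consolidated schema and load all three sources into it."""
    conn.executescript("""
        CREATE TABLE members   (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products  (sku TEXT PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            member_id INTEGER REFERENCES members(id),
            employee_id INTEGER REFERENCES employees(id)
        );
        CREATE TABLE order_items (
            order_id INTEGER REFERENCES orders(id),
            sku TEXT REFERENCES products(sku),
            qty INTEGER
        );
    """)
    conn.executemany("INSERT INTO members VALUES (?, ?)", MEMBERS)
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    # Each flat-file row becomes a dict keyed by the header line.
    rows = csv.DictReader(io.StringIO(PRODUCTS_FLAT_FILE))
    conn.executemany("INSERT INTO products VALUES (:sku, :name, :price)", rows)
    conn.commit()


def add_order(conn, member_id, employee_id, items):
    """Insert one order plus its (sku, qty) line items; return the order id."""
    cur = conn.execute(
        "INSERT INTO orders (member_id, employee_id) VALUES (?, ?)",
        (member_id, employee_id),
    )
    order_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order_id, sku, qty) for sku, qty in items],
    )
    conn.commit()
    return order_id


def order_report(conn):
    """Join everything in one place: no cross-vendor joins needed."""
    return conn.execute("""
        SELECT o.id, m.name, e.name, p.name, oi.qty, oi.qty * p.price
        FROM order_items oi
        JOIN orders o    ON o.id = oi.order_id
        JOIN members m   ON m.id = o.member_id
        JOIN employees e ON e.id = o.employee_id
        JOIN products p  ON p.sku = oi.sku
        ORDER BY o.id, p.sku
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    add_order(conn, member_id=1, employee_id=10, items=[("P1", 2), ("P2", 1)])
    for row in order_report(conn):
        print(row)
```

Once everything lives in one database, the dashboard queries are
ordinary joins.  In a real setup the load step would run on a schedule
and would probably go through something like SQLAlchemy rather than
raw sqlite3, but the shape would be about the same.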

A DBA would quite possibly handle this differently.  I'm just a MySQL
hack.  :)

~Sean


