custom data warehouse in python vs. out-of-the-box ETL tool
Martin P. Hellwig
martin.hellwig at dcuktec.org
Wed Sep 23 12:15:44 CEST 2009
> Thanks for your replies, Sean and Martin.
> I agree that the ETL tools are complex in themselves, and I may as
> well spend that learning curve on a lower-level tool-set that has the
> added value of greater flexibility.
> Can you suggest a good book or tutorial to help me build a data
> warehouse in python? Bill Inmon's "Building the Data Warehouse" is 17
> years old, and I've been cautioned against Kimball.
A data warehouse isn't something magical; it is just another database,
albeit one containing multiple datasets gathered from foreign sources,
possibly in multiple formats.
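To make the "multiple formats" part concrete, here is a minimal sketch,
assuming two hypothetical exports (one CSV, one JSON) whose field names
differ. The point is only that each source gets a small reader that maps
its records onto one common shape; the sample data and the field names
(customer/amount, client/total) are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample exports from two "foreign" sources in different formats.
CSV_EXPORT = "customer,amount\nacme,10.50\nglobex,4.25\n"
JSON_EXPORT = '[{"client": "acme", "total": "2.00"}]'


def extract_csv(text):
    """Read CSV rows and map them onto a common (customer, amount) tuple."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["customer"], float(row["amount"])) for row in reader]


def extract_json(text):
    """Read JSON records with different field names and map them the same way."""
    return [(rec["client"], float(rec["total"])) for rec in json.loads(text)]


rows = extract_csv(CSV_EXPORT) + extract_json(JSON_EXPORT)
print(rows)  # [('acme', 10.5), ('globex', 4.25), ('acme', 2.0)]
```

Once everything is in the same shape, the rest of the job no longer cares
where a row came from.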
Depending on what you want to achieve, you design your tables the way
you usually do. For example, if you only want reporting, you might build
your tables in a way that makes the actual report easier to produce.
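As an illustration of designing for the report, a minimal sketch using
the standard library's sqlite3 module, assuming a hypothetical report
that shows sales totals per day and customer. The table name
sales_report and its columns are invented for the example; the idea is
that the table is already aggregated, so the report is a plain SELECT
with no joins.

```python
import sqlite3

# Hypothetical reporting table: one pre-aggregated row per (day, customer).
SCHEMA = """
CREATE TABLE sales_report (
    day      TEXT NOT NULL,
    customer TEXT NOT NULL,
    total    REAL NOT NULL,
    PRIMARY KEY (day, customer)
)
"""


def create_warehouse(path=":memory:"):
    """Open (or create) the warehouse database and add the report table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


conn = create_warehouse()
conn.execute("INSERT INTO sales_report VALUES (?, ?, ?)", ("2009-09-23", "acme", 12.5))

# The report itself is now trivial to produce.
report = conn.execute(
    "SELECT day, customer, total FROM sales_report ORDER BY day, customer"
).fetchall()
print(report)  # [('2009-09-23', 'acme', 12.5)]
```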
Now you have an empty database containing the fields you want for the
report, and one or more populated databases containing data from the
user applications. You then use Python to fill the empty database and,
tada, you have a data warehouse and have used Python for ETL processing.
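The "use Python to fill the empty database" step can be sketched as
three small functions: extract, transform, load. This assumes both
databases are SQLite (in-memory stand-ins here) and that the application
has a hypothetical orders table with customer, amount and created_at
columns; the reporting table sales_report is likewise made up. A real
setup would swap in whatever DB-API driver matches the source system.

```python
import sqlite3
from collections import defaultdict


def extract(source):
    """Pull raw order rows out of the (hypothetical) application database."""
    return source.execute("SELECT customer, amount, created_at FROM orders").fetchall()


def transform(rows):
    """Aggregate raw orders into one total per (day, customer)."""
    totals = defaultdict(float)
    for customer, amount, created_at in rows:
        day = created_at[:10]  # 'YYYY-MM-DD HH:MM:SS' -> 'YYYY-MM-DD'
        totals[(day, customer)] += amount
    return [(day, customer, total) for (day, customer), total in sorted(totals.items())]


def load(warehouse, rows):
    """Upsert report rows in one transaction so the job can be re-run."""
    with warehouse:  # commits on success, rolls back on error
        warehouse.executemany("INSERT OR REPLACE INTO sales_report VALUES (?, ?, ?)", rows)


# Stand-in databases; in practice these would be separate files or servers.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (customer TEXT, amount REAL, created_at TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("acme", 10.0, "2009-09-23 09:00:00"),
    ("acme", 2.5, "2009-09-23 17:30:00"),
    ("globex", 4.0, "2009-09-24 08:15:00"),
])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_report (day TEXT, customer TEXT, total REAL, "
    "PRIMARY KEY (day, customer))"
)

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM sales_report ORDER BY day, customer").fetchall())
# [('2009-09-23', 'acme', 12.5), ('2009-09-24', 'globex', 4.0)]
```

Using INSERT OR REPLACE against the primary key is just one choice; it
keeps reruns from duplicating rows, which helps when you go through the
iterations mentioned below.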
So if you already have some insight into creating tables in a database,
you are all set. Most likely you will go through a number of iterations
before you are happy with the result, though.
There is no book substitute for applying theory, experience and common
sense to a problem you want to solve, unless you write it yourself for
that specific situation.
'If consumed, best digested with added seasoning to own preference.'