custom data warehouse in python vs. out-of-the-box ETL tool

Martin P. Hellwig martin.hellwig at dcuktec.org
Wed Sep 23 12:15:44 CEST 2009


snfctech wrote:
> Thanks for your replies, Sean and Martin.
> 
> I agree that the ETL tools are complex in themselves, and I may as
> well spend that learning curve on a lower-level tool-set that has the
> added value of greater flexibility.
> 
> Can you suggest a good book or tutorial to help me build a data
> warehouse in python?  Bill Inmon's "Building the Data Warehouse" is 17
> years old, and I've been cautioned against Kimball.
> 
> Thanks.
<cut>

A data warehouse isn't something magical; it is just another database,
albeit one containing multiple datasets gathered from foreign sources,
possibly in multiple formats.

Depending on what you want it for, you design your tables the way you
usually do. For example, if you only want reporting, you might want to
build your tables in a way that makes it easier to build the actual
report.
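
To make the reporting case concrete, here is a minimal sketch using the
sqlite3 module from the standard library. The table daily_sales, its
columns and the sample rows are all made up for illustration; the point
is only that a table shaped like the report turns the report itself
into a plain SELECT.

```python
import sqlite3

# Hypothetical reporting table: one row per day and product, already
# aggregated, so the report needs no joins.
SCHEMA = """
CREATE TABLE daily_sales (
    sale_date  TEXT    NOT NULL,   -- ISO date, e.g. '2009-09-23'
    product    TEXT    NOT NULL,
    units_sold INTEGER NOT NULL,
    revenue    REAL    NOT NULL,
    PRIMARY KEY (sale_date, product)
)
"""


def monthly_revenue(conn, month):
    """Return (product, revenue) rows for a month given as 'YYYY-MM'."""
    cur = conn.execute(
        "SELECT product, SUM(revenue) FROM daily_sales "
        "WHERE substr(sale_date, 1, 7) = ? "
        "GROUP BY product ORDER BY product",
        (month,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Invented sample data, only there to exercise the query.
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
    [
        ("2009-09-01", "widget", 3, 30.0),
        ("2009-09-02", "widget", 2, 20.0),
        ("2009-09-02", "gadget", 1, 15.0),
        ("2009-10-01", "widget", 5, 50.0),
    ],
)
report = monthly_revenue(conn, "2009-09")
```

Whether you pre-aggregate per day like this or keep finer-grained rows
depends entirely on which reports you need.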

Now you have an empty database containing the fields you want for the
report, and one or more populated databases containing data from the
user application. You then use Python to fill the empty database and,
tada, you have a data warehouse and have used Python for the ETL
processing.
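
As a rough sketch of that fill step, assuming both databases are
reachable through sqlite3 (for another database you would swap in its
DB-API driver): the orders and customer_daily_totals tables, their
columns and the sample rows are invented here, and your own transform
step will look different.

```python
import sqlite3


def extract(source):
    """Read the raw rows from the (hypothetical) application database."""
    return source.execute(
        "SELECT ordered_at, customer, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Normalise customer names and sum the amounts per day and customer."""
    totals = {}
    for ordered_at, customer, amount_cents in rows:
        key = (ordered_at[:10], customer.strip().lower())  # keep date part only
        totals[key] = totals.get(key, 0) + amount_cents
    return [(day, cust, cents / 100.0)
            for (day, cust), cents in sorted(totals.items())]


def load(warehouse, rows):
    """Write the transformed rows into the warehouse in one transaction."""
    with warehouse:
        warehouse.executemany(
            "INSERT OR REPLACE INTO customer_daily_totals VALUES (?, ?, ?)",
            rows,
        )


# Stand-in for the populated application database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (ordered_at TEXT, customer TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2009-09-23 12:15:44", "Alice ", 1250),
        ("2009-09-23 18:00:00", "alice", 750),
        ("2009-09-24 09:30:00", "Bob", 500),
    ],
)

# Stand-in for the empty warehouse, shaped for the report.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_daily_totals ("
    "day TEXT, customer TEXT, total REAL, PRIMARY KEY (day, customer))"
)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT day, customer, total FROM customer_daily_totals ORDER BY day"
).fetchall()
```

Because the load uses INSERT OR REPLACE on the primary key, rerunning
the script after a fix to transform() overwrites the old rows instead
of duplicating them, which helps while you are still iterating.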

So if you already have some experience creating tables in a database,
you are all set. Most likely you will go through a number of iterations
before you are happy with the result, though.

No book is a substitute for applying theory, experience and common
sense to the problem you want to solve, unless you write it yourself
for that specific situation.

-- 
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'


