custom data warehouse in python vs. out-of-the-box ETL tool
Martin P. Hellwig
martin.hellwig at dcuktec.org
Wed Sep 23 21:28:45 CEST 2009
> @Martin: I originally thought that there was nothing "magical" about
> building a data warehouse, but then I did a little research and
> received all sorts of feedback about how data warehouse projects have
> notorious failure rates, that data warehouse design IS different than
> normal RDBMS - and then there's the whole thing about data marts vs.
> warehouses, Kimball vs. Inmon, star schemas, EAV tables, and so on.
> So I started to think that maybe I needed to get a little better read
> on the subject.
Yes failure rate for data warehouse projects is quite high, so are other
IT projects without data warehouses.
Data warehouse design is not that much different than 'normal' RDBMS,
you are following the same decisions for example:
- Do I rather have multiple copies of data than slow or complicated
access? If yes how do I ensure integrity?
- How do I do access control on value level?
- Do I need fail over or load balancing, if yes how much of it can I do
on the application level?
The thing is if you never have designed a database from a database point
of view because you used abstractions that hide these ugly details then
yes, Data warehouses are different than normal RDBMS.
Yes you can make it all sound very complicated by throwing PHB words in
like meta schema's, dimensional approach, subject orientated, etc.
I like to see it more like I have a couple of data sources, I want to
combine them in a way that I can do neat stuff with it. The 'combine
them' part is the data warehouse design, the 'neat stuff' part are the
clients that use that data warehouse. If you wish you can start making
it more complicated by saying but my data sources are also my clients,
but that is another step.
I guess you can sum up all this by saying a data warehouse is a method
of gathering already existing data and present them in a different way,
the concept is simple, the details can be complicated if you want/need
it to be.
'If consumed, best digested with added seasoning to own preference.'
More information about the Python-list