[OT] large db question about no joins
Martin P. Hellwig
martin.hellwig at dcuktec.org
Fri Apr 17 01:10:34 CEST 2009
Daniel Fetchinson wrote:
> [off but interesting topic]
> What would be the corresponding database layout that would scale and I
> could get the total number of legs in the zoo or total number of
> animals in the zoo without join(s)?
> [/off but interesting topic]
That all comes down to the keywords, efficiency, robustness and
performance and you can only pick one of them. So which two can you
sacrifice? The good news is that it is only theoretical, if you have
your requirements (i.e. the query with the right results has to return
within an acceptable time for the user not to get frustrated) who cares
if it is not as fast as it theoretically could be? Especially if this
means you can sacrifice the theoretically performance for easier
deployment and maintenance.
To get back to the layout question, if I need to have a count of the
legs at any given time, I would like to see the business process that
requires this operation. Using the business process I could probably
narrow down the scope quite a bit like, it is not necessary to have a
precise count at time *now* but it is good enough to have an exact count
which is no more then 4 seconds old but the query does need to return
within 50 ms.
One solution could be to build a central data warehouse where all info
is stored in. Then have satellite db's that does ETL syncing with the
central one. The layout of the satellites contain an optimised table
version for the queries you want to throw at it.
If you need to scale it, that is a question of adding another satellite.
Efficiency is right out of the window because you store in essence just
multiple copies of the same data just in another order. However it is
robust (by having multiple copies) and has a predictable performance figure.
More information about the Python-list