[DB-SIG] DB scheme question

Dieter Maurer dieter at handshake.de
Wed Dec 15 08:25:21 CET 2010


John Q. Public wrote at 2010-12-13 12:03 -0800:
>I don't think I fully understand your suggestion.
>>From an OOP point of view, my application consists of 4 objects:
>    1. atoms
>    2. molecules - contain atoms
>    3. matters - contain molecules
>    4. versions - contain all the above
>The scheme suggested has 4 tables to hold the objects and 4 more tables to
>represent the connections by pairing foreign keys (e.g. atoms in molecules,
>molecules in matters and the three of the above in versions). Since I lake
>the experience in db programming,  I would appreciate if you can point my
>implementation mistakes so I could learn from them.

I did not suggest a database scheme at all. Instead, I suggested
to use an object oriented model and use an object relational mapper
(e.g. SQLAlchemy) to map this to the database. This way, there
is some chance that no big experience in db programming is necessary.

There is a natural object oriented model for your task.
It follows immediately from your problem description.
You could use it unless it would not be efficient enough.

>regarding the redundancy, each version contains few thousands of atoms,
>molecules and matters and the differences between versions is up to few
>tens. Isn't it a waste to hold all this relations in a table
>(atoms/molecules/matters to version).

It is. But it may or may not matter.
There is a rule: use a complex solution only where it is necessary.

Important considerations:

 * how much data you will get and especially how many versions.

   If you expect a huge number of versions (and a huge number
   of matters, molecules and atoms), then the simple scheme
   is probably not efficient enough.

   If you can expect that there typically will only be a few versions,
   then simplicity might rule.

 * which operations should be supported?

   Example:
   when you have to support to display differences between versions
   (and this must be efficient), this may indicate that
   versions must be modeled such that differences can be easily found.
   One approach could be to store the differences between the versions
   rather than all details.

>On the other hand, is it possible that
>my suggested relation might cause inefficient SELECT statements when
>searching for a version parameters?

I have not looked carefully enough at your model to answer this question
with certainty.
I expect that your solution will envolve joins between a version
record and its components. Provided that appropriate indexes are defined,
joins are usually well optimized by the database system.
Likely, you will need to worry only for large datasets.

Following the rule (KISS principle: Keep It Stupid and Simple)
stated above: implement your model (a simple one,
just as complex as it seems necessary for the task at hand) and
then observe the implementation (e.g. make measurements with
realistic datasets). Optimize (e.g. define indexes) as necessary.
Should moderate optimizations not be able to get acceptable behavior,
start over with a new more sofisticated model based on your new
experiences.



--
Dieter


More information about the DB-SIG mailing list