Help with database planning

Sat Nov 14 06:43:11 EST 2009

2009/11/14 Juliano <jjunho at gmail.com>:
> Hello, everybody.
>
> I'm a linguist with practical skills on computers/programming.
>
> We've been working with an ontology at my department, and now I need
> to create a GUI viewer for the flat file we have.
> I tried to write an Ontology class which manages the data read and
> parsed from the flat file, but it takes a relatively long time.
> Besides, we have plans to set up a website for online display of said
> ontology. So, I have been being pushed towards changing the basic plan
> and build a DB so that data access will be faster and easier for both
> the desktop GUI and the web app. Right now, I'm trying to work with
> sqlite, since it can be used as a separate file for the GUI and as a
> DB for Django (which may be the choice for the web interface).
>
> I have been redaing some books on DBs but I kind of get stuck when it
> comes to the normalization and the planning of the tables. The problem
> is that we have basically four fields that can be arranged in a tree-
> like structure. Eg:
>
> Concept
>  |----- Slot
>  |        `------ Facet
>  |                  `------ Filler
>  `----- Slot
>           `------ Facet
>                     `------ Filler
>                     `------ Filler
>    ...
>
> So, for ONE *concept*, we have, usually, MANY *slots*, each *slot* has
> ONE *facet*, and each *facet* can have MORE THAN ONE *filler*.
> Besides, some *slots* and *fillers* are themselves *concepts*,
> creating a sort of recursive reference.
>
> <begin table>
> line_no concepts        slots   facets  fillers
> ------------------------------------------------------------------------------
> 00000   ABANDON DEFINITION      VALUE   "to leave or desert something or
> someone"
> 00001   ABANDON IS-A    VALUE   EXIT
> 00002   ABANDON LEXE    MAP-LEX "leave behind-V1"
> 00003   ABANDON LEXE    MAP-LEX abandon-V1
> (...)
> 97420   ZULU    DEFINITION      VALUE   "a language or dialect spoken in south
> africa and others"
> 97421   ZULU    INSTANCE-OF     VALUE   OTHER-NIGER-KORDOFANIAN-LANGUAGE
> 97422   ZULU    LANGUAGE-OF     INV     LESOTHO
> 97423   ZULU    LANGUAGE-OF     INV     SOUTH-AFRICA
> <end table>
>
> I tried to create index tables for concepts, slots, facets and
> fillers, which gave me the following table:
>
> <begin table>
> line_no concepts        slots   facets  fillers
> ------------------------------------------------------------------------------
> 00000   cn_00000        sl_00048        fc_00007        fl_07349
> 00001   cn_00000        cn_02605        fc_00007        cn_01768
> 00002   cn_00000        sl_00121        fc_00002        fl_04329
> 00003   cn_00000        sl_00121        fc_00002        fl_15009
> (...)
> 97420   cn_05429        sl_00048        fc_00007        fl_01340
> 97421   cn_05429        cn_02493        fc_00007        cn_03526
> 97422   cn_05429        cn_02750        fc_00001        cn_02816
> 97423   cn_05429        cn_02750        fc_00001        cn_04580
> <end table>
>
> (cn_XXXXX from concept index, sl_XXXXX from slot index,
> fc_XXXXX from facet index, fl_XXXXX from filler index.)
>
> As you can see, only concepts and facets are populated by their own
> type of data.
> Whereas slots and fillers can be populated by their own types or by
> concepts.
>
> What would be a good way to create tables for this situation?
> In fact, this is the first time I've ever tried to create a DB, so I'm
> completely lost.
>
> I'm looking forward to a reply...
>
> Thank you very much,
> Juliano

If you have an ontology that doesn't run into GB's of data you could
also consider this. Load it into an in memory data structure of your
choice from the text file. Here are the arguments in favour :-

1) The structure doesn't lend itself nicely to tables. So a relational
database may not be the best choice when you start traversing the
data. Imagine writing a query to get the data and show it as a tree.
2) Changes to the ontology are infrequent so you don't use most of the
ACID facilities the database offers.
3) cyc uses its own proprietory data format which is probably not a
_relational_ database
4) With your own data structure you know where to fix if the
performance is bad. The current poor performance could be due to some
other problem which may not go away on its own on switching to db.
5) Keeping the data in a flat file has the advantage of making it easy
to update and version control. Otherwise you need another program for
editing it.

I am no database expert so let's see if someone has a better table
design suggestion.

Thank You,
++imanshu