[DB-SIG] recommendation

Jean Meloche jean@stat.ubc.ca
Mon, 23 Mar 1998 22:06:47 -0800


Hi. I would like to get some advice on how to deal with my
large database. I have 3x10^6 records of length 1K. None
of the fields can serve as an index (none is unique), but
I need to be able to loop through the lists of records for
which a specific field has a given value. I tried to use
gdbm to store the lists:

   key=value of target field      value=list of record numbers
   1234                           [1,45,3673,111324]
   3451                           [465,33,36678,555321]
   ...
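
For concreteness, here is a minimal sketch of the kind of build loop I am
running. iter_records() and extract_field() are placeholders for my actual
record-reading code, and the details (pickled lists, the file name) are
just one way of doing it:

   import dbm.gnu as gdbm   # spelled simply "gdbm" on older Pythons
   import pickle

   db = gdbm.open('field_index.db', 'c')
   for recno, record in enumerate(iter_records()):
       key = str(extract_field(record))     # the target field's value, as a string
       if key in db:
           recnos = pickle.loads(db[key])   # fetch the list stored so far
       else:
           recnos = []
       recnos.append(recno)
       db[key] = pickle.dumps(recnos)       # rewrite the whole (growing) value
   db.close()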

The problem I have is with the creation of this gdbm file.
At the rate things are going now, it will take a long, long
time. I used the same structure for other target fields and
did not have any difficulty. In the present case, however,
the values (lists of record numbers) have very different
lengths from key to key (some hold only 1 or 2 record numbers,
others many thousands).

Am I right to think that growing value fields are causing
the slowness?
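
If each append rereads and rewrites the entire stored list, the work per
key grows roughly quadratically with the length of the list. A
back-of-envelope estimate (my own numbers, assuming about 8 bytes per
stored record number):

   def bytes_rewritten(n, bytes_per_entry=8):
       # 1 + 2 + ... + n entries get rewritten over the course of n appends
       return bytes_per_entry * n * (n + 1) // 2

   print(bytes_rewritten(10000))   # ~400 MB rewritten for a single 10000-entry key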

What is the right way to do this? Using separate files
would work, but I'd end up with many thousands of them. Yuk.

Many thanks for any suggestions.

Jean Meloche