'database' search

Christian Tismer tismer at trixie.triqs.com
Sun Apr 23 17:22:43 EDT 2000


Bjorn Pettersen wrote:
[database with 1500 to 5000 items]

> The smallest datastructure I've found so far, is a 'prefix dictionary',
> so if e.g. "aardwark" was in the database, the 'prefix dictionary' would
> contain:
> 
>  pfdict = {
>   'a':        ['aardwark', ...],
>   'aa':       ['aardwark', ...],
>   'aar':      ['aardwark'],
>   'aard':     ['aardwark'],
>   'aardw':    ['aardwark'],
>   'aardwa':   ['aardwark'],
>   'aardwar':  ['aardwark'],
>   'aardwark': ['aardwark']
>  }
> 
> The maximum size for this would be n * avg len of item, with the real
> size being much(?) smaller with sufficient prefix sharing...

Well, your database is small enough to stand this number
of generated dict items.
In the general (larger) case I'd suggest to use a "trie"
structure. This is dividing your key set into maximum
unique subkeys. This may be coded with nested dictionaries
quite nicely.

Example:
aardwark      1
aaron         2
advertizing   3
advantage     4
python        5

gives you the following structure:

trie = {
  "a" : {
          "a":  {
                  "rdwark" : 1
                  "ron"    : 2
                }
          "dv": {
                  "ertizing": 3
                  "antage"  : 4
                }
        }
  "p" : {
          "ython": 5
        }
}

This should tend to give you quite small dicts, and relatively
fast access. Whenever you end up in a unique subkey, you may
do the completion. Using keys() and then sorting out isn't
too slow. Preofix sharing is there, too.

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com




More information about the Python-list mailing list