constructing and using large lexicon in a program

Dan Stromberg drsalists at
Tue Aug 3 04:22:03 CEST 2010

On Mon, Aug 2, 2010 at 10:46 AM, Majdi Sawalha <maj_sawalha at> wrote:

> Dear List members,
> I am developing a morphological analyzer that depends on a large lexicon. I
> constructed a Lexicon class that reads a text file and constructs a dictionary
> of the lexicon entries.
> Another class uses the Lexicon class to check whether a word is found in
> the lexicon. The problem is that this takes a long time: each time an object
> of that class is created, it needs to call the lexicon many times, and each
> time the lexicon is called it reconstructs the lexicon again. Is there any way
> to construct the lexicon once during the execution of the program, so that
> the other modules search the already constructed lexicon?
> best regards
> Majdi
> Faculty of Engineering
> School of Computing
> University of Leeds
> Leeds, LS2 9JT
> UK
> --

You want an object with a global lifetime.  I often wish Python had a nice
equivalent to static variables in C.
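
One common way to get an object with a global lifetime is a module-level
variable that is filled in on first use. The following is only a minimal
sketch in Python 3 syntax, not the original poster's code: the function name
get_lexicon, the default file name lexicon.txt, and the one-entry-per-line
file format are all assumptions made for illustration.

```python
# Hypothetical module, e.g. lexicon.py. Names and file format are assumptions.
_LEXICON = None  # module-level, so it lives as long as the program does


def get_lexicon(path="lexicon.txt"):
    """Return the shared lexicon, reading `path` only on the first call."""
    global _LEXICON
    if _LEXICON is None:
        with open(path, encoding="utf-8") as f:
            # Assumed format: one entry per line; blank lines are skipped.
            _LEXICON = {line.strip() for line in f if line.strip()}
    # Later calls return the same object; `path` is ignored once loaded.
    return _LEXICON
```

Every class that needs the lexicon would call get_lexicon() instead of
building its own copy, so the file is parsed at most once per run. A set is
used here because only membership is tested; a dict would work the same way
if each entry carries extra data.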

Sometimes people suggest doing this with a function or method argument that
takes a default value that looks like it would be different each time the
function is called, but in reality, it's evaluated once when the
function/method is created.


$ cat t

import time

def fn(x = time.time()):
        print x

for i in xrange(5):
        fn()
        time.sleep(1)

$ ./t
(the same timestamp, printed five times)

IOW, it's printing the same time each iteration, despite having a 10
millisecond precision and a 1 second sleep between each call.
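
The same mechanism can hold a cache: a mutable default value is created once,
when the function is defined, and then persists between calls. A rough sketch
of that idea applied to the lexicon, in Python 3 syntax; the names in_lexicon
and _cache, the default file name, and the one-entry-per-line format are
assumptions for illustration only.

```python
def in_lexicon(word, path="lexicon.txt", _cache={}):
    """Return True if `word` is in the lexicon stored at `path`.

    `_cache` is evaluated once, at definition time, so it is the same
    dict object on every call and acts like a static variable.
    """
    if path not in _cache:
        with open(path, encoding="utf-8") as f:
            # Assumed format: one entry per line; blank lines are skipped.
            _cache[path] = {line.strip() for line in f if line.strip()}
    return word in _cache[path]
```

Only the first call for a given path reads the file. Mutable defaults are
often considered a pitfall, so a module-level variable may be easier for
other readers to follow.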

BTW, for storing a truly large lexicon, you might be better off with a trie
than a dictionary (AKA hash table), though I'm sure some lexicons aren't
huge enough to require it.  One simple way of doing a trie would be to mkdir
an empty directory, and then for each ith subdirectory depth beneath that,
have it correspond to the ith character of your word in your lexicon.  That's
probably not a terribly efficient trie, but I think it may get the
point across - you're not storing common prefixes over and over.
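
An in-memory analogue of that directory layout uses nested dicts in place of
nested directories: one level per character. This is only a sketch in Python 3
syntax; the names END, trie_insert and trie_contains are made up for the
example, and the sample words are arbitrary.

```python
END = ""  # hypothetical marker key meaning "a word ends at this node"


def trie_insert(root, word):
    """Add `word`, creating one nested dict per character as needed."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = True


def trie_contains(root, word):
    """Return True only if `word` was inserted as a complete word."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


root = {}
for w in ("walk", "walked", "walking"):
    trie_insert(root, w)
```

Here the prefix "walk" is stored once and shared by all three words. In
CPython each dict carries its own overhead, so this shows the structure
rather than promising a memory saving over a plain dict or set.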