constructing and using large lexicon in a program
Peter Otten
__peter__ at web.de
Tue Aug 3 04:00:26 EDT 2010
Majdi Sawalha wrote:
> I am developing a morphological analyzer that depends on a large lexicon.
> I construct a Lexicon class that reads a text file and constructs a
> dictionary of the lexicon entries.
> The other class will use the lexicon class to check if the word is found
> in the lexicon. The problem is that this takes a long time, as each time an
> object of that class is created, it needs to call the lexicon many
> times, and when the lexicon is called it reconstructs the lexicon again.
> Is there any way to construct the lexicon one time during the execution of
> the program, so that the other modules search the already
> constructed lexicon?
Normally you just structure your application accordingly. Load the dictionary
once and then pass it around explicitly:
import loader
import user_one
import user_two
filename = ...
large_dict = loader.load(filename)
user_one.use_dict(large_dict)
user_two.use_dict(large_dict)
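To make the "pass it around explicitly" advice concrete for the lexicon case, here is a minimal sketch. The class names Lexicon and Analyzer and the lookup method are hypothetical, chosen only for illustration; the point is that the analyzer receives an already built lexicon in its constructor instead of reading the text file itself, so creating many analyzer objects never re-parses the file.

```python
# Sketch only: Lexicon and Analyzer are hypothetical names, not from the post.


class Lexicon(object):
    """Wraps an already built dict of lexicon entries; reads no files."""

    def __init__(self, entries):
        self._entries = entries  # keep a reference, no copy

    def __contains__(self, word):
        return word in self._entries

    def lookup(self, word, default=None):
        return self._entries.get(word, default)


class Analyzer(object):
    """Uses a shared Lexicon passed in by the caller."""

    def __init__(self, lexicon):
        self.lexicon = lexicon

    def is_known(self, word):
        return word in self.lexicon


if __name__ == "__main__":
    # Stand-in for large_dict = loader.load(filename)
    entries = {"alpha": "value for alpha", "beta": "BETA"}
    lexicon = Lexicon(entries)
    first = Analyzer(lexicon)
    second = Analyzer(lexicon)
    print(first.is_known("alpha"))          # True
    print(second.is_known("delta"))         # False
    print(first.lexicon is second.lexicon)  # True: one shared object
```

Because every Analyzer holds a reference to the same Lexicon, the cost of building the dictionary is paid once, wherever it is loaded (for example with loader.load(filename) as above).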
You may also try a caching scheme to avoid parsing the text file unless it has
changed. Here's a simple example:
$ cat cachedemo.py
from __future__ import print_function

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3
import os


def load_from_text(filename):
    # replace with your code; expects one "key value" pair per line
    with open(filename) as instream:
        return dict(line.strip().split(None, 1) for line in instream)


def load(filename, cached=None):
    if cached is None:
        cached = filename + ".pickle"
    # reuse the pickle only if it is at least as new as the text file
    if os.path.exists(cached) and os.path.getmtime(filename) <= os.path.getmtime(cached):
        print("using pickle")
        with open(cached, "rb") as instream:
            return pickle.load(instream)
    else:
        print("loading from text")
        d = load_from_text(filename)
        with open(cached, "wb") as out:
            pickle.dump(d, out, pickle.HIGHEST_PROTOCOL)
        return d


if __name__ == "__main__":
    if not os.path.exists("tmp.txt"):
        print("creating example data")
        with open("tmp.txt", "w") as out:
            out.write("""\
alpha value for alpha
beta BETA
gamma GAMMA
""")
    print(load("tmp.txt"))
$ python cachedemo.py
creating example data
loading from text
{'alpha': 'value for alpha', 'beta': 'BETA', 'gamma': 'GAMMA'}
$ python cachedemo.py
using pickle
{'alpha': 'value for alpha', 'beta': 'BETA', 'gamma': 'GAMMA'}
$ echo 'delta modified text' >> tmp.txt
$ python cachedemo.py
loading from text
{'alpha': 'value for alpha', 'beta': 'BETA', 'gamma': 'GAMMA', 'delta': 'modified text'}
$ python cachedemo.py
using pickle
{'alpha': 'value for alpha', 'beta': 'BETA', 'gamma': 'GAMMA', 'delta': 'modified text'}
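The pickle cache saves the parsing cost across separate runs, but within one run every call to load() still opens and unpickles the file. If several objects in the same process ask for the lexicon, one option (a standard-library memoization technique, not part of the original post) is to wrap the loader in functools.lru_cache, available in Python 3.2 and later. The name get_lexicon is hypothetical, and load_from_text here is a self-contained copy of the simple parser above that also skips blank lines; in practice you might memoize load() from cachedemo.py instead.

```python
import functools
import os
import tempfile


def load_from_text(filename):
    # Same simple "key value" format as cachedemo.py; blank lines skipped.
    with open(filename) as instream:
        return dict(line.strip().split(None, 1)
                    for line in instream if line.strip())


@functools.lru_cache(maxsize=None)
def get_lexicon(filename):
    # Hypothetical helper: parses on the first call for a given filename,
    # later calls return the same dict object from the in-process cache.
    return load_from_text(filename)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write("alpha value for alpha\nbeta BETA\n")
    first = get_lexicon(path)
    second = get_lexicon(path)
    print(first is second)                  # True: parsed only once
    print(get_lexicon.cache_info().misses)  # 1
    os.remove(path)
```

Two caveats of this design: every caller gets the same mutable dict, so a change made by one user is seen by all, and the in-process cache does not notice edits to the text file made while the program is running.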
Peter
More information about the Python-list mailing list