[Spambayes] some preliminary timings
Skip Montanaro
skip at pobox.com
Mon Feb 24 19:28:44 EST 2003
>> Such a huge difference between hammiefilter and a raw filter loop
>> suggests I may have done something wrong. Still, perhaps opening the
>> db file for each message and all the imports hammiefilter has to do
>> simply kills the performance.
Neil> Yes. Try "strace python hammiefilter.py". I count 848 open()
Neil> system calls. 702 of them return ENOENT. A (relatively) small
Neil> sample:
...
Making a simple pass through sys.path deleting non-existent directories
doesn't help in my case (path length decreased from six directories to
five).
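
For reference, the sys.path pass described above can be sketched as follows. The helper name prune_missing is made up for illustration, and keeping the empty-string entry (which means "current directory") is an assumption about what such a pass should do:

```python
import os


def prune_missing(path_entries):
    """Return only the entries that exist on disk, preserving order.

    The empty string is kept because on sys.path it stands for the
    current directory, even though os.path.exists("") is False.
    """
    return [p for p in path_entries if p == "" or os.path.exists(p)]
```

It would be applied in place with sys.path[:] = prune_missing(sys.path). As noted, this only removes the failed lookups for directories that don't exist at all; it does nothing about the misses inside directories that do exist.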
Hmmm... It would be kind of interesting to override __import__ to look up
modules in a saved dictionary. At program exit:

    import sys
    import cPickle

    # Record where each already-imported module's .pyc file lives.
    locations = {}
    for m in sys.modules:
        if hasattr(sys.modules[m], "__file__"):
            f = sys.modules[m].__file__
            if f.endswith(".pyc"):
                locations[m] = f
    cPickle.dump(locations, open("hf.pickle", "w"))

then at program startup:

    import os, sys

    if os.path.exists("hf.pickle"):
        import cPickle, marshal
        _locations = cPickle.load(open("hf.pickle"))

        def hf_import(name, globals=None, locals=None, fromlist=None,
                      locations=_locations, impt=__import__):
            if name in locations and name not in sys.modules:
                # we know where to find this module already
                ... magic here ...
                return mod
            # anything else falls through to the normal import machinery
            return impt(name, globals, locals, fromlist)

        import __builtin__
        __builtin__.__import__ = hf_import

I fiddled around a bit but couldn't come up with the "... magic here ..."
part.
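
For comparison, here is a minimal sketch of how that step might look in present-day Python 3, where importlib can load a module straight from a compiled file. It is not the missing piece for the Python 2 code above (importlib, builtins and the level argument are all Python 3), and the names load_from_pyc and make_hf_import are invented for illustration. It assumes that locations maps plain top-level module names to .pyc paths; packages and dotted names are simply handed back to the normal import.

```python
import builtins
import importlib.machinery
import importlib.util
import sys


def load_from_pyc(name, path):
    """Load module `name` directly from the compiled file at `path`."""
    loader = importlib.machinery.SourcelessFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    mod = importlib.util.module_from_spec(spec)
    # Register before executing, as the regular import system does.
    sys.modules[name] = mod
    try:
        loader.exec_module(mod)
    except BaseException:
        # Don't leave a half-initialised module behind.
        del sys.modules[name]
        raise
    return mod


def make_hf_import(locations, impt=builtins.__import__):
    """Build an __import__ replacement that consults `locations` first."""
    def hf_import(name, globals=None, locals=None, fromlist=(), level=0):
        known = (level == 0 and "." not in name
                 and name in locations and name not in sys.modules)
        if known:
            # we know where to find this module already
            return load_from_pyc(name, locations[name])
        return impt(name, globals, locals, fromlist, level)
    return hf_import
```

Installing it would be builtins.__import__ = make_hf_import(locations). Whether that actually saves any time over the normal path search is a separate question that this sketch doesn't answer.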
Skip