[Spambayes] some preliminary timings

Skip Montanaro skip at pobox.com
Mon Feb 24 19:28:44 EST 2003


    >> Such a huge difference between hammiefilter and a raw filter loop
    >> suggests I may have done something wrong.  Still, perhaps opening the
    >> db file for each message and all the imports hammiefilter has to do
    >> simply kills the performance.

    Neil> Yes.  Try "strace python hammiefilter.py".  I count 848 open()
    Neil> system calls.  702 of them return ENOENT.  A (relatively) small
    Neil> sample:

    ...
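Tallying the open() calls and ENOENT failures by hand is tedious. Below is a small sketch that counts them in a trace saved with `strace -o trace.txt python hammiefilter.py`. The function name `count_opens` is hypothetical, and so is the assumed line format. The sketch also counts `openat()`, which newer C libraries issue instead of `open()`.

```python
import re

# Assumed strace line shape, e.g.:
#   open("/usr/lib/python2.2/foo.so", O_RDONLY) = -1 ENOENT (No such file or directory)
# With strace -f a line may start with a pid: "1234  open(" or "[pid  1234] open(".
# Newer C libraries call openat() instead of open(), so both names are counted.
_OPEN_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s*|\d+\s+)?(?:open|openat)\(")


def count_opens(lines):
    """Return (number of open/openat calls, number that failed with ENOENT)."""
    total = failed = 0
    for line in lines:
        if _OPEN_CALL.match(line):
            total += 1
            if "= -1 ENOENT" in line:
                failed += 1
    return total, failed
```

Usage: `with open("trace.txt") as f: print(count_opens(f))`. Calls that strace splits into `<unfinished ...>` and `<... resumed>` halves are not reassembled, so traces made with `-f` may undercount failures slightly.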

Making a simple pass through sys.path deleting non-existent directories
doesn't help in my case (path length decreased from six directories to
five).
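The pruning pass mentioned above can be sketched in current Python as follows. `prune_path` is a hypothetical helper name. Keeping `""` (the current directory) and existing files such as zip archives is an assumption about what "non-existent" should mean here.

```python
import os
import sys


def prune_path(entries):
    """Return only the entries that exist on disk.

    "" (the current directory) is always kept. Zip archives on the path are
    kept as well, because os.path.exists() is true for files and directories.
    """
    return [p for p in entries if p == "" or os.path.exists(p)]


# Rewrite sys.path in place so anything holding a reference to the list sees it.
sys.path[:] = prune_path(sys.path)
```

This only removes the failed lookups for the directories it drops. With a path of five or six entries, most of the missed opens come from probing each remaining directory for several file suffixes, so the gain is small.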

Hmmm...  It would be kind of interesting to override __import__ to look up
modules in a saved dictionary.  At program exit:

    import sys
    locations = {}
    for m in sys.modules:
        if hasattr(sys.modules[m], "__file__"):
            f = sys.modules[m].__file__
            if f.endswith(".pyc"):
                locations[m] = f
    import cPickle
    cPickle.dump(locations, open("hf.pickle", "w"))

then at program startup:

    import os, sys
    if os.path.exists("hf.pickle"):
        import cPickle, marshal
        _locations = cPickle.load(open("hf.pickle"))
        def hf_import(name, globals=None, locals=None, fromlist=None,
                      locations=_locations, impt=__import__):
            if name in locations and name not in sys.modules:
                # we know where to find this module already
                ... magic here ...
                return mod
            return impt(name, globals, locals, fromlist)
        import __builtin__
        __builtin__.__import__ = hf_import

I fiddled around a bit but couldn't come up with the "... magic here ..."
part.
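One possible way to fill in that step is sketched below for modern Python 3, not the Python 2 used above. It skips the `__import__` override and instead registers a meta path finder, which is the hook importlib provides for this kind of lookup. The names `save_locations`, `load_locations`, `CachedLocationFinder` and `install_finder` are hypothetical. Storing `.py` paths rather than `.pyc` paths is also an assumption, based on the fact that Python 3 reports the source file in `__file__`.

```python
import importlib.abc
import importlib.util
import os
import pickle
import sys


def save_locations(cache_file):
    """At exit: remember where each pure-Python module was loaded from."""
    locations = {}
    for name, mod in list(sys.modules.items()):
        filename = getattr(mod, "__file__", None)
        if isinstance(filename, str) and filename.endswith(".py"):
            locations[name] = filename
    with open(cache_file, "wb") as f:
        pickle.dump(locations, f)


def load_locations(cache_file):
    """At startup: read the saved name -> filename map, or return {} if absent."""
    if not os.path.exists(cache_file):
        return {}
    with open(cache_file, "rb") as f:
        return pickle.load(f)


class CachedLocationFinder(importlib.abc.MetaPathFinder):
    """Answers find_spec() from the saved map instead of scanning sys.path."""

    def __init__(self, locations):
        self.locations = locations

    def find_spec(self, fullname, path=None, target=None):
        filename = self.locations.get(fullname)
        if filename is None or not os.path.exists(filename):
            # Unknown or stale entry: let the normal finders search sys.path.
            return None
        # The loader is chosen from the file suffix; __init__.py becomes a package.
        return importlib.util.spec_from_file_location(fullname, filename)


def install_finder(locations):
    # Put the cached finder ahead of the standard path-based finder.
    sys.meta_path.insert(0, CachedLocationFinder(locations))
```

To use it, call `install_finder(load_locations("hf.pickle"))` early at startup and `atexit.register(save_locations, "hf.pickle")` to refresh the cache. Each cached module then costs one stat of a known file instead of a walk over every directory on the path. Letting the import system call the finder means it still handles `fromlist`, dotted names and parent packages, which is most of what made the hand-written `__import__` replacement awkward.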

Skip



