[Spambayes] some preliminary timings

Skip Montanaro skip at pobox.com
Mon Feb 24 21:53:24 EST 2003


Putting all the Python code into a zip file didn't help either.  Using the
module locations I had saved at exit in an earlier run, I wrote every module
into a single zip file:

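    >>> import zipfile
    >>> # loc: module locations saved from the earlier run; the [:-1]
    >>> # presumably trims a trailing "c"/"o" so writepy() gets the .py path
    >>> z = zipfile.PyZipFile("hf.zip", mode="w")
    >>> for key in loc:
    ...   z.writepy(loc[key][:-1])
    ...
    >>> z.close()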
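
For anyone who wants to repeat this, here is a rough sketch of how a mapping
like loc could be collected at exit.  It is not the code I actually used: the
names module_locations, save_locations, install and the file modlocs.json are
all made up for illustration, and it is written for a current Python 3 (where
__file__ usually points at the .py source, not the .pyc/.pyo).

```python
import atexit
import json
import sys


def module_locations(modules=None):
    """Map module name -> file path for every module that has a file."""
    if modules is None:
        modules = sys.modules
    loc = {}
    # Copy the items first: sys.modules can change while we iterate.
    for name, mod in list(modules.items()):
        path = getattr(mod, "__file__", None)
        if path:  # builtin modules have no __file__ (or it is None)
            loc[name] = path
    return loc


def save_locations(filename="modlocs.json"):
    """Write the current module locations to a JSON file (hypothetical name)."""
    with open(filename, "w") as f:
        json.dump(module_locations(), f, indent=1)


def install():
    """Arrange for the locations to be saved when the interpreter exits."""
    atexit.register(save_locations)
```

Calling install() near the top of the script being measured should leave a
modlocs.json behind that can be loaded later and fed to writepy().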

With hf.zip added to PYTHONPATH, I could see modules being loaded from it:

    % PYTHONPATH=`pwd`/hf.zip /usr/bin/time python -v hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null
    # installing zipimport hook
    import zipimport # builtin
    # installed zipimport hook
    # zipimport: found 56 names in /Users/skip/src/spambayes/hf.zip
    import posix # builtin
    import stat # loaded from Zip /Users/skip/src/spambayes/hf.zip/stat.pyo
    import posixpath # loaded from Zip /Users/skip/src/spambayes/hf.zip/posixpath.pyo
    import UserDict # loaded from Zip /Users/skip/src/spambayes/hf.zip/UserDict.pyo
    ...

yet the performance was no better:

    % /usr/bin/time python hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null
            0.41 real         0.26 user         0.14 sys
    % PYTHONPATH=`pwd`/hf.zip /usr/bin/time python hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null
            0.44 real         0.27 user         0.10 sys

I then tried ktrace (Mac OS X's equivalent to strace(1)).  Executing

    ktrace python hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null

yielded some interesting results.  If I search the output for "errno 2 No
such file or directory" I get 1056 hits, 226 of which are for attempts to
open files inside the nonexistent archive /Users/skip/local/lib/python23.zip.
That seems to be a side effect of the new zip importer.

If I then run with PYTHONPATH referencing my stash of .pyo files in hf.zip I
see 832 "no such file" responses, and only 86 occurrences of the nonexistent
python23.zip.  Creating an empty
/Users/skip/local/lib/python2.3/sitecustomize.py file brought the "no such
file" lines down to 805.
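
Counts like those above can be tallied with a few lines of Python instead of
by eye.  This is only a sketch: it assumes the trace has been dumped to text
in the usual kdump layout, where a NAMI record carrying the path comes just
before the RET record carrying the errno.  The function name count_enoent is
made up.

```python
def count_enoent(lines, needle="python23.zip"):
    """Return (total ENOENT results, those whose last path contains needle).

    Assumes kdump-style records: "pid command TYPE data", with a NAMI
    record (the path) preceding the RET record (the errno).
    """
    total = hits = 0
    last_path = ""
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[2] == "NAMI":
            # Remember the most recent path looked up by the kernel.
            last_path = fields[3].strip().strip('"')
        elif "errno 2 No such file or directory" in line:
            total += 1
            if needle in last_path:
                hits += 1
            last_path = ""  # do not attribute a later failure to this path
    return total, hits
```

Something like count_enoent(open("trace.txt")) would then give both numbers
in one pass, where trace.txt is whatever file the text dump was saved to.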

Another thing which might be useful is to change the order in which Python
tries module file extensions.  Since most modules are written in Python,
fewer failed stat() calls would be made if files ending in ".py" were
considered before files ending in ".so" and "module.so".  That's outside the
realm of spambayes though.
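
To get a feel for how much the suffix order could matter, here is a toy
model, not the interpreter's real import code.  The position of ".pyc" in
both orders, the helper names, and the module mix are assumptions chosen
purely for illustration.

```python
# Simplified suffix orders; where ".pyc" sits is an assumption.
SO_FIRST = [".so", "module.so", ".py", ".pyc"]
PY_FIRST = [".py", ".pyc", ".so", "module.so"]


def failed_probes(order, suffix, dirs_missed=0):
    """Failed lookups before a module with the given suffix is found.

    Every directory searched without success costs one failure per suffix;
    in the directory that holds the module, only the suffixes tried before
    the right one fail.
    """
    return dirs_missed * len(order) + order.index(suffix)


def total_failed(order, suffixes, dirs_missed=0):
    """Sum failed lookups over a collection of module suffixes."""
    return sum(failed_probes(order, s, dirs_missed) for s in suffixes)


# Hypothetical mix: nine pure-Python modules and one extension module.
mix = [".py"] * 9 + [".so"]
print(total_failed(SO_FIRST, mix), total_failed(PY_FIRST, mix))  # 18 2
```

In this model the saving only shows up in the directory where the module
actually lives.  Directories searched without success cost the same under
either order, so trimming sys.path would still matter.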

Skip


