[Spambayes] Re: some preliminary timings

Skip Montanaro skip at pobox.com
Tue Feb 25 09:59:45 EST 2003


Executive summary for python-dev folks seeing this for the first time:

    This thread started at

        http://mail.python.org/pipermail/spambayes/2003-February/003520.html

    Running in a single interpreter loop, I can score roughly 46
    messages per second.  Running from the shell using hammiefilter.py
    (which takes a msg on stdin and spits a scored message to stdout)
    performance drops to roughly 2 messages per second.  Neil
    Schemenauer noted all the failed open() calls during import
    lookup, which got me started trying to whittle them down.
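
One way to see how much of the per-message cost is interpreter startup is to time a bare "python -c pass" the same way the filter is timed. This is a sketch under assumptions, not the harness behind the numbers in this thread. time_command() is a hypothetical helper. It relies on the Unix-only resource module and reports two averages: wall-clock time, and user+sys CPU time of the child processes taken from RUSAGE_CHILDREN.

```python
import resource  # Unix-only
import subprocess
import sys
import time


def time_command(argv, runs=5, stdin_path=None):
    """Run argv `runs` times; return (avg wall seconds, avg user+sys seconds).

    RUSAGE_CHILDREN accumulates the CPU time of children that have exited
    and been waited for, so the before/after difference covers the runs
    made here (assuming no other children finish in between).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    for _ in range(runs):
        if stdin_path is None:
            subprocess.run(argv, stdin=subprocess.DEVNULL,
                           stdout=subprocess.DEVNULL, check=True)
        else:
            # feed a saved message on stdin, as a shell redirect would
            with open(stdin_path, "rb") as msg:
                subprocess.run(argv, stdin=msg,
                               stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall / runs, cpu / runs


if __name__ == "__main__":
    # Bare interpreter startup: the fixed cost paid for every message when
    # a filter is launched from the shell once per message.
    wall, cpu = time_command([sys.executable, "-c", "pass"])
    print(f"avg wall {wall:.3f}s  avg user+sys {cpu:.3f}s")
```

To time the filter itself, you could pass its command line and a saved message, e.g. time_command([sys.executable, "hammiefilter.py"], stdin_path="msg.txt"). The file name msg.txt is made up. Comparing that figure with the bare-startup figure shows how much of each invocation is spent before any scoring happens.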

Two more things to try before abandoning this quixotic adventure...

It appears $prefix/python23.zip is left in sys.path even if it doesn't exist.
(Just van Rossum explained to me in a bug report I filed that nonexistent
directories might actually be URLs or other weird hacks which import hooks
could make use of.)  So I went with the flow and created it, populating it
with the contents of $prefix/python2.3.  My average wall-clock time went from
0.5 seconds to 0.47 seconds, and user+sys time went from 0.43 seconds to
0.41 seconds.  A modest improvement.
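
You can create the archive that Python already looks for with nothing more than the standard zipfile module. The sketch below shows one way to do it, not necessarily what was done here. build_stdlib_zip and its skip list are hypothetical. lib-dynload is left out because zipimport cannot load extension modules, and only .py sources are stored. The zipimport documentation notes that importing from an archive without .pyc files may be rather slow. For that reason, zipfile.PyZipFile.writepy(), which stores compiled files, may be the better choice in practice.

```python
import os
import zipfile

# Directories not copied: zipimport cannot load extension modules
# (lib-dynload), __pycache__ holds compiled files this sketch does not use,
# and site-packages is skipped on the assumption that only the standard
# library should go into the archive.
SKIP_DIRS = ("lib-dynload", "site-packages", "__pycache__")


def build_stdlib_zip(src_dir, zip_path, skip_dirs=SKIP_DIRS):
    """Store every .py file under src_dir in zip_path; return how many were added.

    Archive names are relative to src_dir, so src_dir/os.py becomes "os.py"
    and src_dir/json/__init__.py becomes "json/__init__.py", the layout
    zipimport expects when zip_path itself is a sys.path entry.
    """
    added = 0
    # ZIP_STORED (no compression) avoids decompression work at import time
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs[:] = sorted(d for d in dirs if d not in skip_dirs)  # prune walk
            for name in sorted(files):
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, src_dir))
                    added += 1
    return added
```

Pointing src_dir at $prefix/python2.3 and zip_path at the python23.zip entry already listed in sys.path means the archive is searched before the directory it was built from, because its entry comes earlier in sys.path.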

One more little tweak.  I moved the lib-dynload directory to the front of
sys.path (obviously only safe if no module there is also provided by a
directory that appears earlier in sys.path).
Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds,
though the total number of system calls as measured by ktrace went from 3454
to 3042.
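
The reordering, and the safety condition attached to it, can be sketched as follows. Both helpers are hypothetical. promote_entries() moves every entry whose final path component is lib-dynload to the front and keeps the others in their original order. shadowing_conflicts() approximates "nothing there appears earlier in sys.path" by comparing file and directory names up to their first dot. It ignores suffix conventions such as the older foomodule.so naming and skips entries that are not directories.

```python
import os


def _is_dynload(path, basename="lib-dynload"):
    # normpath strips a trailing slash so "/x/lib-dynload/" still matches
    return os.path.basename(os.path.normpath(path)) == basename


def promote_entries(paths, basename="lib-dynload"):
    """Return a new list with matching entries first, others in original order."""
    front = [p for p in paths if _is_dynload(p, basename)]
    back = [p for p in paths if not _is_dynload(p, basename)]
    return front + back


def _module_names(directory):
    # "foo.py", "foo.so", "foo.cpython-311-x86_64-linux-gnu.so" and a "foo"
    # package directory all reduce to "foo"
    if not os.path.isdir(directory):
        return set()
    return {name.split(".", 1)[0] for name in os.listdir(directory)}


def shadowing_conflicts(paths, basename="lib-dynload"):
    """Module names in a matching directory that an earlier entry also provides.

    A non-empty result means promoting that directory would change which
    file an import picks up.
    """
    conflicts = set()
    for i, path in enumerate(paths):
        if not _is_dynload(path, basename):
            continue
        provided = _module_names(path)
        for earlier in paths[:i]:
            conflicts |= provided & _module_names(earlier)
    return sorted(conflicts)
```

At runtime this would look like "if not shadowing_conflicts(sys.path): sys.path[:] = promote_entries(sys.path)". That only affects imports made after that point.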

Hammiefilter itself really does very little.  Looking at the last
ktrace/kdump output, I see 3042 system calls.  The hammie.db file isn't
opened until line 2717.  All the rest before that is startup stuff, the
largest chunks of which are NAMI (namei path lookup) records (731) and open()
calls (557), most of them involving nonexistent files (as evidenced by only
164 calls to close()).  In contrast, only 278 system calls appear to be directly
related to manipulating the hammie database.
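
A short script can pull counts like these out of kdump output. This is a sketch under assumptions, not the procedure used for the figures above. It assumes each record line looks like "<pid> <command> <TYPE> <payload>", the BSD kdump layout, which can vary between versions and with options that add timestamps or thread ids. It counts CALL records per syscall name and counts NAMI records. It treats a RET line beginning "open -1" as a failed open(). tally_kdump() and first_line_mentioning() are hypothetical helpers. The second one reports the 1-based line at which a string such as hammie.db first appears.

```python
import re
from collections import Counter

# Assumed kdump line shape (BSD-style ktrace output):
#   "<pid> <command> <TYPE> <payload>"
# e.g.  " 1234 python   CALL  open(0xbfbff6a0,0,0x1b6)"
#       " 1234 python   NAMI  \"/usr/local/lib/python2.3/os.py\""
#       " 1234 python   RET   open -1 errno 2 No such file or directory"
LINE_RE = re.compile(r"^\s*\d+\s+\S+\s+([A-Z]+)\s+(.*)$")


def tally_kdump(lines):
    """Return (Counter of syscall names, NAMI record count, failed open() count)."""
    calls = Counter()
    nami = 0
    failed_opens = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # data dumps or a layout this sketch does not recognize
        kind, payload = m.groups()
        if kind == "CALL":
            # the syscall name is everything before the argument list
            calls[payload.split("(", 1)[0]] += 1
        elif kind == "NAMI":
            nami += 1
        elif kind == "RET" and payload.startswith("open -1"):
            failed_opens += 1
    return calls, nami, failed_opens


def first_line_mentioning(lines, needle):
    """1-based number of the first line containing needle, or None."""
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            return lineno
    return None
```

On a saved dump, sum(calls.values()) gives the total number of system calls. Comparing calls["open"] with calls["close"] gives the same rough estimate of failed opens as the one above, which the RET-based count can cross-check.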

This is still somewhat off-topic for this list (except for the fact that my
intention was to get hammiefilter to run faster), so I'll cc python-dev to
keep Tim happy, and perhaps mildly irritate Guido by discussing specific
apps on python-dev.

Skip