Executive summary for python-dev folks seeing this for the first time:
This thread started at
http://mail.python.org/pipermail/spambayes/2003-February/003520.html
Running in a single interpreter loop, I can score roughly 46 messages per second. Running from the shell using hammiefilter.py (which takes a message on stdin and spits a scored message to stdout), performance drops to roughly 2 messages per second. Neil Schemenauer noted all the failed open() calls during import lookup, which got me started trying to whittle them down.
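To put those two rates side by side: the gap between them gives a rough estimate of what each fresh process costs. The helper name below is made up for illustration, and the sketch assumes the scoring work per message is the same in both cases.

```python
def per_process_overhead(loop_rate, shell_rate):
    """Estimate seconds of per-invocation overhead from two throughput figures.

    loop_rate  -- messages/second when scoring inside one interpreter
    shell_rate -- messages/second when starting a new process per message
    """
    # Time per message in each mode; the difference is startup/teardown cost.
    return 1.0 / shell_rate - 1.0 / loop_rate


if __name__ == "__main__":
    # Figures quoted above: roughly 46 msgs/sec vs. roughly 2 msgs/sec.
    print("overhead per process: %.3f s" % per_process_overhead(46, 2))
```

With the rounded figures above this comes out a little under half a second per invocation, which is in the same range as the wall-clock times reported further down.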
Two more things to try before abandoning this quixotic adventure...
It appears $prefix/python23.zip is left in sys.path even if it doesn't exist (Just van Rossum explained to me in a bug report I filed that nonexistent directories might actually be URLs or other weird hacks which import hooks could make use of), so I went with the flow and created it, populating it with the contents of $prefix/python2.3. My average wall-clock time went from 0.5 seconds to 0.47 seconds and user+sys times went from 0.43 seconds to 0.41 seconds. A modest improvement.
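For anyone who wants to see which sys.path entries are missing on their own box, here is a minimal sketch. The function name is hypothetical, and it is written for a current Python 3 rather than the 2.3 discussed here. Per Just's point, a "missing" entry is not necessarily an error, since an import hook may claim it.

```python
import os
import sys


def missing_path_entries(path=None):
    """Return the entries of `path` (default: sys.path) that are not on disk."""
    entries = sys.path if path is None else path
    # An empty string means "current directory", so it is never reported.
    return [p for p in entries if p and not os.path.exists(p)]


if __name__ == "__main__":
    for entry in missing_path_entries():
        print("not on disk:", entry)
```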
One more little tweak. I moved the lib-dynload directory to the front of sys.path (obviously only safe if nothing there appears earlier in sys.path). Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds, though the total number of system calls as measured by ktrace went from 3454 to 3042.
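The reordering can be sketched roughly as below. This is an illustration, not how I actually did it: the helper name and the suffix match are assumptions, and the same caveat applies (only safe if nothing in lib-dynload shadows a module found earlier in sys.path).

```python
import sys


def promote_entry(path, suffix="lib-dynload"):
    """Return a copy of `path` with entries ending in `suffix` moved first."""
    def matches(p):
        # Tolerate a trailing slash on the directory name.
        return p.rstrip("/").endswith(suffix)

    front = [p for p in path if matches(p)]
    rest = [p for p in path if not matches(p)]
    return front + rest


if __name__ == "__main__":
    # Rewrite sys.path in place so existing references see the new order.
    sys.path[:] = promote_entry(sys.path)
    print(sys.path)
```

The reordering only affects imports that happen after it runs, so it would have to happen very early in startup to make any difference.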
Hammiefilter itself really does very little. Looking at the last ktrace/kdump output, I see 3042 system calls. The hammie.db file isn't opened until line 2717 of that output. Everything before that is startup, the largest chunks of which are nami operations (731) and open() calls (557), most of them involving nonexistent files (as evidenced by seeing only 164 calls to close()). In contrast, only 278 system calls appear to be directly related to manipulating the hammie database.
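Tallies like these can be pulled out of a kdump listing with a few lines of Python. The sketch below assumes each line looks like "pid command CALL name(args)" or "pid command NAMI "path"", which is how kdump output usually appears but may differ between systems; the function name and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed kdump layout: "<pid> <command> CALL  <syscall>(<args>)".
_CALL = re.compile(r"\bCALL\s+(\w+)")
_NAMI = re.compile(r"\bNAMI\b")


def count_kdump(lines):
    """Count system calls by name, plus NAMI (name lookup) records."""
    counts = Counter()
    for line in lines:
        match = _CALL.search(line)
        if match:
            counts[match.group(1)] += 1
        elif _NAMI.search(line):
            counts["NAMI"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ' 1234 python   CALL  open(0x80a2c00,0,0x1b6)',
        ' 1234 python   NAMI  "/some/dir/foo.py"',
        ' 1234 python   RET   open -1 errno 2 No such file or directory',
        ' 1234 python   CALL  close(0x3)',
    ]
    for name, n in count_kdump(sample).most_common():
        print(name, n)
```

Comparing the open and close counts gives the same rough measure of failed opens used above.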
This is still somewhat off-topic for this list (except for the fact that my intention was to get hammiefilter to run faster), so I'll cc python-dev to keep Tim happy, and perhaps mildly irritate Guido by discussing specific apps on python-dev.
Far from it, I wish spambayes well (and wish I could still be involved) :-). The issue seems to be that a moderately sized application takes a long time to start, right? How much of the user+sys time was user, how much was sys? Have you used python -v to see which modules it imports? Long ago I knew Hammie; I believe it reads a possibly large database. How much time does opening and closing the database take? (I presume that the 46 messages/second was not opening the database afresh for each message.) --Guido van Rossum (home page: http://www.python.org/~guido/)
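Two of those questions (the user/sys split, and what python -v reports) can be answered with a small harness along these lines. It is only a sketch, written for a current Python 3: the function names are made up, the parsing assumes -v writes lines beginning with "import " to stderr, and the child CPU figures from os.times() are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys


def child_cpu_times(argv):
    """Run argv to completion; return (user, sys) CPU seconds it consumed."""
    before = os.times()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = os.times()
    # children_* accumulate CPU time of waited-for child processes.
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)


def imported_modules(python=sys.executable):
    """Return module names that `python -v -c pass` reports importing."""
    proc = subprocess.run([python, "-v", "-c", "pass"],
                          stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
                          text=True, errors="replace")
    names = []
    for line in proc.stderr.splitlines():
        parts = line.split()
        if line.startswith("import ") and len(parts) > 1:
            # Newer interpreters quote the module name; older ones do not.
            names.append(parts[1].strip("'\""))
    return names


if __name__ == "__main__":
    user, system = child_cpu_times([sys.executable, "-c", "pass"])
    print("bare interpreter startup: user %.3fs, sys %.3fs" % (user, system))
    print("modules imported at startup:", len(imported_modules()))
```

Pointing child_cpu_times at the real filter command instead of "-c pass" would show how much of the total is interpreter startup and how much is the application itself.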