[Python-Dev] Re: some preliminary timings

Guido van Rossum guido@python.org
Tue, 25 Feb 2003 12:18:09 -0500


> Executive summary for python-dev folks seeing this for the first time:
> 
>     This thread started at
> 
>         http://mail.python.org/pipermail/spambayes/2003-February/003520.html
> 
>     Running in a single interpreter loop, I can score roughly 46
>     messages per second.  Running from the shell using hammiefilter.py
>     (which takes a msg on stdin and spits a scored message to stdout)
>     performance drops to roughly 2 messages per second.  Neil
>     Schemenauer noted all the failed open() calls during import
>     lookup, which got me started trying to whittle them down.
> 
> Two more things to try before abandoning this quixotic adventure...
> 
> It appears $prefix/python23.zip is left in sys.path even if it doesn't exist
> (Just van Rossum explained to me in a bug report I filed that nonexistent
> directories might actually be URLs or other weird hacks which import hooks
> could make use of), so I went with the flow and created it, populating it
> with the contents of $prefix/python2.3.  My average wallclock time went from
> 0.5 seconds to 0.47 seconds and user+sys times went from 0.43 seconds to
> 0.41 seconds.  A modest improvement.
> 
> One more little tweak.  I moved the lib-dynload directory to the front of
> sys.path (obviously only safe if nothing there appears earlier in sys.path).
> Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds,
> though the total number of system calls as measured by ktrace went from 3454
> to 3042.
> 
> Hammiefilter itself really does very little.  Looking at the last
> ktrace/kdump output, I see 3042 system calls.  The hammie.db file isn't
> opened until line 2717.  All the rest before that is startup stuff, the
> largest chunk of which are nami operations (731) and open (557) calls, most
> of them involving nonexistent files (as evidenced by seeing only 164 calls
> to close()).  In contrast, only 278 system calls appear to be directly
> related to manipulating the hammie database.
> 
> This is still somewhat off-topic for this list (except for the fact that my
> intention was to get hammiefilter to run faster), so I'll cc python-dev to
> keep Tim happy, and perhaps mildly irritate Guido by discussing specific
> apps on python-dev.

Far from it, I wish spambayes well (and wish I could still be
involved) :-).

The issue seems to be that a moderately sized application takes a long
time to start, right?  How much of the user+sys time was user, how
much was sys?  Have you used python -v to see which modules it
imports?
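A minimal sketch of how both questions could be answered from a script, written for a modern Python 3 on Unix (the resource module is Unix-only). It runs a command as a child process and splits its CPU time into user and sys via getrusage(RUSAGE_CHILDREN). It also lists which modules a fresh interpreter ends up with after one import, as a rough stand-in for reading python -v output. The helper names and the example command are hypothetical; substitute the real hammiefilter.py invocation.

```python
import resource
import subprocess
import sys


def child_cpu_times(argv):
    """Run argv as a child process and return (user, sys) CPU seconds it used.

    RUSAGE_CHILDREN accumulates over all terminated, waited-for children,
    so we take the difference between readings before and after the run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True, stdin=subprocess.DEVNULL,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def modules_loaded_by(module_name):
    """Return the sorted names in sys.modules of a fresh interpreter
    after it imports module_name."""
    code = f"import sys, {module_name}; print(' '.join(sorted(sys.modules)))"
    result = subprocess.run([sys.executable, "-c", code], check=True,
                            capture_output=True, text=True)
    return result.stdout.split()


if __name__ == "__main__":
    # Hypothetical target: replace with the real filter command.
    user, system = child_cpu_times([sys.executable, "-c", "import email.parser"])
    print(f"user {user:.3f}s  sys {system:.3f}s")
    print(len(modules_loaded_by("email.parser")), "modules loaded")
```

A high sys share would point at the failed open() and lookup calls during import; a high user share would point at module initialization or database work instead.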

Long ago I knew Hammie; I believe it reads a possibly large database.
How much time does opening and closing the database take?  (I presume
that the 46 messages/second was not opening the database afresh for
each message.)
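One way to isolate that cost is a sketch like the following, again for a modern Python 3. It assumes the database is a dbm-style file that Python's dbm module can open (an assumption; the actual hammie.db format is not stated here), and it times a read-only open followed by a close, averaged over several runs. The file name and the 10000 synthetic keys are hypothetical stand-ins, not real spambayes data.

```python
import dbm
import os
import tempfile
import time


def time_open_close(path, repeat=20):
    """Return the mean seconds needed to open the dbm file at path
    read-only and close it again, averaged over `repeat` runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        db = dbm.open(path, "r")
        db.close()
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for hammie.db, filled with synthetic tokens.
        path = os.path.join(tmp, "example.db")
        with dbm.open(path, "n") as db:
            for i in range(10000):
                db[f"token{i}"] = str(i)
        print(f"open+close: {time_open_close(path) * 1000:.3f} ms")
```

If this figure is small compared with the roughly 0.47 s per-process wallclock time, the remaining cost is interpreter and import startup rather than the database itself.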

--Guido van Rossum (home page: http://www.python.org/~guido/)