[Python-Dev] Re: some preliminary timings
Guido van Rossum
guido@python.org
Tue, 25 Feb 2003 12:18:09 -0500
> Executive summary for python-dev folks seeing this for the first time:
>
> This thread started at
>
> http://mail.python.org/pipermail/spambayes/2003-February/003520.html
>
> Running in a single interpreter loop, I can score roughly 46
> messages per second. Running from the shell using hammiefilter.py
> (which takes a msg on stdin and spits a scored message to stdout)
> performance drops to roughly 2 messages per second. Neil
> Schemenauer noted all the failed open() calls during import
> lookup, which got me started trying to whittle them down.
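Each sys.path directory searched before a module is found adds more failed open() or stat() calls. The minimal sketch below counts how many candidate files get checked before a module is found. It assumes a simplified importer that tries a fixed, hypothetical list of suffixes in each directory. The real 2.3 import machinery differs in detail, and `count_probes` is a hypothetical helper.

```python
import os
import tempfile

# Hypothetical, simplified suffix list standing in for the candidate file
# names an importer tries in each sys.path directory (not the real list).
SUFFIXES = ("module.so", ".so", ".py", ".pyc")


def count_probes(modname, search_path, suffixes=SUFFIXES):
    """Simplified model of a path-based import search.

    Returns (found_path, probes). Every candidate checked counts as one
    probe; each probe before the hit stands in for one failed open()/stat().
    """
    probes = 0
    for directory in search_path:
        for suffix in suffixes:
            candidate = os.path.join(directory, modname + suffix)
            probes += 1
            if os.path.isfile(candidate):
                return candidate, probes
    return None, probes


# Usage: the module lives only in the third of three directories.
with tempfile.TemporaryDirectory() as root:
    dirs = [os.path.join(root, name) for name in ("a", "b", "c")]
    for d in dirs:
        os.mkdir(d)
    open(os.path.join(dirs[2], "foo.py"), "w").close()
    found, probes = count_probes("foo", dirs)
    missing, missing_probes = count_probes("bar", dirs)
    found_name = os.path.basename(found)
    print(found_name, probes)  # foo.py 11
```

In this model, every directory placed ahead of the one that holds the module costs a fixed number of failed lookups. That cost is paid on every interpreter start.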
>
> Two more things to try before abandoning this quixotic adventure...
>
> It appears $prefix/python23.zip is left in sys.path even if it doesn't exist
> (Just van Rossum explained to me in a bug report I filed that nonexistent
> directories might actually be URLs or other weird hacks which import hooks
> could make use of), so I went with the flow and created it, populating it
> with the contents of $prefix/python2.3. My average wall clock time went from
> 0.5 seconds to 0.47 seconds and user+sys times went from 0.43 seconds to
> 0.41 seconds. A modest improvement.
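This works because the import system treats a .zip archive on sys.path much like a directory. In 2.3 that is done by the new zipimport mechanism. The minimal sketch below is in modern Python 3. It builds an archive with `zipfile` and imports from it. It uses a hypothetical toy module (`zipdemo_mod`) rather than the real standard library, and `build_zip` is a hypothetical helper.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(src_dir, zip_path):
    """Pack every .py file under src_dir into zip_path, keeping relative paths."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for fn in filenames:
                if fn.endswith(".py"):
                    full = os.path.join(dirpath, fn)
                    zf.write(full, os.path.relpath(full, src_dir))


# Usage: a one-module "library" that is importable only via the archive.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "lib")
os.mkdir(src)
with open(os.path.join(src, "zipdemo_mod.py"), "w") as f:
    f.write("ANSWER = 42\n")
archive = os.path.join(workdir, "lib.zip")
build_zip(src, archive)

sys.path.insert(0, archive)  # the archive itself is the sys.path entry
mod = importlib.import_module("zipdemo_mod")
print(mod.ANSWER)  # 42
```

A single archive replaces many per-directory lookups with reads from one already-opened file. That is one plausible reason the syscall count drops.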
>
> One more little tweak. I moved the lib-dynload directory to the front of
> sys.path (obviously only safe if nothing there appears earlier in sys.path).
> Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds,
> though the total number of system calls as measured by ktrace went from 3454
> to 3042.
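The reordering and its safety condition fit in a small function. The sketch below makes a few assumptions. `dynload_first` and `module_names` are hypothetical helpers. A module's name is taken to be the file name up to its first dot, which covers names like `_socket.so` and `_socket.cpython-311-x86_64-linux-gnu.so`.

```python
import os
import tempfile


def module_names(directory):
    """Module names a sys.path entry could provide: file stems up to the first dot."""
    try:
        return {fn.split(".", 1)[0] for fn in os.listdir(directory or os.curdir)}
    except OSError:
        # Nonexistent entries and zip files are not directories; treat as empty.
        return set()


def dynload_first(search_path, marker="lib-dynload"):
    """Return a copy of search_path with the `marker` directory moved to the front.

    Raises ValueError if a module in that directory also appears in an entry
    that precedes it, since moving it would change which file gets imported.
    """
    for i, entry in enumerate(search_path):
        if os.path.basename(os.path.normpath(entry)) == marker:
            break
    else:
        return list(search_path)  # no such entry: nothing to move
    dynload = module_names(search_path[i])
    for earlier in search_path[:i]:
        clash = dynload & module_names(earlier)
        if clash:
            raise ValueError("moving %s would shadow %s in %s"
                             % (marker, sorted(clash), earlier))
    return [search_path[i]] + list(search_path[:i]) + list(search_path[i + 1:])


# Usage with a throwaway directory layout.
with tempfile.TemporaryDirectory() as root:
    lib = os.path.join(root, "lib")
    dyn = os.path.join(lib, "lib-dynload")
    os.makedirs(dyn)
    open(os.path.join(lib, "os.py"), "w").close()
    open(os.path.join(dyn, "_socket.so"), "w").close()
    reordered = dynload_first([lib, dyn])
    # A same-named module earlier on the path makes the move unsafe.
    open(os.path.join(lib, "_socket.py"), "w").close()
    try:
        dynload_first([lib, dyn])
        clash_detected = False
    except ValueError:
        clash_detected = True
    print(reordered == [dyn, lib], clash_detected)  # True True
```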
>
> Hammiefilter itself really does very little. Looking at the last
> ktrace/kdump output, I see 3042 system calls. The hammie.db file isn't
> opened until line 2717. Everything before that is startup activity, the
> largest chunks of which are NAMI (pathname lookup) records (731) and open()
> calls (557), most of them for nonexistent files (there are only 164 calls
> to close()). In contrast, only 278 system calls appear to be directly
> related to manipulating the hammie database.
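Tallies like these can be extracted from kdump output with a short script. The sketch below assumes BSD-style kdump lines of the shapes shown in the comments. Field layout varies between systems, so the regular expressions may need adjusting. A failed call is taken to be a RET line that reports -1 with an errno. `summarize` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed kdump line shapes (BSD-style; adjust the patterns for your system):
#   "  1234 python   CALL  open(0x...,0,0x1b6)"
#   '  1234 python   NAMI  "/usr/lib/python2.3/os.py"'
#   "  1234 python   RET   open -1 errno 2 No such file or directory"
CALL_RE = re.compile(r"\sCALL\s+(\w+)\(")
RET_ERR_RE = re.compile(r"\sRET\s+(\w+)\s+-1\s+errno\s+(\d+)")
NAMI_RE = re.compile(r"\sNAMI\s")


def summarize(lines):
    """Return (calls per syscall, failed returns per syscall, NAMI record count)."""
    calls, failures, nami = Counter(), Counter(), 0
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            calls[m.group(1)] += 1
            continue
        m = RET_ERR_RE.search(line)
        if m:
            failures[m.group(1)] += 1
            continue
        if NAMI_RE.search(line):
            nami += 1
    return calls, failures, nami


# Usage on a small synthetic sample (not real trace data).
sample = [
    '  1234 python   CALL  open(0x1,0,0x1b6)',
    '  1234 python   NAMI  "/usr/lib/python2.3/foomodule.so"',
    '  1234 python   RET   open -1 errno 2 No such file or directory',
    '  1234 python   CALL  open(0x2,0,0x1b6)',
    '  1234 python   NAMI  "/usr/lib/python2.3/foo.py"',
    '  1234 python   RET   open 3',
    '  1234 python   CALL  close(0x3)',
    '  1234 python   RET   close 0',
]
calls, failures, nami = summarize(sample)
print(dict(calls), dict(failures), nami)  # {'open': 2, 'close': 1} {'open': 1} 2
```

In a real trace, `summarize(open("kdump.out"))` would give the open/close imbalance directly. It counts failed opens instead of inferring them from the number of close() calls.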
>
> This is still somewhat off-topic for this list (except for the fact that my
> intention was to get hammiefilter to run faster), so I'll cc python-dev to
> keep Tim happy, and perhaps mildly irritate Guido by discussing specific
> apps on python-dev.
Far from it, I wish spambayes well (and wish I could still be
involved) :-).
The issue seems to be that a moderately sized application takes a long
time to start, right? How much of the user+sys time was user, how
much was sys? Have you used python -v to see which modules it
imports?
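Both questions can be answered from a script. The minimal sketch below is Python 3 and Unix only. Deltas from `resource.getrusage(RUSAGE_CHILDREN)` split a child interpreter's CPU time into user and system seconds. The `import ...` lines that `python -v` writes to stderr list the modules it loads. The helper names are hypothetical, and the parsing assumes the usual `import name # ...` shape of those lines.

```python
import resource
import subprocess
import sys


def child_cpu(cmd, runs=5):
    """Average (user, sys) CPU seconds per run of cmd, from RUSAGE_CHILDREN deltas."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    for _ in range(runs):
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ((after.ru_utime - before.ru_utime) / runs,
            (after.ru_stime - before.ru_stime) / runs)


def imported_modules(args):
    """Module names reported on 'import ...' lines of `python -v` stderr."""
    proc = subprocess.run([sys.executable, "-v"] + args,
                          capture_output=True, text=True, check=True)
    return [line.split()[1].strip("'")
            for line in proc.stderr.splitlines() if line.startswith("import ")]


# Usage: bare interpreter startup, then the imports pulled in by one module.
user_s, sys_s = child_cpu([sys.executable, "-c", "pass"])
mods = imported_modules(["-c", "import json"])
print("user %.3f s, sys %.3f s, %d modules imported" % (user_s, sys_s, len(mods)))
```

Pointing the same helpers at the filter script shows whether startup cost is mostly user-level work or system calls.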
Long ago I knew Hammie; I believe it reads a possibly large database.
How much time does opening and closing the database take? (I presume
that the 46 messages/second was not opening the database afresh for
each message.)
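One way to separate the database open cost from per-message work is to time the two separately. The sketch below uses Python 3's `dbm` module as a stand-in for whatever database backend hammie actually uses. The file name is hypothetical, the keys are synthetic, and both helpers are hypothetical.

```python
import dbm
import os
import tempfile
import time


def time_open_close(path, repeat=20):
    """Average seconds to open the database read-only and close it again."""
    start = time.perf_counter()
    for _ in range(repeat):
        db = dbm.open(path, "r")
        db.close()
    return (time.perf_counter() - start) / repeat


def time_lookups(path, keys):
    """Open once, then return (average seconds per lookup, number of hits)."""
    with dbm.open(path, "r") as db:
        start = time.perf_counter()
        hits = sum(1 for key in keys if key in db)
        elapsed = time.perf_counter() - start
    return elapsed / len(keys), hits


# Usage with a synthetic database; "hammie_demo" is a hypothetical file name.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "hammie_demo")
with dbm.open(path, "n") as db:
    for i in range(1000):
        db[b"token%d" % i] = b"1"

per_open = time_open_close(path)
per_lookup, hits = time_lookups(path, [b"token%d" % i for i in range(0, 2000, 2)])
print("open+close: %.6f s, lookup: %.9f s" % (per_open, per_lookup))
print(hits)  # 500
```

If open plus close dominates, a one-process-per-message filter pays that cost every time, while the in-loop benchmark pays it only once.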
--Guido van Rossum (home page: http://www.python.org/~guido/)