[Spambayes] Re: [Python-Dev] Re: some preliminary timings
skip at pobox.com
Tue Feb 25 15:31:49 EST 2003
Guido> The issue seems to be that a moderately sized application takes a
Guido> long time to start, right? How much of the user+sys time was
Guido> user, how much was sys? Have you used python -v to see which
Guido> modules it imports?
Actually, hammmiefilter is a rather small application, so it seems its
runtime is completely dominated by startup costs. It reads a single message
on stdin, scores it and writes it out with one or two new headers on stdout.
(Perhaps the architecture is just wrong and we should be shipping it off to
a long-running process for scoring.) About one-third of user+sys is sys
Guido> Long ago I knew Hammie; I believe it reads a possibly large
Guido> database. How much time does opening +closing the database take?
Guido> (I presume that the 46 messages/second was not opening the
Guido> database afresh for each message.)
The database is now a bsddb hash file. It's no longer a pickle. As I
indicated, relatively few system calls (slightly less than 10% of the total)
seem to be involved in opening and probing the database.
It's probably somewhat invalid to equate number of system calls with
application runtime. I redumped my last ktrace file just now with
timestamps. Here are some computed intervals:
start -> open hammiefilter.pyc 0.071
open hammiefilter.pyc -> open hammie.db 0.516
open hammie.db -> close hammie.db 0.084
close hammie.db -> program end 0.011
The first interval is pure system startup - load interpreter executable,
link in shared libraries, etc. The second interval is application startup
- import modules, execute module-level code, etc. The third interval is
where the application actually does useful work. The last interval is
application shutdown. While application startup is not exclusively
importing modules, from the looks of things it's a fair chunk.
More information about the Spambayes