Guido> The issue seems to be that a moderately sized application takes a Guido> long time to start, right? How much of the user+sys time was Guido> user, how much was sys? Have you used python -v to see which Guido> modules it imports?
Actually, hammmiefilter is a rather small application, so it seems its runtime is completely dominated by startup costs. It reads a single message on stdin, scores it and writes it out with one or two new headers on stdout. (Perhaps the architecture is just wrong and we should be shipping it off to a long-running process for scoring.) About one-third of user+sys is sys time.
Guido> Long ago I knew Hammie; I believe it reads a possibly large Guido> database. How much time does opening +closing the database take? Guido> (I presume that the 46 messages/second was not opening the Guido> database afresh for each message.)
The database is now a bsddb hash file. It's no longer a pickle. As I indicated, relatively few system calls (slightly less than 10% of the total) seem to be involved in opening and probing the database.
It's probably somewhat invalid to equate number of system calls with application runtime. I redumped my last ktrace file just now with timestamps. Here are some computed intervals:
interval time -------- ---- start -> open hammiefilter.pyc 0.071 open hammiefilter.pyc -> open hammie.db 0.516 open hammie.db -> close hammie.db 0.084 close hammie.db -> program end 0.011
The first interval is pure system startup - load interpreter executable, link in shared libraries, etc. The second interval is application startup - import modules, execute module-level code, etc. The third interval is where the application actually does useful work. The last interval is application shutdown. While application startup is not exclusively importing modules, from the looks of things it's a fair chunk.