Re: [Spambayes] Re: [Python-Dev] Re: some preliminary timings

In message: <15963.57541.244513.169979@montanaro.dyndns.org> Skip Montanaro <skip@pobox.com> writes:
It's probably somewhat invalid to equate number of system calls with application runtime. I redumped my last ktrace file just now with timestamps. Here are some computed intervals:
interval time -------- ---- start -> open hammiefilter.pyc 0.071 open hammiefilter.pyc -> open hammie.db 0.516 open hammie.db -> close hammie.db 0.084 close hammie.db -> program end 0.011
This is good info. Can you add in the time intervals between loading each of the modules? That might point out which modules are actually expensive (or if it's none in particular). - Alex

>> interval time >> -------- ---- >> start -> open hammiefilter.pyc 0.071 >> open hammiefilter.pyc -> open hammie.db 0.516 >> open hammie.db -> close hammie.db 0.084 >> close hammie.db -> program end 0.011 Alex> This is good info. Can you add in the time intervals between Alex> loading each of the modules? That might point out which modules Alex> are actually expensive (or if it's none in particular). That would be a bit tedious to do manually for the dozens of modules which are loaded. I'll see what I can come up with though. Skip

Alex> This is good info. Can you add in the time intervals between Alex> loading each of the modules? That might point out which modules Alex> are actually expensive (or if it's none in particular). Okay, here's a bit more information. I instrumented hammiefilter.py with code like marker = 0 import os file("os%d"%marker,"w"); os.unlink("os%d"%marker); marker+=1 import sys file("sys%d"%marker,"w"); os.unlink("sys%d"%marker); marker+=1 import getopt file("getopt%d"%marker,"w"); os.unlink("getopt%d"%marker); marker+=1 from spambayes import hammie, Options, mboxutils file("hammie%d"%marker,"w"); os.unlink("hammie%d"%marker); marker+=1 then scored a single message under ktrace control and dumped the ktrace data with timestamps. (This could just have easily have been done with time.clock() or time.time() calls, but after awhile of staring at ktrace results, this seemed just as easy.) The instrumentation gave me a larger number of smaller intervals with these meanings: interval time start through first import (os) 0.166 import sys < 0.001 import getopt 0.055 import hammie, Options, mboxutils 0.660 (!!!) to start of HammieFilter class defn < 0.001 to start of main() < 0.001 create HammieFilter instance 0.005 parse cmd line options < 0.001 get msg from stdin 0.006 score msg 0.224 write scored msg to stdout 0.002 Focusing on the hammie-related imports, I split that import into three lines, reinstrumented and ran it again. Those individual imports then expanded to import hammie 0.340 import Options < 0.001 import mboxutils < 0.001 (As you can see, the times are only relative (large vs small) and don't seem to be all that reproducible across individual runs.) One more marker insertion pass, this time in hammie.py, yielded these intervals from that file: import mboxutils 0.215 import storage 0.072 import options < 0.001 import tokenize 0.052 define Hammie class < 0.001 define open function < 0.001 It appears something in the mboxutils import is the culprit. I'm about to go home for the day though, so I'll let others pick up from there. Skip

Note that spambayes/mboxutils.py imports email.Message, which effectively imports the entire email package. That's a lot of code (one file per class). --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (3)
-
Guido van Rossum
-
Skip Montanaro
-
T. Alexander Popiel