[Spambayes] Re: [Python-Dev] Re: some preliminary timings

Skip Montanaro skip@pobox.com
Tue, 25 Feb 2003 16:52:01 -0600


    Alex> This is good info.  Can you add in the time intervals between
    Alex> loading each of the modules?  That might point out which modules
    Alex> are actually expensive (or if it's none in particular).

Okay, here's a bit more information.  I instrumented hammiefilter.py with
code like

    marker = 0
    import os
    file("os%d"%marker,"w"); os.unlink("os%d"%marker); marker+=1
    import sys
    file("sys%d"%marker,"w"); os.unlink("sys%d"%marker); marker+=1
    import getopt
    file("getopt%d"%marker,"w"); os.unlink("getopt%d"%marker); marker+=1
    from spambayes import hammie, Options, mboxutils
    file("hammie%d"%marker,"w"); os.unlink("hammie%d"%marker); marker+=1

then scored a single message under ktrace control and dumped the ktrace data
with timestamps.  (This could just have easily have been done with
time.clock() or time.time() calls, but after awhile of staring at ktrace
results, this seemed just as easy.)

The instrumentation gave me a larger number of smaller intervals with these
meanings:

    interval                                    time
    start through first import (os)             0.166
    import sys                                  < 0.001
    import getopt                               0.055
    import hammie, Options, mboxutils           0.660 (!!!)
    to start of HammieFilter class defn         < 0.001
    to start of main()                          < 0.001
    create HammieFilter instance                0.005
    parse cmd line options                      < 0.001
    get msg from stdin                          0.006
    score msg                                   0.224
    write scored msg to stdout                  0.002

Focusing on the hammie-related imports, I split that import into three
lines, reinstrumented and ran it again.  Those individual imports then
expanded to

    import hammie                               0.340
    import Options                              < 0.001
    import mboxutils                            < 0.001

(As you can see, the times are only relative (large vs small) and don't seem
to be all that reproducible across individual runs.)

One more marker insertion pass, this time in hammie.py, yielded these
intervals from that file:

    import mboxutils                            0.215
    import storage                              0.072
    import options                              < 0.001
    import tokenize                             0.052
    define Hammie class                         < 0.001
    define open function                        < 0.001

It appears something in the mboxutils import is the culprit.  I'm about to
go home for the day though, so I'll let others pick up from there.

Skip