[spambayes-dev] sb_bnfilter performance

Toby Dickenson tdickenson at geminidataloggers.com
Tue May 18 03:57:08 EDT 2004


On Wednesday 05 May 2004 23:21, Tony Meyer wrote:

> Out of curiosity, have you profiled sb_bnserver at all?

I've spent a little time on this, testing using sb_bnfilter to filter my whole 
inbox and spam folder with psyco turned off.

The profiler showed one hot lambda function in the classifier, eliminated in 
the patch below. Repeating the test without the profiler showed only a few 
percent increase in speed. Unless there are objections I will commit this 
change anyway; to my eyes it is also a small readability improvement.
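
For illustration only, here is a rough sketch of the two orderings in modern Python, where list.sort no longer accepts a comparison function and functools.cmp_to_key stands in for the old lambda. The function names and the sample clue tuples are made up for this example; only the (p, w, r) layout comes from the patch. One caveat worth knowing: the two forms agree when the probabilities are distinct, but on ties the comparator version keeps the original order while the plain tuple sort falls back to comparing the word and then the record.

```python
from functools import cmp_to_key


def sort_with_comparator(clues):
    """Old style: project to (word, prob), then sort with a cmp-style lambda."""
    pairs = [(w, p) for p, w, r in clues]
    # cmp_to_key emulates the Python 2 call clues.sort(lambda a, b: cmp(a[1], b[1]))
    pairs.sort(key=cmp_to_key(lambda a, b: (a[1] > b[1]) - (a[1] < b[1])))
    return pairs


def sort_then_project(clues):
    """New style: sort the (prob, word, record) tuples natively, then project."""
    ordered = sorted(clues)
    return [(w, p) for (p, w, r) in ordered]


# Hypothetical sample data: (probability, word, record)
sample_clues = [
    (0.91, "viagra", (40, 1)),
    (0.05, "python", (1, 30)),
    (0.50, "hello", (10, 10)),
]

if __name__ == "__main__":
    print(sort_with_comparator(sample_clues))
    print(sort_then_project(sample_clues))
```

The second form avoids a Python-level call for every comparison, which is presumably where the lambda's time was going.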

After that, most of the time is going in bsddb. We currently call shelve.get 
once for each token, which calls both bsddb's has_key *and* __getitem__. 
Hacking shelve.py to replace the has_key call with a KeyError exception 
handler gave roughly a 10% gain. Nice, but not enough to tempt me to polish 
any changes.
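
A minimal sketch of that idea, written against the modern shelve module rather than the Python 2.3 one I hacked: a Shelf subclass whose get() relies on a KeyError handler instead of a membership check. EafpShelf, CountingDict and count_lookups are names invented for this example, and the in-memory dict is only a stand-in for the bsddb backend so that the number of backend lookups can be counted.

```python
import shelve


class CountingDict(dict):
    """In-memory stand-in for the database backend that counts lookups."""

    def __init__(self):
        super().__init__()
        self.lookups = 0

    def __contains__(self, key):
        self.lookups += 1
        return super().__contains__(key)

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)


class EafpShelf(shelve.Shelf):
    """Shelf whose get() does one backend lookup instead of two."""

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


def count_lookups(shelf_class, key):
    """Return (value, backend lookups) for a single get() call."""
    backend = CountingDict()
    shelf = shelf_class(backend)
    shelf["spam"] = (3, 1)   # hypothetical word record
    backend.lookups = 0      # only count the get() below
    value = shelf.get(key)
    return value, backend.lookups


if __name__ == "__main__":
    print("stock Shelf:", count_lookups(shelve.Shelf, "spam"))
    print("EafpShelf:  ", count_lookups(EafpShelf, "spam"))
```

The saving only applies to keys that are present; for a missing key both versions touch the backend once.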

Profiler output attached.
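
For anyone wanting to produce a comparable report, the following is a sketch using cProfile and pstats from current Python (not necessarily how the attached output was generated). The workload function is a hypothetical stand-in for filtering a mailbox. Sorting by "time" then "calls" and limiting to 20 rows should give the same "Ordered by: internal time, call count" header as the attachment.

```python
import cProfile
import io
import pstats


def workload():
    """Hypothetical stand-in for filtering a mailbox."""
    table = {str(i): i for i in range(1000)}
    total = 0
    for i in range(2000):
        total += table.get(str(i), 0)
    return total


def profile_report(func, limit=20):
    """Profile func() and return the top `limit` rows as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by internal time first, then by call count
    stats.sort_stats("time", "calls").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```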

Index: spambayes/classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.23
diff -c -2 -r1.23 classifier.py
*** spambayes/classifier.py     6 Feb 2004 21:43:00 -0000       1.23
--- spambayes/classifier.py     18 May 2004 05:43:10 -0000
***************
*** 221,226 ****

          if evidence:
!             clues = [(w, p) for p, w, r in clues]
!             clues.sort(lambda a, b: cmp(a[1], b[1]))
              clues.insert(0, ('*S*', S))
              clues.insert(0, ('*H*', H))
--- 221,226 ----

          if evidence:
!             clues.sort()
!             clues = [(w,p) for (p,w,r) in clues]
              clues.insert(0, ('*S*', S))
              clues.insert(0, ('*H*', H))

Toby Dickenson
-------------- next part --------------
         5222854 function calls (5185915 primitive calls) in 116.128 CPU seconds

   Ordered by: internal time, call count
   List reduced from 245 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    59953   12.636    0.000   12.720    0.000 /usr/lib/python2.3/bsddb/__init__.py:140(has_key)
    59835    8.378    0.000    9.464    0.000 /usr/lib/python2.3/bsddb/__init__.py:114(__getitem__)
      931    6.478    0.007    6.533    0.007 ../scripts/sb_bnserver.py:135(get_request)
     3797    6.272    0.002   21.605    0.006 /usr/lib/python2.3/email/Generator.py:162(_write_headers)
   231478    5.789    0.000   31.549    0.000 /home/toby/projects/spambayes/spambayes/storage.py:253(_wordinfoget)
     1128    5.247    0.005    8.652    0.008 /usr/lib/python2.3/email/Generator.py:362(_make_boundary)
   400632    3.746    0.000    8.526    0.000 /home/toby/projects/spambayes/spambayes/tokenizer.py:1532(tokenize_body)
13644/13454    3.586    0.000    3.750    0.000 /usr/lib/python2.3/email/Header.py:419(_split_ascii)
      930    3.528    0.004   51.244    0.055 /home/toby/projects/spambayes/spambayes/classifier.py:430(_getclues)
      930    3.449    0.004    3.449    0.004 /usr/lib/python2.3/socket.py:161(close)
      930    3.181    0.003    7.975    0.009 /home/toby/projects/spambayes/spambayes/hammie.py:40(formatclues)
   181183    3.089    0.000    4.111    0.000 /home/toby/projects/spambayes/spambayes/OptionsClass.py:597(get)
6631/2823    2.699    0.000    3.923    0.001 /usr/lib/python2.3/sre_parse.py:367(_parse)
   231478    2.516    0.000   34.831    0.000 /home/toby/projects/spambayes/spambayes/classifier.py:504(_worddistanceget)
    59835    2.508    0.000   11.972    0.000 /usr/lib/python2.3/shelve.py:114(__getitem__)
     2611    2.433    0.001    2.449    0.001 /usr/lib/python2.3/email/Generator.py:191(_handle_text)
 1880/930    2.019    0.001    6.066    0.007 /usr/lib/python2.3/email/Parser.py:143(_parsebody)
    59996    1.830    0.000    1.830    0.000 /usr/lib/python2.3/email/Generator.py:41(_is8bitstring)
     1239    1.761    0.001    1.761    0.001 /home/toby/projects/spambayes/spambayes/tokenizer.py:1175(find_html_virus_clues)
    59976    1.748    0.000    2.154    0.000 /usr/lib/python2.3/email/Header.py:344(_encode_chunks)


