[Spambayes-checkins] spambayes classifier.py,1.30,1.31

Tim Peters tim_one@users.sourceforge.net
Sat, 05 Oct 2002 14:30:58 -0700


Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv28590

Modified Files:
	classifier.py 
Log Message:
_getclues():  There are no schemes remaining that benefit from a very
small options.max_discriminators, and the priority queue costs more than
it saves unless max_discriminators is small.  So now we just save all
the clues and sort them once at the end to find the strongest ones.  This
is measurably faster at max_discriminators=30, and the win grows as
max_discriminators gets larger.
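
The trade-off described above can be sketched roughly as follows.  This is
an illustration only, written for Python 3 rather than the Python of the
checkin; the function names, the plain word-to-probability dict, and the
use of heapq.nlargest for the bounded-selection variant are all assumptions
made for the example, not the classifier's actual code.

```python
import heapq


def strongest_by_sort(probs, max_discriminators, mindist=0.0):
    """Save every qualifying clue, sort once, then trim to the strongest.

    `probs` is a hypothetical {word: spamprob} dict standing in for the
    classifier's word records.  Assumes max_discriminators >= 1, since a
    slice ending at -0 would remove nothing.
    """
    clues = []  # (distance, prob, word) tuples
    for word, prob in probs.items():
        distance = abs(prob - 0.5)
        if distance >= mindist:
            clues.append((distance, prob, word))
    clues.sort()  # weakest first, strongest last
    if len(clues) > max_discriminators:
        del clues[0:-max_discriminators]
    return [(prob, word) for distance, prob, word in clues]


def strongest_by_heap(probs, max_discriminators, mindist=0.0):
    """Bounded selection with a heap, for comparison with the sort version."""
    clues = [(abs(prob - 0.5), prob, word)
             for word, prob in probs.items()
             if abs(prob - 0.5) >= mindist]
    best = heapq.nlargest(max_discriminators, clues)  # strongest first
    return [(prob, word) for distance, prob, word in sorted(best)]


example = {"free": 0.9, "meeting": 0.2, "the": 0.5,
           "python": 0.05, "click": 0.7}
top3 = strongest_by_sort(example, 3)
```

Both functions pick the same clues; which is faster depends on how
max_discriminators compares with the number of candidate words, which is
the point the log message makes.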


Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/classifier.py,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** classifier.py	5 Oct 2002 07:18:04 -0000	1.30
--- classifier.py	5 Oct 2002 21:30:55 -0000	1.31
***************
*** 25,29 ****
  
  import time
- from heapq import heapreplace
  from sets import Set
  
--- 25,28 ----
***************
*** 179,184 ****
  
          if evidence:
-             clues.sort()
              clues = [(w, p) for p, w, r in clues]
              return prob, clues
          else:
--- 178,183 ----
  
          if evidence:
              clues = [(w, p) for p, w, r in clues]
+             clues.sort(lambda a, b: cmp(a[1], b[1]))
              return prob, clues
          else:
***************
*** 347,355 ****
          unknown = options.robinson_probability_x
  
!         # A priority queue to remember the MAX_DISCRIMINATORS best
!         # probabilities, where "best" means largest distance from 0.5.
!         # The tuples are (distance, prob, word, record).
!         nbest = [(-1.0, None, None, None)] * options.max_discriminators
!         smallest_best = -1.0
  
          wordinfoget = self.wordinfo.get
--- 346,351 ----
          unknown = options.robinson_probability_x
  
!         clues = []  # (distance, prob, word, record) tuples
!         pushclue = clues.append
  
          wordinfoget = self.wordinfo.get
***************
*** 363,372 ****
                  prob = record.spamprob
              distance = abs(prob - 0.5)
!             if distance >= mindist and distance > smallest_best:
!                 heapreplace(nbest, (distance, prob, word, record))
!                 smallest_best = nbest[0][0]
  
!         # Return (prob, word, record) for the non-dummies.
!         return [t[1:] for t in nbest if t[1] is not None]
  
      #************************************************************************
--- 359,370 ----
                  prob = record.spamprob
              distance = abs(prob - 0.5)
!             if distance >= mindist:
!                 pushclue((distance, prob, word, record))
  
!         clues.sort()
!         if len(clues) > options.max_discriminators:
!             del clues[0 : -options.max_discriminators]
!         # Return (prob, word, record).
!         return [t[1:] for t in clues]
  
      #************************************************************************
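
For readers on Python 3, where cmp() and comparison-function sorts are no
longer available, the sort added in the second hunk (ordering (word, prob)
pairs by probability) could be written with a key function.  The sample
data below is made up for illustration.

```python
from operator import itemgetter

# Hypothetical (word, prob) clue pairs, as produced by the list
# comprehension in the second hunk.
clues = [("free", 0.9), ("python", 0.05), ("meeting", 0.2)]

# Same ordering as clues.sort(lambda a, b: cmp(a[1], b[1])):
# ascending by probability, the second element of each pair.
clues.sort(key=itemgetter(1))
```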