[Spambayes-checkins] spambayes classifier.py,1.30,1.31
Tim Peters
tim_one@users.sourceforge.net
Sat, 05 Oct 2002 14:30:58 -0700
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv28590
Modified Files:
classifier.py
Log Message:
_getclues(): There are no schemes remaining that benefit from a very
small options.max_discriminators, and the priority queue costs more than
it saves unless max_discriminators is small. So now we just save all
the clues, and sort them at the end, to find the strongest clues. This
is measurably faster at max_discriminators=30, and a stronger win the
larger max_discriminators gets.
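The log message compares two ways of keeping the strongest clues. What follows is a minimal Python 3 sketch of both, assuming clues are `(distance, prob, word)` tuples like the ones `_getclues()` builds. The `record` field and the `mindist` filter are left out, and `top_n_heap` and `top_n_sort` are hypothetical names, not spambayes functions.

```python
from heapq import heapreplace


def top_n_heap(clues, n):
    """Old approach: keep the n largest-distance clues in a min-heap.

    The heap is seeded with n dummy entries whose distance (-1.0) is below
    any real distance, so heapreplace() always has an entry to evict.
    n must be positive.
    """
    nbest = [(-1.0, None, None)] * n
    smallest_best = -1.0
    for clue in clues:
        if clue[0] > smallest_best:
            heapreplace(nbest, clue)        # evict the current weakest entry
            smallest_best = nbest[0][0]     # new cutoff distance
    # Drop dummies that were never replaced; the rest is in heap order.
    return [t for t in nbest if t[1] is not None]


def top_n_sort(clues, n):
    """New approach: collect every clue, sort once, keep the last n.

    n must be positive.
    """
    clues = sorted(clues)                   # ascending by distance
    if len(clues) > n:
        del clues[0:-n]                     # strongest n remain, strongest last
    return clues


words = [("a", 0.99), ("b", 0.5), ("c", 0.1), ("d", 0.7), ("e", 0.02)]
clues = [(abs(p - 0.5), p, w) for w, p in words]
print([w for _, _, w in top_n_sort(clues, 3)])  # ['c', 'e', 'a']
```

When distances are distinct, both functions keep the same clues. The heap version pays a `heapreplace()` call for every candidate that beats the current cutoff. The sort version pays for a single `list.sort()` over all candidates, and that call's cost matters less as `max_discriminators` grows. If several clues tie in distance at the cutoff, the two may keep different clues. This happens because the heap version gates on distance alone, while the sort compares whole tuples.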
Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/classifier.py,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** classifier.py 5 Oct 2002 07:18:04 -0000 1.30
--- classifier.py 5 Oct 2002 21:30:55 -0000 1.31
***************
*** 25,29 ****
import time
- from heapq import heapreplace
from sets import Set
--- 25,28 ----
***************
*** 179,184 ****
if evidence:
- clues.sort()
clues = [(w, p) for p, w, r in clues]
return prob, clues
else:
--- 178,183 ----
if evidence:
clues = [(w, p) for p, w, r in clues]
+ clues.sort(lambda a, b: cmp(a[1], b[1]))
return prob, clues
else:
***************
*** 347,355 ****
unknown = options.robinson_probability_x
! # A priority queue to remember the MAX_DISCRIMINATORS best
! # probabilities, where "best" means largest distance from 0.5.
! # The tuples are (distance, prob, word, record).
! nbest = [(-1.0, None, None, None)] * options.max_discriminators
! smallest_best = -1.0
wordinfoget = self.wordinfo.get
--- 346,351 ----
unknown = options.robinson_probability_x
! clues = [] # (distance, prob, word, record) tuples
! pushclue = clues.append
wordinfoget = self.wordinfo.get
***************
*** 363,372 ****
prob = record.spamprob
distance = abs(prob - 0.5)
! if distance >= mindist and distance > smallest_best:
! heapreplace(nbest, (distance, prob, word, record))
! smallest_best = nbest[0][0]
! # Return (prob, word, record) for the non-dummies.
! return [t[1:] for t in nbest if t[1] is not None]
#************************************************************************
--- 359,370 ----
prob = record.spamprob
distance = abs(prob - 0.5)
! if distance >= mindist:
! pushclue((distance, prob, word, record))
! clues.sort()
! if len(clues) > options.max_discriminators:
! del clues[0 : -options.max_discriminators]
! # Return (prob, word, record).
! return [t[1:] for t in clues]
#************************************************************************
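The revised `_getclues()` and the caller's new ordering can be modeled together. This is a simplified Python 3 sketch, not the spambayes code, and it makes three substitutions:

- `wordprobs`, a hypothetical dict mapping each word to its spam probability, stands in for the `self.wordinfo` records and `options.robinson_probability_x`.
- The `record` field is dropped.
- The Python 2 `clues.sort(lambda a, b: cmp(a[1], b[1]))` is spelled with `key=`.

`getclues` and `clues_by_prob` are hypothetical names.

```python
def getclues(wordprobs, mindist, max_discriminators):
    """Sketch of the revised _getclues(); returns (prob, word) pairs.

    wordprobs is a hypothetical dict mapping word -> spam probability.
    """
    clues = []                  # (distance, prob, word) tuples
    pushclue = clues.append     # bound-method lookup hoisted out of the loop
    for word, prob in wordprobs.items():
        distance = abs(prob - 0.5)
        if distance >= mindist:
            pushclue((distance, prob, word))
    clues.sort()                # ascending distance: strongest clues last
    if len(clues) > max_discriminators:
        del clues[0:-max_discriminators]
    return [t[1:] for t in clues]


def clues_by_prob(wordprobs, mindist, max_discriminators):
    """Hypothetical caller: (word, prob) pairs ordered by probability."""
    clues = [(w, p) for p, w in getclues(wordprobs, mindist, max_discriminators)]
    clues.sort(key=lambda pair: pair[1])   # Python 3 form of the cmp-based sort
    return clues


wordprobs = {"free": 0.99, "the": 0.5, "meeting": 0.1, "offer": 0.7, "python": 0.02}
print(clues_by_prob(wordprobs, 0.1, 3))
# [('python', 0.02), ('meeting', 0.1), ('free', 0.99)]
```

`getclues()` now returns its clues ordered by distance from 0.5, strongest last, rather than in heap order. The caller therefore re-sorts by probability. This is why the commit moves `clues.sort()` after the `(w, p)` conversion and compares on the probability field.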