[Python-checkins] python/nondist/sandbox/spambayes classifier.py,1.10,1.11

tim_one@users.sourceforge.net
Wed, 04 Sep 2002 22:37:14 -0700


Update of /cvsroot/python/python/nondist/sandbox/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv13543

Modified Files:
	classifier.py 
Log Message:
Added note about MINCOUNT oddities.


Index: classifier.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/spambayes/classifier.py,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** classifier.py	5 Sep 2002 01:51:18 -0000	1.10
--- classifier.py	5 Sep 2002 05:37:12 -0000	1.11
***************
*** 57,60 ****
--- 57,70 ----
  # (In addition, the count compared is after multiplying it with the
  # appropriate bias factor.)
+ #
+ # XXX Reducing this to 1.0 (effectively not using it at all then) seemed to
+ # XXX give a sharp reduction in the f-n rate in a partial test run, while
+ # XXX adding a few mysterious f-ps.  Then boosting it to 2.0 appeared to
+ # XXX give an increase in the f-n rate in a partial test run.  This needs
+ # XXX deeper investigation.  Might also be good to develop a more general
+ # XXX concept of confidence:  MINCOUNT is a gross gimmick in that direction,
+ # XXX effectively saying we have no confidence in probabilities computed
+ # XXX from fewer than MINCOUNT instances, but unbounded confidence in
+ # XXX probabilities computed from at least MINCOUNT instances.
  MINCOUNT = 5.0
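
The hard cutoff that the XXX note criticizes can be illustrated with a small sketch. This is not the code in classifier.py: apart from MINCOUNT, every name here (word_spamprob, UNKNOWN_PROB, HAM_BIAS, SPAM_BIAS, nham, nspam) and every numeric value other than 5.0 is an assumption made up for illustration. It only shows how a word whose bias-adjusted count falls just under the threshold is ignored entirely, while one more sighting makes its probability fully trusted.

```python
# Illustrative sketch only, not the actual classifier.py logic.
# All names and values except MINCOUNT = 5.0 are hypothetical.
MINCOUNT = 5.0
UNKNOWN_PROB = 0.5   # assumed placeholder for "no evidence either way"
HAM_BIAS = 2.0       # assumed bias factor applied to ham counts
SPAM_BIAS = 1.0      # assumed bias factor applied to spam counts


def word_spamprob(hamcount, spamcount, nham, nspam, mincount=MINCOUNT):
    """Return a spam probability for one word under a hard count threshold.

    hamcount and spamcount are how often the word was seen in each corpus;
    nham and nspam are the (positive) corpus sizes.
    """
    # The threshold is compared against the counts after biasing.
    biased_ham = HAM_BIAS * hamcount
    biased_spam = SPAM_BIAS * spamcount
    if biased_ham + biased_spam < mincount:
        # Below the threshold: no confidence at all in this word.
        return UNKNOWN_PROB

    ham_ratio = min(1.0, biased_ham / nham)
    spam_ratio = min(1.0, biased_spam / nspam)
    if ham_ratio + spam_ratio == 0.0:
        # Only reachable when mincount <= 0 and the word was never seen.
        return UNKNOWN_PROB
    # At or above the threshold: the estimate is used without any discount.
    return spam_ratio / (ham_ratio + spam_ratio)


if __name__ == "__main__":
    # Four spam-only sightings are ignored; a fifth flips to full trust.
    print(word_spamprob(0, 4, 100, 100))
    print(word_spamprob(0, 5, 100, 100))
    # With mincount=1.0 a single sighting is already fully trusted.
    print(word_spamprob(0, 1, 100, 100, mincount=1.0))
```

Under these assumptions the estimate jumps from the neutral value straight to an extreme value with one additional sighting, which is the all-or-nothing confidence the note describes. A graded notion of confidence would replace that step with something smoother; the sketch does not attempt one.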