[Spambayes] Proposing to remove 4 combining schemes

Tim Peters tim.one@comcast.net
Fri Oct 18 06:54:20 2002


I removed the 4 schemes in question.  The log msg is attached, as this
affected lots of code (mostly in an "it's gone" sense).  If anyone has a
real use for use_tim_combining, speak up, else I expect to drop that too (it
was really another attempt to get a better middle ground, but chi-combining
beats it for that).

Modified Files:
	Options.py README.txt TestDriver.py classifier.py
Removed Files:
	clgen.py clpik.py rmspik.py
Log Message:
Removed 4 combining schemes:

    use_central_limit
    use_central_limit2
    use_central_limit3
    use_z_combining

The central limit schemes aimed at getting a useful middle ground, but
chi-combining has proved to work better for that.  The chi scheme doesn't
require the troublesome "third training pass" either.  z-combining was
more like chi-combining, and worked well, but not as well as chi-combining;
z-combining proved vulnerable to "cancellation disease", to
which chi-combining seems all but immune.
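For readers who haven't seen chi-combining, here is a minimal sketch of the idea: Fisher's method of combining probabilities via the chi-squared distribution. It is assumed to resemble chi2_spamprob in spirit but is not copied from it. The names chi2Q and chi2_combine are illustrative, and every per-token probability p is assumed to lie strictly between 0 and 1.

```python
import math


def chi2Q(x2, v):
    """Return P(chisq >= x2) for v degrees of freedom; v must be even.

    For even v the survival function has a closed form: a truncated
    Poisson sum with mean x2/2.
    """
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def chi2_combine(probs):
    """Combine per-token spam probabilities into one score in 0 .. 1.

    Sketch only: assumes every p satisfies 0 < p < 1.
    """
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Fisher's method: -2 * sum(ln p) is chi-squared with 2n degrees of
    # freedom when the p's are independent and uniformly distributed.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # S near 1 signals strong spam evidence; H near 1, strong ham evidence.
    return (S - H + 1.0) / 2.0


print(chi2_combine([0.99] * 10))                # close to 1.0: spam
print(chi2_combine([0.01] * 10))                # close to 0.0: ham
print(chi2_combine([0.99] * 10 + [0.01] * 10))  # close to 0.5: unsure
```

When strong spam clues and strong ham clues both show up, S and H are both near 1 and the score lands near 0.5, which is the "middle ground" the central limit schemes were after, with no extra training pass.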

Removed supporting option zscore_ratio_cutoff.

Removed various data attributes of class Bayes, unique to the central
limit schemes.  __getstate__ and __setstate__ had never been
updated to save or restore them, so they never made it into a pickle,
and old pickles will still work fine.
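The compatibility argument rests on how pickling works when a class defines __getstate__ and __setstate__: only what __getstate__ returns is ever written. A minimal sketch, using a hypothetical stand-in class and a hypothetical attribute name central_limit_stats (not the real field names):

```python
import pickle


class Bayes:
    """Hypothetical stand-in for the classifier; not the real class."""

    def __init__(self):
        self.wordinfo = {}
        self.nspam = 0
        self.nham = 0
        # Hypothetical extra attribute standing in for the removed
        # central-limit-only data.  __getstate__ never mentions it.
        self.central_limit_stats = [1.0, 2.0]

    def __getstate__(self):
        # Only these fields are written to a pickle.
        return (self.wordinfo, self.nspam, self.nham)

    def __setstate__(self, state):
        # Only these fields are read back; __init__ is not called.
        self.wordinfo, self.nspam, self.nham = state


b = Bayes()
b.nspam, b.nham = 3, 5
restored = pickle.loads(pickle.dumps(b))
print(restored.nspam, restored.nham)             # 3 5
print(hasattr(restored, "central_limit_stats"))  # False
```

Because the extra attribute never reached the pickle, deleting it from the class leaves the pickled format unchanged, so pickles written before the removal load exactly as before.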

Removed method Bayes.compute_population_stats(), which constituted
"the third training pass" unique to the central limit schemes.  There's
scant chance this will ever be needed again, since it was never clear
how to make the 3-pass schemes practical over time.

Gave the still-default combining scheme's method the name gary_spamprob,
and made spamprob an alias for that by default.  This makes it possible to
name each combining scheme explicitly, in case you want to test using more
than one (the others are named tim_spamprob and chi2_spamprob).
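A minimal sketch of the naming arrangement, using a hypothetical, stripped-down class and a hypothetical USE_CHI2 flag (the real option names live in Options.py and aren't shown here). The method bodies are placeholders that return a label so the dispatch is visible.

```python
# Hypothetical option flag; stands in for whatever Options.py provides.
USE_CHI2 = False


class Bayes:
    """Hypothetical stand-in showing how the scoring methods are named."""

    def gary_spamprob(self, probs):
        return "gary"  # placeholder for the real combining code

    def tim_spamprob(self, probs):
        return "tim"   # placeholder

    def chi2_spamprob(self, probs):
        return "chi2"  # placeholder

    # spamprob is just another name bound to one of the methods above.
    spamprob = gary_spamprob
    if USE_CHI2:
        spamprob = chi2_spamprob


b = Bayes()
print(b.spamprob([]))  # gary (the default)
# Every scheme stays callable by name, e.g. to compare them in one test run.
for name in ("gary_spamprob", "tim_spamprob", "chi2_spamprob"):
    print(name, getattr(b, name)([]))
```

Binding the alias in the class body costs nothing at call time: spamprob and gary_spamprob are literally the same function object.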

In gary_spamprob, simplified the scaling of (P-Q)/(P+Q) from -1 .. 1 into
0 .. 1: since ((P-Q)/(P+Q) + 1)/2 reduces to P/(P+Q), the whole shebang
was replaced with P/(P+Q).  Same result, but a little faster.
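The simplification is pure algebra: with S = (P-Q)/(P+Q), the old scaling (S + 1)/2 equals (P - Q + P + Q) / (2(P + Q)) = P/(P+Q). A quick numeric check, assuming P and Q are built from geometric means of the per-token probabilities as in Gary Robinson's combining scheme (an assumed form for illustration; gary_pq, score_old and score_new are hypothetical names):

```python
import math


def gary_pq(probs):
    """Hypothetical helper: P and Q from geometric means (assumed form)."""
    n = len(probs)
    P = 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / n)
    Q = 1.0 - math.prod(probs) ** (1.0 / n)
    return P, Q


def score_old(P, Q):
    S = (P - Q) / (P + Q)   # lies in -1 .. 1
    return (S + 1.0) / 2.0  # shifted and scaled into 0 .. 1


def score_new(P, Q):
    return P / (P + Q)      # same value, fewer operations


for probs in ([0.99, 0.2, 0.7], [0.01, 0.4], [0.6] * 5):
    P, Q = gary_pq(probs)
    print(score_old(P, Q), score_new(P, Q))
```

The two columns agree up to floating-point rounding; the new form just skips a subtraction, an addition and a division.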

Removed files clgen.py, clpik.py, and rmspik.py.  These were data
generation and analysis tools unique to the central limit schemes.