[Spambayes] spambayes-1.0a6 bug: sb_mboxtrain.py fails to mark mail data as X-Spambayes-Trained

Alan W. Irwin airwin at users.sourceforge.net
Sat Oct 11 11:23:12 EDT 2003


Symptoms:

I have chosen a two-message mbox folder called libtool as an example, but
I get the same result with larger folders as well.

irwin@starling> sb_mboxtrain.py -d ~/.spambayes/hammie.dbm -g ~/cdburn0/Mail/libtool
Training ham (/home/irwin/cdburn0/Mail/libtool):
  Reading as Unix mbox
  Trained 2 out of 2 messages
irwin@starling> sb_mboxtrain.py -d ~/.spambayes/hammie.dbm -g ~/cdburn0/Mail/libtool
Training ham (/home/irwin/cdburn0/Mail/libtool):
  Reading as Unix mbox
  Trained 2 out of 2 messages

Note that the second run still trains 2 messages rather than 0.  Also,
the folder is left completely unchanged by these training runs.  It still
has its September modification date

ls -l ~/cdburn0/Mail/libtool
-rw-------    1 irwin    irwin        4290 Sep 11 15:20 /home/irwin/cdburn0/Mail/libtool

and contains no mail header line referring to X-Spambayes-Trained.
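
The header check described above can also be done mechanically. The
following is a minimal sketch, not part of spambayes: it assumes a current
Python 3 interpreter with the standard-library mailbox module, and the
helper name count_trained is hypothetical. It only reads the folder and
counts how many messages carry an X-Spambayes-Trained header.

```python
import mailbox

# Header name taken from the report above.
TRAINED_HEADER = "X-Spambayes-Trained"


def count_trained(mbox_path):
    """Return (trained, total) message counts for a Unix mbox file.

    Hypothetical helper for diagnosis only; it does not modify the folder.
    """
    # create=False makes a mistyped path raise instead of creating a file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        trained = 0
        total = 0
        for msg in box:
            total += 1
            if msg.get(TRAINED_HEADER) is not None:
                trained += 1
        return trained, total
    finally:
        box.close()
```

Called on the libtool folder after a training run, a zero trained count
would agree with the symptom reported here, while a count equal to the
total is what one would expect if the header were being written.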

Configuration file:

cat ~/.spambayesrc
[Headers]
include_trained: True
[Storage]
persistent_storage_file: ~/.spambayes/hammie.dbm
persistent_use_database: True

The include_trained: True setting should be redundant (since it is the
default), but I tried it anyway to force the X-Spambayes-Trained header to
be written.  It made no difference.

I believe this bug has serious consequences: the recommended cron tasks
using sb_mboxtrain.py cannot retrain spambayes correctly, because the script
acts as if the -f option were always on.  For users unaware of this bug, the
database is slowly distorted by repeated training on the same data, and
previous classification mistakes cannot be corrected.

Of course, one presumed workaround (I haven't tried this yet) is to remove
the database in the cron task and train from scratch every time, but this
is somewhat wasteful of resources given the huge collection of spam and ham
mail folders I have accumulated.
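
A minimal sketch of that untested workaround, written as a small Python
wrapper that a cron job could call. The function name retrain_from_scratch
and its parameters are hypothetical. The -d, -g and -f options appear in
this report; the -s option for spam folders is assumed from the script's
usage text and should be confirmed with sb_mboxtrain.py -h. The sketch also
assumes the database is a single file at the given path and that
sb_mboxtrain.py is on the PATH.

```python
import os
import subprocess


def retrain_from_scratch(db_path, ham_folders, spam_folders, run=True):
    """Delete the training database and rebuild it from the mail folders.

    Hypothetical helper. With run=False nothing is deleted or executed and
    the command line is only returned, which is handy for a dry run.
    """
    db_path = os.path.expanduser(db_path)
    if run and os.path.exists(db_path):
        # Assumption: the database lives in exactly this one file.
        os.remove(db_path)

    # -f forces training even for messages already marked as trained.
    cmd = ["sb_mboxtrain.py", "-d", db_path, "-f"]
    for folder in ham_folders:
        cmd += ["-g", os.path.expanduser(folder)]
    for folder in spam_folders:
        # Assumption: -s names a spam folder; it is not shown in the report.
        cmd += ["-s", os.path.expanduser(folder)]

    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling it with run=False first prints nothing and changes nothing, so the
returned list can be inspected before the real run is scheduled.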

If others here have trouble reproducing this bug, then here are some details
about my system:

I am running a Debian stable Linux distribution, which I have modified by
downloading and installing the Python 2.3.2 tarball, Python-2.3.2.tgz, from
python.org.

python
Python 2.3.2 (#1, Oct 10 2003, 17:38:20)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

I have downloaded and installed spambayes-1.0a6.1.tar.gz from
sourceforge.net.

If there is any difficulty verifying this bug, I will be happy to supply
more details about my system, run more tests, etc., since spam waits for no
man, and it is fairly urgent I get it fixed.

Alan
__________________________
Alan W. Irwin
email: irwin at beluga.phys.uvic.ca
phone: 250-727-2902

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the PLplot scientific plotting software
package (plplot.org), the Yorick front-end to PLplot (yplot.sf.net), the
Loads of Linux Links project (loll.sf.net), and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________


