[spambayes-dev] Spontaneous training in Outlook addin?

Tim Peters tim.one at comcast.net
Sun Jul 27 19:46:20 EDT 2003


I recently switched from my trusty dict-based classifier to a bsddb3 one,
still using Python 2.2.3, though.

One thing I've noticed, both at work and at home, is that the PythonWin
Trace Collector window occasionally shows an instance of "spontaneous
training":  a single msg that's never been moved (still sitting in my inbox
or Unsure folder) gets trained as Ham or Spam "for no reason at all".

For example,

SpamBayes Outlook Addin (beta), version 0.4 (July 2003) starting (with
engine SpamBayes Beta2, version 0.2 (July 2003))
on Windows 4.10.67766446 ( A )
using Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)]
SpamBayes: Watching for new messages in folder  Inbox
SpamBayes: Watching for new messages in folder  Spam
Processing 0 missed spam in folder 'Inbox' took 18.299ms
Message '[Python-bugs-list] [ python-Bugs-746895 ] socket.sendto(SOCK_DGRAM)
very slow on OSX!' had a Spam classification of 'No'
Message 'Checking whether bool is a type' had a Spam classification of 'No'
Message 'Re: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(orpossibly a bug report)' had a Spam classification of 'No'
Message 'Re: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(orpossibly a bug report)' had a Spam classification of 'No'
Message 'Re: changing the List's behaviour?' had a Spam classification of
'No'
Message 'Re: Checking whether bool is a type' had a Spam classification of
'No'

Pay attention to the next one:

Message 'Re: [Python] Re: [OT] On the TimBot' had a Spam classification of
'No'

Message 'Re: Checking whether bool is a type' had a Spam classification of
'No'
Message '[C++-sig] Re: args patch' had a Spam classification of 'No'
Message '¨È¼ö®ö¨t¦C¦@50³¡vcd,¶W­È½æ' had a Spam classification of 'Yes'
Message 'Anima1 Perversion' had a Spam classification of 'Yes'

And then:

Training on message 'Re: [Python] Re: [OT] On the TimBot' -  trained as good
Saving bayes database with 771 spam and 356 good messages
 -> C:\WINDOWS\Application Data\SpamBayes\default_bayes_database.db
 -> C:\WINDOWS\Application Data\SpamBayes\default_message_database.db
Saved databases in 140.547ms


This is curious for two reasons:  (1) I never told spambayes to do anything
with the TimBot message; and, (2) That message is old!  All the other
messages it's reporting on did arrive in this Outlook session, but the
TimBot message it decided to train on arrived days ago.

Something has gone crazy here, but I never saw this before and haven't seen
any other reports of it.  Anyone else?  My first suspicion was that we're
doing something wrong in the bsddb3 version of the message id database, but
that wouldn't (AFAIK) explain spontaneous training.


Another oddity I never saw when using a pickled dict:  I asked the addin to
rebuild the database from scratch.  This gave:

"""
Checked 357 in folder Ham - 354 new entries found.
Checked 771 in folder Spam - 771 new entries found.
Saving bayes database with 771 spam and 354 good messages
...
"""

There are in fact 357 msgs in my ham training folder (I have only one, off
in a separate .pst file holding my ham and spam training data).  Why would
it think only 354 of them are new?  Maybe that also casts suspicion on how
we're keeping track of messages.
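
Purely as an illustration of how a count like that could come up short:  if
"new" is decided by looking up some per-message key in a database, then any
msgs whose keys collide with ones already seen get counted as checked but not
as new.  The sketch below is hypothetical (count_new_entries and the "msg-N"
keys are made up for the example, and this is not the addin's code); it just
shows 357 checked msgs yielding 354 new entries when three keys repeat.

```python
# Hypothetical sketch, NOT SpamBayes code: shows how a key-based
# "have we seen this message?" check undercounts when keys collide.

def count_new_entries(message_keys, seen=None):
    """Return (checked, new) for a folder whose msgs are identified by key."""
    if seen is None:
        seen = {}            # stands in for the message database
    checked = 0
    new = 0
    for key in message_keys:
        checked += 1
        if key not in seen:  # only unseen keys count as new entries
            seen[key] = True
            new += 1
    return checked, new

# Assumed data: 357 msgs, three of which share a key with an earlier msg.
keys = ["msg-%d" % i for i in range(354)] + ["msg-0", "msg-1", "msg-2"]
checked, new = count_new_entries(keys)
print("Checked %d in folder Ham - %d new entries found." % (checked, new))
```

Whether anything like a key collision is actually happening here is an open
question; the point is only that the 357 vs. 354 gap is the kind of symptom
it would produce.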

I'm also disturbed that the 'Animal Perversion' msg got rated as spam,
although I probably shouldn't admit that <wink>.



