[Spambayes] Adding a message database

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Wed Feb 26 06:25:28 EST 2003


2/25/2003 11:38:19 PM, "Mark Hammond" <mhammond at skippinet.com.au> wrote:

>I've been harping on about this for a while, and recently started seeing
>other people with the same need.  If I read the checkin message
>correctly, TimS also wants one for his Notes work.

Yes, I do maintain a pickle of message ids and how they have been trained.  It
currently has four possible values: 'never classified', 'classified', 'spam',
'ham'.  The 'spam' and 'ham' values are set when a message is trained as such.
The 'never classified' value is set on first-time initialization, due to some
quirks in how Notes makes its mail database available to the outside world.
All of this enables proper (re)training.
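
A minimal sketch of such a pickle-backed store, for illustration only.  The
class name, method names and file handling are hypothetical and are not the
actual Notes client code; only the four state values come from the
description above.

```python
import pickle

# The four training states described above.
NEVER_CLASSIFIED = 'never classified'
CLASSIFIED = 'classified'
SPAM = 'spam'
HAM = 'ham'
STATES = (NEVER_CLASSIFIED, CLASSIFIED, SPAM, HAM)


class MessageStateStore:
    """Hypothetical map of message id -> training state, kept in a pickle."""

    def __init__(self, path):
        self.path = path
        self.states = {}

    def load(self):
        # A missing file is treated as an empty database (first-time run).
        try:
            with open(self.path, 'rb') as f:
                self.states = pickle.load(f)
        except FileNotFoundError:
            self.states = {}

    def save(self):
        with open(self.path, 'wb') as f:
            pickle.dump(self.states, f)

    def get(self, msg_id):
        # Returns None for a message id that has never been recorded.
        return self.states.get(msg_id)

    def set(self, msg_id, state):
        if state not in STATES:
            raise ValueError('unknown training state: %r' % (state,))
        self.states[msg_id] = state
```

With a store like this, a client can look up the previous state of a message
before training it, which is what makes retraining safe.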

>
>Currently, core spambayes maintains a database of wordinfos. I would like
>spambayes to assist in managing a database of message_ids, mapped to how
>they were previously trained.  While spambayes does not need any such
>concept to perform basic scoring, it seems that many applications using
>spambayes do.

I think this is a wonderful idea.

>2) Add the basic support to classifier, but in a non-intrusive way, allowing
>it to be left unused by an application.  I believe that modifying Classifier
>to use a "Message object" is too intrusive.

Better idea than strategy 1, IMO.

>
>Specifically, for (2), I would change learn to:

learn *could* be altered to manage unlearning as well.  That would remove a
headache for a lot of calling code: just learn and move on.  Something to
this effect:
>    def learn(self, wordstream, is_spam, msg_id = None):
>...
>        if msg_id is not None:
             trng = self._get_msgid(msg_id)
             if trng:
                 # Undo the earlier training only when the class changes.
                 if trng == 'spam' and not is_spam:
                     self.unlearn(wordstream, True)
                 elif trng == 'ham' and is_spam:
                     self.unlearn(wordstream, False)
                 self._update_msgid(msg_id, is_spam)  # for crud purity <wink>
             else:
>                self._add_msgid(msg_id, is_spam)
>        self._add_msg(wordstream, is_spam)
>
>Comments?

Let us make it so!
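
To make the learn-and-move-on idea concrete, here is a self-contained toy
sketch.  It is not the spambayes Classifier: SketchClassifier, its word-count
bookkeeping and the msgids dict are all assumptions made up for this example.
It also goes one step beyond the outline above by skipping the update entirely
when a message is retrained with the same class, so counts are not doubled.

```python
class SketchClassifier:
    """Toy stand-in for a classifier; it only keeps word counts."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.wordcounts = {}  # word -> [spam count, ham count]
        self.msgids = {}      # msg_id -> 'spam' or 'ham'

    def _add_msg(self, words, is_spam):
        idx = 0 if is_spam else 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        for word in set(words):
            self.wordcounts.setdefault(word, [0, 0])[idx] += 1

    def unlearn(self, words, is_spam):
        # is_spam says how the message was originally trained.
        idx = 0 if is_spam else 1
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1
        for word in set(words):
            counts = self.wordcounts[word]
            counts[idx] -= 1
            if counts == [0, 0]:
                del self.wordcounts[word]

    def learn(self, wordstream, is_spam, msg_id=None):
        # Materialize the stream, since it may be needed twice below.
        words = set(wordstream)
        if msg_id is not None:
            previous = self.msgids.get(msg_id)
            wanted = 'spam' if is_spam else 'ham'
            if previous == wanted:
                return  # already trained this way (assumed behaviour)
            if previous is not None:
                # Class changed: undo the earlier training first.
                self.unlearn(words, previous == 'spam')
            self.msgids[msg_id] = wanted
        self._add_msg(words, is_spam)
```

A caller can then train the same message id first as spam and later as ham
without tracking the earlier state itself; the spam counts are backed out
before the ham counts are added.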

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org




