[Spambayes] Help! Imapfilter and mysql/pickle woes

Woo, Christopher Christopher.Woo at pepperdine.edu
Sat Jan 15 02:21:53 CET 2005


Well, my problem is two-fold:

1) Using a pickle dbm with sb_imapfilter.py regularly results in a corrupt
database. After wiping it out and starting over, I get about a week out of
the database before it corrupts and fails with an assertion error.

2) I've been trying to get the mysql option to work for sb_imapfilter.py on
and off for a couple of months, but I am still stuck:

First off, no matter what I try, I cannot seem to specify any DSN other than
the default. When I specify a custom DSN, something in the code that parses
the values leaves the user field blank, so user '@localhost' tries to log on
to mysql and fails. After giving credentials to the default DSN used by the
script, I can get sb_imapfilter.py to train on a sample of spam and ham
successfully. But immediately afterwards, when I run sb_imapfilter.py to
filter my inbox, it fails with the dreaded "Token seen in more spam than
spam trained." assertion error:

  File "C:\Python23\lib\site-packages\spambayes\classifier.py", line 311, in probability
    assert spamcount <= nspam, "Token seen in more spam than spam trained."
AssertionError: Token seen in more spam than spam trained.
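
For reference, the assertion in the traceback checks that no token's spam
count exceeds the total number of spam messages trained. Below is a minimal
sketch of that check. The names (find_inconsistent_tokens, token_counts) and
the counts are hypothetical and made up for illustration; this is not the
SpamBayes storage API. It only shows the kind of mismatch that could trip the
assertion, for example if old token rows survive while the totals are reset:

```python
# Hypothetical stand-in for classifier state; NOT the SpamBayes API.
def find_inconsistent_tokens(nspam, nham, token_counts):
    """Return tokens whose counts exceed the number of trained messages.

    token_counts maps token -> (spamcount, hamcount).
    """
    bad = []
    for token, (spamcount, hamcount) in sorted(token_counts.items()):
        # Same invariant the assertion enforces: spamcount <= nspam
        # (and, by symmetry, hamcount <= nham).
        if spamcount > nspam or hamcount > nham:
            bad.append(token)
    return bad


# Made-up counts: totals say 40 spam / 15 ham were trained,
# but one token claims to have been seen in 41 spam.
counts = {"viagra": (41, 0), "meeting": (0, 12), "free": (30, 2)}
print(find_inconsistent_tokens(40, 15, counts))  # ['viagra']
```

Running a check like this against the stored counts right after training
might show whether the token table and the message totals are out of step
before the filter run begins.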

I'm only training Spambayes on a small sample, maybe 40 spam and 15 ham.
I've tried every combination of wiping out spambayes.messageinfo.db, the
mysql tables, etc., but I cannot seem to get away from this error.

Now, if someone can just tell me to stop banging my head on this mysql wall
and go back to pickle or dbm, I will do just that, but that doesn't fix my
original problem, which is that my database just doesn't remain viable long
enough.

Help!!

Chris


