[Spambayes] RE: Help! Imapfilter and mysql/pickle woes
Christopher.Woo at pepperdine.edu
Wed Jan 19 06:08:05 CET 2005
Thanks for the reply, Tony. For some reason, I am unable to send to the
Spambayes email list, or at the very least, I'm not seeing my emails show up
in the digests. I tried running the CVS version of sb_imapfilter.py, but
didn't get very far:
Traceback (most recent call last):
  File "C:\SpamBayes\scripts\sb_imapfilter.py", line 103, in ?
    from spambayes.Version import get_current_version
ImportError: cannot import name get_current_version
Not quite sure what I'm doing wrong. Is there a place where I can download a
complete set of the files in an archive?
From: Tony Meyer [mailto:tameyer at ihug.co.nz]
Sent: Tuesday, January 18, 2005 4:26 PM
To: Woo, Christopher; spambayes at python.org
Subject: RE: Help! Imapfilter and mysql/pickle woes
> 1) Using a pickle dbm with sb_imapfilter.py is regularly resulting in
> a corrupt database within days of wiping it out and starting over. I
> can get about a week out of the database before it corrupts and fails
> with an assertion error.
This is 1.0.1 sb_imapfilter, yes? It would be worth giving CVS
sb_imapfilter a go - it should be vastly improved. I've tried to copy most
bugfixes over to the 1.0.x branch, but that's not been possible when there
are large changes.
I also heard today that using Python 2.4 helps, which I suspect means there
is a problem handling malformed messages. If using Python 2.4 is easy to
do, then it would be worth doing.
> 2) I've been trying to get the mysql option to work for
> sb_imapfilter.py on and off for a couple months, but I am still stuck:
> First off, regardless of what iteration I try, I cannot seem to
> specify any DSN other than the default. When I try to specify a custom
> DSN, something happens in the code when it parses the values so that
> the user field is blank, with the result that user '@localhost' tries,
> and fails, to log on to MySQL.
I believe this is caused by a known bug. It's fixed in CVS for 1.1, but
hasn't been backported. If you like, I can do so, so that the fix is in
1.0.2. I believe you can work around it by putting a space at the start of
> Upon giving credentials to the default DSN used by the script, I can
> actually get sb_imapfilter.py to train on a sample of spam and ham
> successfully, but immediately afterwards, when I try to actually run
> sb_imapfilter.py to filter my inbox, it fails with the dreaded "Token
> seen in more spam than spam trained." assertion error:
If you have the patience, try doing this:
0. Clear the ham & spam training folders.
1. Put one (more) message in each of the ham and spam training folders.
2. Run sb_imapfilter.py -t.
3. Do a 'select * from spambayes where word="saved_state"' query against
the database, and check that the values are the same as the number of
messages in the folders (i.e. 1,1, then 2,2, then 3,3, ...).
4. Repeat from 1.
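Step 3 can be scripted so the check is repeatable after each training run. This is a minimal sketch, not part of SpamBayes: the helper name check_saved_state is hypothetical, and it assumes the 'spambayes' table stores the training totals in 'nspam' and 'nham' columns of the 'saved_state' row. It only needs a Python DB-API cursor (for example one from MySQLdb).

```python
def check_saved_state(cursor, expected_spam, expected_ham):
    """Compare stored training totals with the training-folder counts.

    Hypothetical helper. Assumes a 'spambayes' table whose 'saved_state'
    row holds the totals in 'nspam' and 'nham' columns.
    """
    # The key is a constant, so a literal avoids DB-specific paramstyles.
    cursor.execute(
        "SELECT nspam, nham FROM spambayes WHERE word = 'saved_state'"
    )
    row = cursor.fetchone()
    if row is None:
        return False, "no saved_state row found"
    nspam, nham = row
    if (nspam, nham) != (expected_spam, expected_ham):
        return False, "stored %d spam/%d ham, expected %d/%d" % (
            nspam, nham, expected_spam, expected_ham)
    return True, "counts match"
```

After the Nth pass of sb_imapfilter.py -t, call check_saved_state(cursor, N, N); the first pass where it returns False is the point where the stored totals drift from what was actually trained.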
It would help to know if it dies out quickly (like with a single message) or
not. If you get to high numbers and it's still working, try adding multiple
messages at a time, and see if the counts still match.
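The "Token seen in more spam than spam trained." message implies an invariant: no token's spam count may exceed the total number of spam messages trained (and likewise for ham). As a rough diagnostic, the sketch below lists the rows that would break it. The function name is hypothetical, and the schema ('spambayes' table with 'word', 'nspam' and 'nham' columns, totals in the 'saved_state' row) is an assumption, not confirmed SpamBayes internals.

```python
def find_inconsistent_tokens(cursor):
    """Return (word, spamcount, hamcount) rows that exceed the stored totals.

    Hypothetical helper. Assumes a 'spambayes' table with 'word', 'nspam'
    and 'nham' columns, where the 'saved_state' row holds the totals.
    """
    cursor.execute(
        "SELECT nspam, nham FROM spambayes WHERE word = 'saved_state'"
    )
    row = cursor.fetchone()
    if row is None:
        raise LookupError("no saved_state row found")
    total_spam, total_ham = row

    cursor.execute(
        "SELECT word, nspam, nham FROM spambayes WHERE word <> 'saved_state'"
    )
    bad = []
    for word, spamcount, hamcount in cursor.fetchall():
        # A per-token count above the trained total is what the
        # "Token seen in more ... than ... trained" assertion complains about.
        if spamcount > total_spam or hamcount > total_ham:
            bad.append((word, spamcount, hamcount))
    return bad
```

An empty result alongside mismatched saved_state totals would point at the totals not being saved, rather than at individual token counts being over-incremented.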
I assume that sb_imapfilter always finishes without error, and isn't
interrupted while training?
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.