[Spambayes] Missing Junk E-mail folder

Tony Meyer tameyer at ihug.co.nz
Thu Nov 10 10:00:51 CET 2005


> However, I don't seem to have experienced any of the problems that
> others have found, so I'm happy about that!

That's because we've fixed all other problems <wink>.

> Firstly, people tend to delete the junk email folder, and this kills
> SB cold.  Can it be tweaked a bit to just recreate the folder if it's
> missing, or does the design prevent this?  If it's theoretically
> possible then I can have a go, since it's my itch, but I don't want to
> setup the python 222 environment and so forth if it's known that this
> won't work.

There's no reason I can think of that this wouldn't work.  (I'm not  
sure what you mean by "python 222" - if you mean version 2.2.2 of  
Python, note that Python 2.2 (actually 2.2.0) is the *minimum*  
requirement, and the recommended version is Python 2.4.2).

If you have problems with this, feel free to ask for help.  The
spambayes-dev at python.org list would probably be a better place: on
this list, a message with "missing junk e-mail folder" as the subject
probably gets written off by many as someone having deleted their junk
folder and needing to be pointed to the FAQ.

> Secondly, I deploy SB via a repackaged MSI, and this works well, but
> I'd really like to send out a partially pretrained package.  Has
> anyone done any work in this area before?

I believe Skip Montanaro considered this quite some time ago (looking  
into which messages should be included, etc), and I think that  
InBoxer, which was originally based on the SpamBayes Outlook plug-in,  
includes an optional pretrained database.  I'm not aware of any others.

Really, I would advise against this.  It only takes a very small
number of messages (less than a day's email for most users) for
SpamBayes to become effective.  By doing all the training yourself,
you (a) learn how to do ongoing training, which is important, and
(b) train SpamBayes on *your* email stream, not a general one.  (It's
possible that (b) isn't as much of a problem in your case, since
corporate or government email might all look alike, statistically
speaking, anyway.)

If you do want to do it, the only hard job is picking which email to  
use.  Once you've done that, put it in Ham/Spam folders, and train  
away.  Then just include the default_bayes_database.db and  
default_message_database.db in the installation, putting them into  
the user's data directory.
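
As a rough sketch of that last step (not anything SpamBayes itself
ships), something like the following could run at install time.  It
assumes a modern Python 3 with pathlib; the helper name
install_pretrained and both directory arguments are hypothetical, and
only the two database file names come from above.  It deliberately
skips any file that already exists, so a user's own training is never
overwritten.

```python
import shutil
from pathlib import Path

# The two database file names mentioned above.
DB_FILES = ("default_bayes_database.db", "default_message_database.db")


def install_pretrained(source_dir, data_dir):
    """Copy pretrained databases into data_dir (hypothetical helper).

    Files already present in data_dir are left alone, so an existing
    user's training is never replaced.  Returns the names copied.
    """
    source_dir = Path(source_dir)
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in DB_FILES:
        dest = data_dir / name
        if dest.exists():
            # Keep whatever the user has already trained.
            continue
        src = source_dir / name
        if not src.is_file():
            raise FileNotFoundError("pretrained database missing: %s" % src)
        shutil.copy2(src, dest)
        copied.append(name)
    return copied
```

Where the data directory actually lives depends on the installation,
so the caller has to supply that path.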

If you want the training to be as effective as possible, you might  
want to switch off some of the tokenizing options to avoid getting  
spurious clues that relate specifically to whoever actually received  
the email.  The comments in tokenizer.py outline which ones should be  
avoided ("mixed corpora" is the phrase to watch out for).  Ask if you  
want more help with this.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.



