[Spambayes] How many statistics are considered as good?

Tony Meyer tameyer at ihug.co.nz
Fri Apr 8 06:00:40 CEST 2005


>      There are 14,209 spam. 
>      There are 8,623  good messages. 

That is a fairly large database.  If you're getting results that you're
happy with, then don't worry about changing it.  If you think the results
should be better, then it might be worth training from scratch.  Good
results can generally be had by training only on mistakes (ham classified
as spam, spam classified as ham) and on unsures.
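For illustration, here is a minimal Python sketch of the "train only on mistakes and unsures" policy.  It does not use the SpamBayes API: `score` and `learn` are hypothetical callables standing in for a real classifier, and the 0.20/0.90 cutoffs are assumed values chosen for the example.  The point is simply that a message is trained on only when the filter's verdict disagrees with the true label or falls in the unsure band.

```python
from typing import Callable, Iterable, Tuple

# Assumed cutoffs, used here only for illustration.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score: float) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def train_on_mistakes(
    messages: Iterable[Tuple[str, bool]],
    score: Callable[[str], float],
    learn: Callable[[str, bool], None],
) -> int:
    """Train only on messages the filter got wrong or was unsure about.

    messages: (text, is_spam) pairs carrying the true label.
    score:    hypothetical function returning a spam probability.
    learn:    hypothetical function that adds one message to the database.
    Returns the number of messages actually trained on.
    """
    trained = 0
    for text, is_spam in messages:
        verdict = classify(score(text))
        correct = verdict == ("spam" if is_spam else "ham")
        if not correct:  # false positive, false negative, or unsure
            learn(text, is_spam)
            trained += 1
    return trained


def demo() -> int:
    """Toy run with a keyword 'filter' standing in for a real classifier."""
    spam_words: set = set()

    def toy_score(text: str) -> float:
        # Unknown messages score 0.5 (unsure); known spam words push to 0.99.
        return 0.99 if set(text.lower().split()) & spam_words else 0.5

    def toy_learn(text: str, is_spam: bool) -> None:
        if is_spam:
            spam_words.update(text.lower().split())

    messages = [
        ("cheap pills now", True),    # unsure, so trained on
        ("cheap pills today", True),  # already caught, so not trained on
        ("lunch at noon", False),     # unsure, so trained on
    ]
    return train_on_mistakes(messages, toy_score, toy_learn)


if __name__ == "__main__":
    print(demo())  # prints 2
```

Because correctly classified messages are skipped, the database grows only with the messages that actually teach the filter something, which keeps it much smaller than training on everything.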

If you go to the web interface Configuration page and change the name of the
token database (probably hammie.db at the moment) then that will start
training anew, leaving the old database in place in case you decide to go
back to it.

> When I look in the C:\Program Files\SpamBayes\bin there are 10# 
> folders; 3# .dll; 6# .exe; and one called   sb_tray.exe.log . 
> I've had a look at that last file but I am none the wiser. 

No user data is stored in that directory; it's all stored under the Windows
Application Data directory.  With XP this is usually 'C:\Documents and
Settings\{username}\Application Data\Spambayes\Proxy'.
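As a rough sketch, the folder can be located from Python via the `%APPDATA%` environment variable, which on XP normally expands to 'C:\Documents and Settings\{username}\Application Data'.  The function name `spambayes_proxy_dir` is hypothetical, and the "Spambayes\Proxy" subfolder is assumed from the path above; `ntpath` is used so the Windows-style join behaves the same on any platform.

```python
import ntpath
import os
from typing import Optional


def spambayes_proxy_dir(appdata: Optional[str] = None) -> str:
    """Return the assumed per-user SpamBayes Proxy data folder on Windows.

    appdata: the Application Data folder; defaults to %APPDATA%.
    """
    if appdata is None:
        appdata = os.environ.get("APPDATA", "")
    # Windows-style join, regardless of the platform running the script.
    return ntpath.join(appdata, "Spambayes", "Proxy")


if __name__ == "__main__":
    print(spambayes_proxy_dir())
```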

> I believe that these both the ham and the spam or at least the 
> spam(?) should be reduced in size; that the size of these 
> files may be slowing things down.

A large database might slow things down, yes.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.