[Spambayes] training on Linux

Fred Smith fredex at fcshome.stoneham.ma.us
Sun Feb 19 19:08:37 EST 2017


On Sun, Feb 19, 2017 at 04:47:04PM -0600, Skip Montanaro wrote:
> > I'm having sudden second thoughts about how I should be dealing with
> > mails that get qualified incorrectly and dropped into the wrong mailbox.
> >
> > I've operated for years on the understanding that if, e.g., a mail gets
> > dropped in spam (or unsure) that SHOULD have been sent to my inbox,
> > the fix is to manually move it from the spam (or unsure) box into my
> > inbox. I do this in mutt by selecting the misqualified mail and saving
> > it to the desired box (which removes it from where it had been put).
> >
> > But I've been reading stuff on the SpamBayes web site and don't find
> > any reference to doing it that way. In fact, most of the discussion seems
> > to be about how to do it in Outlook or other non-Linux systems.
> 
> I no longer use SpamBayes for my personal mail, but still (from
> time-to-time) update the training database on mail.python.org. For
> that, I use the train-to-exhaustion tool (contrib/tte.py in the
> SpamBayes repo). You said nothing about how asymmetric your ham and
> spam databases are, or how big either one is, so I'm not sure what
> properties your current database has. In general, I do try to keep
> them reasonably small and current.
> 
> I just got a new computer and can't currently login to
> mail.python.org, but when I do, I have a further shell script wrapper
> around tte.py I can send your way, and refresh my brain about the
> steps necessary to update things.
> 
> Skip

Skip, thanks for the note!

I think I've got a handle on it...

My hammiedb was several years old and had undoubtedly accumulated a lot
of cruft.

I was getting numerous non-spam mails diagnosed as "unsure", no matter
how many of them I moved into the HAM folder that gets trained every
night by a cron job.

So shortly after posting my query, I bit the bullet: I moved the hammiedb
aside, moved the trained.ham and trained.spam folders aside, excerpted
the most recent 300 (or so) messages from ham and spam (yes, I reviewed
them to ensure there weren't any ringers in there), and trained on those.
Now it seems to be behaving much better.
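
For anyone wanting to repeat the "most recent 300 or so" step, here is a
rough sketch of one way to do the excerpting. It is not what I actually ran.
It assumes the folders are in mbox format and that the Date: header is a
good enough guide to recency; the function names are made up for
illustration, and it uses only the Python standard library, nothing from
SpamBayes itself.

```python
import mailbox
from email.utils import parsedate_to_datetime


def message_time(msg):
    """Return the Date header as a POSIX timestamp, or 0.0 if it is
    missing or cannot be parsed (such messages sort as oldest)."""
    try:
        return parsedate_to_datetime(str(msg["Date"])).timestamp()
    except (TypeError, ValueError, OverflowError):
        return 0.0


def copy_most_recent(src_path, dst_path, count=300):
    """Copy the `count` newest messages (by Date header) from the mbox at
    src_path into the mbox at dst_path. Returns the number copied."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    try:
        ordered = sorted(src, key=message_time)
        newest = ordered[max(0, len(ordered) - count):]
        for msg in newest:
            dst.add(msg)
        dst.flush()
        return len(newest)
    finally:
        src.close()
        dst.close()
```

The source folder is left untouched, so you can still review the copied
messages by hand for ringers before training on the new folder.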

I also installed (in my .muttrc) some mutt macros linked to from the
spambayes page that train on messages as I move them to the ham or
spam folders, should any have been mis-diagnosed.

(trained.ham and trained.spam are folders where nothing is ever added
except when I manually move a mis-diagnosed mail. These are the folders
used as training input by the cron job every night.)

Thanks again for the reply! And thanks for SpamBayes; it's STILL a
great tool!

Fred

-- 
---- Fred Smith -- fredex at fcshome.stoneham.ma.us -----------------------------
   "For the word of God is living and active. Sharper than any double-edged 
   sword, it penetrates even to dividing soul and spirit, joints and marrow; 
              it judges the thoughts and attitudes of the heart."  
---------------------------- Hebrews 4:12 (niv) ------------------------------

