[Spambayes] Initial training strategy for Outlook

Mark Hammond mhammond at skippinet.com.au
Mon Aug 11 11:08:07 EDT 2003


In another thread, Tim and I pontificate on initial training strategies.
I'd like feedback from real users too.

[Tim]
> [Mark Hammond]
> > Every time this comes up, someone points out that the many clues in
> > such a pre-built database would be subtly different - e.g., the "to"
> > address, the real name, the name of your ISP in the headers, etc.
> 
> Header tokens could be purged, of course.  As we recently discovered, the
> Outlook addin does well without the headers <wink>.
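
A rough sketch of what purging header tokens from a pre-built database
could look like. The database is modelled here as a plain dict mapping each
token to a (spam_count, ham_count) pair, and the prefix list is a
hypothetical stand-in for whatever header-derived tokens the tokenizer
really emits; neither is taken from the SpamBayes source.

```python
# Assumption: header-derived tokens can be recognised by a prefix.
# These prefixes are illustrative only, not the real tokenizer's list.
HEADER_PREFIXES = ("to:", "from:", "subject:", "received:", "message-id:")


def purge_header_tokens(token_counts, prefixes=HEADER_PREFIXES):
    """Return a copy of token_counts without header-derived tokens.

    token_counts maps token -> (spam_count, ham_count).
    """
    return {
        token: counts
        for token, counts in token_counts.items()
        if not token.startswith(prefixes)
    }


if __name__ == "__main__":
    db = {
        "to:addr:example.com": (0, 40),   # specific to one user's mailbox
        "received:isp.example": (3, 55),  # specific to one user's ISP
        "mortgage": (25, 1),              # body token, likely portable
    }
    print(purge_header_tokens(db))
```

Only the body tokens survive, which are the clues most likely to carry
over from one user's mail to another's.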

Actually, I made a mistake in my announcement mail - no binary was ever released
with that behaviour.  What threw me was the bug report - I assumed it first
came up in 006, but the user was always running from source code.  So my
apologies to everyone I scared with our "major bug" that wasn't :)

So it does fairly well without the headers, but the lack of complaints from
users about effectiveness is unfortunately misleading, since no released
binary ever ran that way.

> > Other options would be:
> > * A "wizard" style interface for initial training.
> > * Auto-enabling SpamBayes once it is trained and configured.
> 
> * Don't sweat it.  With no training data, all msgs will score
>   0.50 and so end up in the Unsure folder.  It's easy to explain
>   why then, and the cure should be pretty obvious to the end
>   user (i.e., no training == no knowledge, so start by categorizing
>   all the stuff in the Unsure folder).  While I'm sure it varies
>   by user, I enjoy blowing away my training data and starting
>   over from scratch -- it's fun to watch it learn!
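
To make the "no training == no knowledge" point concrete, here is a minimal
sketch of why an untrained classifier scores everything 0.50. It is loosely
modelled on chi-squared combining with a Bayesian prior for unseen words.
The function names, the dict layout, and the constants (prior 0.5, prior
strength 0.45, minimum clue strength 0.1, cutoffs 0.20 and 0.90) are
assumptions for illustration, not values read from the addin's
configuration.

```python
import math

UNKNOWN_PROB = 0.5       # assumed prior for a never-seen token
UNKNOWN_STRENGTH = 0.45  # assumed weight given to that prior
MIN_STRENGTH = 0.1       # assumed: clues closer than this to 0.5 are ignored
HAM_CUTOFF, SPAM_CUTOFF = 0.20, 0.90  # assumed folder thresholds


def token_prob(spam_count, ham_count, nspam, nham):
    """Spam probability of one token, pulled toward the prior."""
    n = spam_count + ham_count
    if n == 0:
        return UNKNOWN_PROB  # nothing known about this token
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    p = spam_ratio / (spam_ratio + ham_ratio)
    return (UNKNOWN_STRENGTH * UNKNOWN_PROB + n * p) / (UNKNOWN_STRENGTH + n)


def chi2q(x2, v):
    """Chi-squared survival function for an even number v of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def score_message(tokens, counts, nspam, nham):
    """Combine per-token probabilities into one score in [0, 1]."""
    probs = [token_prob(*counts.get(t, (0, 0)), nspam, nham) for t in set(tokens)]
    clues = [p for p in probs if abs(p - 0.5) >= MIN_STRENGTH]
    if not clues:
        return 0.5  # no usable evidence either way
    dof = 2 * len(clues)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), dof)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), dof)
    return (s - h + 1.0) / 2.0


def folder_for(score):
    """Map a score to a folder name using the assumed cutoffs."""
    if score >= SPAM_CUTOFF:
        return "spam"
    if score <= HAM_CUTOFF:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    untrained = score_message(["cheap", "mortgage", "hello"], {}, 0, 0)
    print(untrained, folder_for(untrained))
```

With an empty database every token falls back to the prior, no token counts
as a clue, and the message lands in Unsure. Once a few messages have been
categorized, tokens start moving away from 0.5 and scores spread out.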

I guess this makes sense.  I wonder if we could make it work well for
everyone - particularly our users who are confused by the existing system?

I'd like some feedback from others on this list - how could we make the
initial "training" process simpler?  Would Tim's idea of allowing *no*
training data, and forcing the user to recover almost everything from
"unsure" work OK?

Mark.