[Spambayes] Initial training strategy for Outlook
Mark Hammond
mhammond at skippinet.com.au
Mon Aug 11 11:08:07 EDT 2003
In another thread, Tim and I pontificate on initial training strategies.
I'd like feedback from real users too.
[Tim]
> [Mark Hammond]
> > Every time this comes up, someone points out that the many clues in
> > such a pre-built database would be subtly different - eg, the "to"
> > address, the real name, the name of your ISP in the headers, etc.
>
> Header tokens could be purged, of course. As we recently discovered, the
> Outlook addin does well without the headers <wink>.
Actually, I made a mistake in my announcement mail - no binary with that
behaviour was ever released. What threw me was the bug report: I assumed the
bug first appeared in 006, but the user had always been running from source
code. So my apologies to everyone I scared with our "major bug" that wasn't :)
So it does fairly well, but the lack of complaints from users about its
effectiveness is, unfortunately, misleading.
> > Other options would be:
> > * A "wizard" style interface for initial training.
> > * Auto-enabling SpamBayes once it is trained and configured.
>
> * Don't sweat it. With no training data, all msgs will score
> 0.50 and so end up in the Unsure folder. It's easy to explain
> why then, and the cure should be pretty obvious to the end
> user (i.e., no training == no knowledge, so start by categorizing
> all the stuff in the Unsure folder). While I'm sure it varies
> by user, I enjoy blowing away my training data and starting
> over from scratch -- it's fun to watch it learn!
I guess this makes sense. I wonder whether we could make it work well for
everyone, particularly the users who are confused by the existing system.
I'd like some feedback from others on this list - how could we make the
initial "training" process simpler? Would Tim's idea of starting with *no*
training data, and having the user recover almost everything from the
Unsure folder, work OK?
Mark.
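To make Tim's "no training == no knowledge" point concrete: with an empty database every token falls back to a neutral 0.5 probability, so no token is strong enough to count as a clue and the message score stays at 0.50, between the ham and spam cutoffs. The Python below is a minimal sketch under stated assumptions, not the add-in's code. The helper names (`token_prob`, `score`, `classify`) and both cutoff values are hypothetical; the three classifier constants are assumed to mirror SpamBayes' `unknown_word_prob`, `unknown_word_strength` and `minimum_prob_strength` defaults; and the simpler Robinson geometric-mean combining is swapped in for the chi-squared combining SpamBayes uses by default.

```python
import math

# Assumed to mirror SpamBayes' classifier defaults; illustrative only.
UNKNOWN_WORD_PROB = 0.5       # probability given to a token never seen in training
UNKNOWN_WORD_STRENGTH = 0.45  # how strongly that prior pulls rarely seen tokens
MIN_PROB_STRENGTH = 0.1       # tokens within this distance of 0.5 are not clues

# Hypothetical cutoffs, not necessarily the Outlook add-in's settings.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def token_prob(token, spam_counts, ham_counts, nspam, nham):
    """Smoothed spam probability for one token (hypothetical helper)."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    n = s + h
    if n == 0:
        return UNKNOWN_WORD_PROB  # token never trained on: neutral
    spam_ratio = s / nspam if nspam else 0.0
    ham_ratio = h / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return UNKNOWN_WORD_PROB  # inconsistent counts: treat as unknown
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Shrink toward the prior when the token has been seen only a few times.
    return ((UNKNOWN_WORD_STRENGTH * UNKNOWN_WORD_PROB + n * raw)
            / (UNKNOWN_WORD_STRENGTH + n))


def score(tokens, spam_counts, ham_counts, nspam, nham):
    """Combine clue probabilities with Robinson's geometric-mean scheme."""
    probs = (token_prob(t, spam_counts, ham_counts, nspam, nham)
             for t in set(tokens))
    clues = [p for p in probs if abs(p - 0.5) >= MIN_PROB_STRENGTH]
    if not clues:
        return 0.5  # no evidence either way
    n = len(clues)
    P = 1.0 - math.prod(1.0 - p for p in clues) ** (1.0 / n)
    Q = 1.0 - math.prod(clues) ** (1.0 / n)
    S = (P - Q) / (P + Q)
    return (1.0 + S) / 2.0


def classify(value):
    """Map a score onto spam / ham / unsure using the hypothetical cutoffs."""
    if value >= SPAM_CUTOFF:
        return "spam"
    if value <= HAM_CUTOFF:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    msg = ["cheap", "viagra", "meeting"]
    # Empty database: every message lands in Unsure.
    print(classify(score(msg, {}, {}, 0, 0)))  # → unsure
    # After a little training the same token becomes a clue.
    print(classify(score(["viagra"], {"viagra": 5}, {}, 5, 5)))  # → spam
```

Training on the messages in Unsure is what moves tokens away from 0.5; once a token passes the `MIN_PROB_STRENGTH` filter it starts contributing to the score, which is the "watch it learn" effect Tim describes.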