[spambayes-dev] Very small change for composite word tokenizing

Tim Peters tim.one at comcast.net
Wed Aug 6 23:56:58 EDT 2003


[Tony Meyer]
> BTW, if anyone is wondering, the two false positives that are in
> all of my results are:
>
>  o A message from mtnsms.com telling me about the (then) new
>    smspop service.  This remains my only email from mtnsms.com,
>    and it does look very spammy, but was definitely ham (in fact,
>    I now use smspop).  It scored 0.950.
>
>  o A message from a company doing a survey about how a transaction
>    with another company was.  Again, looked a lot like spam, but
>    wasn't, and again the only message from this company that I've
>    received (AFAIK).  It scored 0.966.
>
> I was expecting the second, so could retrieve it and this wasn't a
> problem.  The first one was a real FP, though.  Any suggestions on
> options I could fiddle to correct this (without creating lots of
> other incorrectly classified messages) would be of interest! :)

I'm afraid there aren't any.  The system has no intelligence whatsoever; it
just counts tokens.  It works so well 99.99% of the time that it's easy to
believe it must be smarter than that <0.9 wink>, but it isn't.  In the first
case, I take it you didn't also discuss sms in other ham.  Those are the
nasty FPs for me:  dealing with an online business in an area (product or
service) I've never emailed about before.  For example, I'm a smoker and buy
my cigarettes over the web, but the only clue about that you'll find in my
ham training set is seven(!) msgs from my death vendor -- it took that many
to convince spambayes I really wanted those (but didn't want spam trying to
sell me cigars).

It's often the case that training on just one such msg will knock the next
one into Unsure territory, but before the first you're stuck.  Whitelists
don't help here either, since you can't whitelist an address for a msg (like
your first) you're not expecting.  Another option is to hire a personal
assistant to sort your email for you -- except most people seem to want to
hide that they live for farm porn spam <wink>.
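
To make "it just counts tokens" concrete, here is a minimal sketch of why
the first msg from a new kind of sender is stuck and why training on it
moves the next one.  This is not the spambayes code:  ToyClassifier, the
sample tokens, and the simple geometric-mean combining are invented for
illustration (the real classifier combines token probabilities
differently), and the s=0.45 and x=0.5 settings are assumed values.

```python
from collections import Counter


class ToyClassifier:
    """Toy token counter for illustration; not the spambayes classifier."""

    def __init__(self, s=0.45, x=0.5):
        self.s = s  # weight given to the prior (assumed value)
        self.x = x  # probability assigned to a never-seen token (assumed)
        self.nspam = 0
        self.nham = 0
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def train(self, tokens, is_spam):
        # Count each token at most once per message.
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def token_prob(self, token):
        sc = self.spam_counts[token]
        hc = self.ham_counts[token]
        n = sc + hc
        if n == 0:
            return self.x  # nothing known: no evidence either way
        spam_ratio = sc / self.nspam if self.nspam else 0.0
        ham_ratio = hc / self.nham if self.nham else 0.0
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Pull the raw ratio toward x when the token has been seen rarely.
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, tokens):
        probs = [self.token_prob(t) for t in set(tokens)]
        if not probs:
            return self.x
        n = len(probs)
        prod_p = prod_q = 1.0
        for p in probs:
            prod_p *= p
            prod_q *= 1.0 - p
        # Geometric-mean combining: a simplified stand-in.
        P = 1.0 - prod_q ** (1.0 / n)
        Q = 1.0 - prod_p ** (1.0 / n)
        return (1.0 + (P - Q) / (P + Q)) / 2.0


clf = ToyClassifier()
for _ in range(3):
    clf.train(["free", "offer", "click"], is_spam=True)
    clf.train(["meeting", "python", "patch"], is_spam=False)

# A legitimate msg about a service never mentioned in any ham so far.
msg = ["free", "offer", "sms", "smspop"]
before = clf.score(msg)          # only the spammy tokens carry evidence
clf.train(msg, is_spam=False)    # train on that one msg as ham
after = clf.score(msg)           # a similar msg now scores lower
print(before, after)
```

Before training, "sms" and "smspop" contribute nothing, so the spammy
tokens decide the score alone; after one ham msg they finally count for
something, which is the most a token counter can offer.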



