[Spambayes] full o' spaces

Tim Peters tim_one at email.msn.com
Sun Mar 9 02:13:56 EST 2003


[Tim Peters]
> There haven't been enough test reports on that one to decide.
> It's True by default in the Outlook client, but still appears to be
> False by default everywhere else.  There are bad visible effects
> either way (if it's off and you get a large ratio imbalance, it's too
> easy for a msg to score incorrectly as belonging to the more popular
> category; if it's on and you get a large ratio imbalance, training on
> another example from the more popular category has little effect,
> exacerbating (for example) the "but I trained on it and it's *still*
> called ham!" irritation).

[Tim Stone]
> Rats.  I thought it was True by default.

It is if you're using the Outlook client.

> All this time I've been using it thinking it was on... ok, so if I
> turn it on now, what would I expect?

Did you read the paragraph you quoted?  I've written several small essays on
the topic here, and think the parenthetical comments above are a decent
summary.

> I have a huge ham/spam imbalance in my notes sb database,

Striving for balance is likely a better idea.

> and have been a bit disappointed by the classifier...

I'm short on telepathy tonight.  Perhaps the *way* in which you're
disappointed is related to the comments above?  For example, if you have
much more ham than spam and have a too-high FN rate, or you have much more
spam than ham and have a too-high FP rate, then the comments are directly
applicable.
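
To make the trade-off in the parenthetical comments concrete, here is a rough sketch of how a per-word spam probability might be computed with and without an imbalance adjustment. It assumes the option works by discounting a word's evidence count by the ham/spam training ratio before the Bayesian adjustment toward a prior. The helper name `word_spamprob`, its parameters, and the prior values (`s=0.45`, `x=0.5`) are illustrative assumptions, not quoted from the Spambayes source.

```python
def word_spamprob(hamcount, spamcount, nham, nspam,
                  adjust=False, s=0.45, x=0.5):
    """Estimate one word's spam probability (hypothetical helper).

    hamcount/spamcount: trained ham/spam msgs containing the word.
    nham/nspam: total trained ham/spam msgs (both assumed > 0).
    adjust: assumed stand-in for the imbalance option discussed above.
    s, x: assumed prior strength and prior probability for unseen words.
    """
    if hamcount == 0 and spamcount == 0:
        return x  # no evidence at all: fall back to the prior

    # Use per-category ratios so raw counts from the larger
    # category do not dominate the basic estimate.
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    prob = spamratio / (hamratio + spamratio)

    if adjust:
        # Discount evidence from the more popular category.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0

    # n is the effective amount of evidence; small n keeps the
    # result close to the prior x.
    n = hamcount * spam2ham + spamcount * ham2spam
    return (s * x + n * prob) / (s + n)


# 1000 ham vs. 10 spam trained; a word seen in one ham and no spam.
off = word_spamprob(1, 0, nham=1000, nspam=10, adjust=False)
on = word_spamprob(1, 0, nham=1000, nspam=10, adjust=True)
```

Under these assumptions `off` comes out near 0.16 and `on` near 0.49. With the adjustment off, a single ham sighting already pulls the word well toward ham even though 10 spam is too few to say it never appears in spam. With it on, one more ham example barely moves the word off the prior, which is the "but I trained on it" effect.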



