[Spambayes] how spambayes handles image-only spams
anthony at interlink.com.au
Wed Sep 10 15:39:48 EDT 2003
>>> "Tim Peters" wrote
> Well, correlation actually appears to help us more often than it hurts us,
> and stripping HTML in the tokenizer was a hack to blind the classifier to
> the strongest source of harmful correlation I know of.
We also don't (by default) tokenise all headers, either, because of
the harmful correlation of the bazillions of mailman headers Barry
has seen fit to inflict on us <0.5 wink>
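
For anyone skimming the archive, here is a rough sketch of the two ideas above: tokenising only a few chosen headers, and stripping HTML tags before tokenising the body. This is not the actual SpamBayes tokenizer. The names `toy_tokenize` and `TOKENIZED_HEADERS`, and the particular headers listed, are made up for illustration, and the regex tag stripper is far cruder than a real one would need to be.

```python
import re
from email import message_from_string

# Hypothetical allowlist for illustration; NOT SpamBayes' real default set.
TOKENIZED_HEADERS = ("subject", "from", "to")

TAG_RE = re.compile(r"<[^>]*>")   # crude HTML tag matcher (assumption)
WORD_RE = re.compile(r"\S+")      # whitespace-delimited words


def toy_tokenize(raw_message):
    """Return a list of tokens from selected headers and a tag-stripped body."""
    msg = message_from_string(raw_message)
    tokens = []

    # Only headers on the allowlist produce tokens; everything else
    # (e.g. List-* and X-Mailman-* headers) is ignored.
    for name in TOKENIZED_HEADERS:
        for value in msg.get_all(name, []):
            for word in WORD_RE.findall(value):
                tokens.append(f"{name}:{word}")

    # Blind the classifier to HTML markup by replacing tags with spaces.
    body = msg.get_payload()
    if isinstance(body, str):  # skip multipart messages in this sketch
        text = TAG_RE.sub(" ", body)
        tokens.extend(WORD_RE.findall(text))

    return tokens


raw = (
    "Subject: Hello World\n"
    "List-Id: spambayes\n"
    "X-Mailman-Version: 2.1\n"
    "\n"
    "<html><b>Buy</b> now</html>\n"
)
result = toy_tokenize(raw)
```

With the sample message, neither the mailing-list headers nor the HTML tags contribute any tokens, so they cannot be picked up as (harmful) ham or spam clues.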
> they score so hammy now that one new contradictory token
> won't hurt them. But that's just me.
You're hammy? I kinda figured as such.
> > [failed tokeniser mods]
> Whatever you came up with would be an underestimate! In the very early
> days, I tried new stuff 7 days a week whenever I was awake.
I did similar things. In hindsight I regret not recording the various
things I tried - I'm pretty sure I had the same brilliant, obvious, and
wrong ideas a couple of times.
> Goodness, I'm *still*
> personally insulted that folding case turned out to work as well as
> preserving it <0.5 wink>.
Well, you know the spammers did that deliberately to annoy the people
working on spam filtering solutions.
Anthony Baxter <anthony at interlink.com.au>
It's never too late to have a happy childhood.