[Spambayes] how spambayes handles image-only spams

Anthony Baxter anthony at interlink.com.au
Wed Sep 10 15:39:48 EDT 2003


>>> "Tim Peters" wrote
> Well, correlation actually appears to help us more often than it hurts us,
> and stripping HTML in the tokenizer was a hack to blind the classifier to
> the strongest source of harmful correlation I know of.  

We don't (by default) tokenise all headers either, because of
the harmful correlation of the bazillions of mailman headers Barry
has seen fit to inflict on us <0.5 wink>

> they score so hammy now that one new contradictory token
> won't hurt them.  But that's just me.

You're hammy? I kinda figured as much.
 
> > [failed tokeniser mods]
> Whatever you came up with would be an underestimate!  In the very early
> days, I tried new stuff 7 days a week whenever I was awake.  

I did similar things. In hindsight I regret not recording the various
things I tried - I'm pretty sure I had the same brilliant, obvious, and
wrong ideas a couple of times.

> Goodness, I'm *still*
> personally insulted that folding case turned out to work as well as
> preserving it <0.5 wink>.

Well, you know the spammers did that deliberately to annoy the people
working on spam filtering solutions.

-- 
Anthony Baxter     <anthony at interlink.com.au>   
It's never too late to have a happy childhood.



