[Spambayes] Missing obvious spams?

Tony Meyer tameyer at ihug.co.nz
Tue Mar 8 02:40:45 CET 2005


> Which brings a related question -  does SB do any de-masking,
> such as replacing a "|" with an "l" if it's in the middle of
> the word? and replacing all the accent letters with their
> equivalent non-accented character?

Not at present.  (Apart from removing HTML tags, which de-masks th<!--
foo-->ings li<b></b>ke that).
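
As a rough illustration of how tag removal un-masks text like that, here is
a minimal Python sketch.  It is not SpamBayes's actual tokenizer code: the
two regular expressions and the name strip_html are assumptions made up for
this example.

```python
import re

# Hypothetical patterns for illustration only; not SpamBayes's real tokenizer.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # comments may span lines
HTML_TAG = re.compile(r"<[^>]*>")

def strip_html(text):
    """Remove HTML comments first, then any remaining tags."""
    text = HTML_COMMENT.sub("", text)
    return HTML_TAG.sub("", text)

print(strip_html("th<!--\nfoo-->ings li<b></b>ke that"))  # prints: things like that
```

Once the comment and the empty <b></b> pair are gone, the word fragments on
either side join up again, so the tokenizer sees the ordinary words.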

IIRC Skip ran a test that did the latter (replacing accented letters) and
found that it made essentially no difference to the results.  I can try to
dig up a reference to that thread if you like.

I believe the theory is that a spammer can't mask every word without
totally losing any chance of someone reading the message and buying from
them, so there will always be enough left to give the message away (in the
headers, for a start).

One trouble with de-masking is that there are so many different ways to mask
a word that you need a never-ending amount of code to undo them all.  If
anyone has specific ideas they'd like to try, let me know and when I
have time I'll write up patches.  (The wiki is one place you can put these
ideas: <http://entrian.com/sbwiki>.)
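
To make the two ideas from the question concrete, here is a minimal Python
sketch that treats a "|" between two word characters as an "l" and strips
accents.  Nothing like this is in SpamBayes at present; the names demask,
strip_accents and PIPE_IN_WORD are hypothetical, and the rules cover only
these two tricks out of the many a spammer could use.

```python
import re
import unicodedata

# Assumption: a pipe with a word character on both sides is a masked "l".
PIPE_IN_WORD = re.compile(r"(?<=\w)\|(?=\w)")

def strip_accents(text):
    """Decompose accented letters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def demask(text):
    """Apply both de-masking rules; all other text is left unchanged."""
    return strip_accents(PIPE_IN_WORD.sub("l", text))

print(demask("one mi|lion do|lars"))  # prints: one million dollars
```

A pipe with whitespace on either side is left alone, so something like
"a | b" passes through untouched.  Every further masking trick would need
its own rule, which is where the never-ending code comes from.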

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.


