[spambayes-dev] Interesting unsure

Skip Montanaro skip at pobox.com
Thu Jun 26 08:42:55 EDT 2003


    Tim> For body (but not header) tokenization, the option
    Tim> replace_nonascii_chars (off by default) is very effective against
    Tim> junk like this, at least for those whose ham is mostly 7-bit ASCII.
    Tim> That option replaces each "funny character" with a question mark.
    Tim> So, e.g., any oddball spelling for "o" in "love" turns the token
    Tim> into "l?ve";

I'd like to simply strip the accents.  With the current scheme you still
wind up with four related tokens, "love", "l?ve", "lov?" and "l?v?", all
prefaced by "subject:".  Since what the spammer wants you to read in all
instances is "love", I think that's the target we should aim at where
possible.
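
As a rough illustration of the difference, here is a minimal sketch, not
actual SpamBayes code.  The helper names strip_accents and replace_nonascii
are hypothetical, and replace_nonascii is only an approximation of what the
option is described as doing above.  It assumes the text has already been
decoded to Unicode, and it uses Unicode NFKD decomposition to separate base
letters from their accents.

```python
import re
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def replace_nonascii(text):
    """Approximation of the described option: non-ASCII char -> '?'."""
    return re.sub(r"[^\x00-\x7f]", "?", text)


# Four spellings a spammer might use for the same word.
variants = ["love", "l\u00f6ve", "lov\u00e9", "l\u00f6v\u00e9"]

print(sorted({replace_nonascii(w) for w in variants}))
# ['l?v?', 'l?ve', 'lov?', 'love']
print(sorted({strip_accents(w) for w in variants}))
# ['love']
```

Letters that have no Unicode decomposition (for example "ø") pass through
strip_accents unchanged, so this would only cover the accented-letter tricks,
not every possible substitution.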

    Tim> Indeed, my Unsures this week are utterly dominated by trash
    Tim> bouncing back to various webmaster and admin addresses due to the
    Tim> Sobig worm forging sender addresses, like

    ...

Mine too.

    Tim> It occurs to me that I haven't had "a spam problem" since last year
    Tim> -- now I've got "a virus bounce problem" <0.5 wink>!

I just classify them as spam.  It's actually unclear to me why these
"anti-virus" programs feel the need to reply to such messages.  Most of the
time the sender is forged anyway, so the reply goes to someone who doesn't
have the virus.

Skip


