[spambayes-dev] accented characters in subject

Tys von Gaza tys at tvg.ca
Fri Oct 31 01:59:24 EST 2003


Haven't been keeping up with the list, but has anyone tried converting to
plain ASCII and adding the result not as a plain word token but as a
"manipulated word" token?  That way you would generalize over the various
modifications to a word while still recording the fact that the word was
modified (which is hopefully a spam clue).

For example:

hel.lo, he,llo, etc become:
word-punct:hello

héllo, hellö, etc become:
word-ext:hello
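A minimal sketch of the idea, assuming Python 3 and Unicode input. The helper
names fold_to_ascii and manipulated_tokens are hypothetical; this does not
show how such tokens would be wired into the spambayes tokenizer. Accents are
folded by NFKD decomposition followed by dropping combining marks:

```python
import re
import unicodedata

# Hypothetical helper: strip accents by decomposing the text (NFKD) and
# dropping combining marks. Characters with no decomposition (e.g. "ß")
# are left unchanged, so the result is not guaranteed to be pure ASCII.
def fold_to_ascii(word):
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Anything that is not a Unicode word character counts as inserted punctuation.
_NON_WORD = re.compile(r"[^\w]")

def manipulated_tokens(word):
    """Hypothetical: return tagged tokens describing how `word` was disguised."""
    # Compose first so "e" + combining accent is treated like a single "é".
    word = unicodedata.normalize("NFC", word)
    tokens = []
    no_punct = _NON_WORD.sub("", word)
    if no_punct and no_punct != word:
        # "hel.lo", "he,llo" -> word-punct:hello
        tokens.append("word-punct:" + no_punct.lower())
    folded = fold_to_ascii(no_punct)
    if folded and folded != no_punct:
        # "héllo", "hellö" -> word-ext:hello
        tokens.append("word-ext:" + folded.lower())
    return tokens
```

An unmodified word such as "hello" yields no extra tokens, so the ordinary
word token would still be generated as before; only disguised words gain the
additional "word-punct:" or "word-ext:" clue.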

- Tys

Paul Sorenson said:
> Further to my email a few weeks ago: as well as punctuation in the middle of
> words in the subject line, I am also starting to get spam using umlauts,
> accents and that kind of thing in the subject line.
>
> I am not sure what the conclusion was re the efficacy of removing
> punctuation, but if it is an improvement, then folding extended characters
> into plain ASCII might also help.
>
> cheers
>
>
> _______________________________________________
> spambayes-dev mailing list
> spambayes-dev at python.org
> http://mail.python.org/mailman/listinfo/spambayes-dev