python spam filter: random words?
Terry Reedy
tjreedy at udel.edu
Mon Aug 11 12:19:52 EDT 2003
"Marc Wilson" <marc at cleopatra.co.uk> wrote in message
news:pa1fjvck8jidlj2ne5e63esmd2sk2ndl6v at 4ax.com...
> In comp.lang.python, revyakin at yahoo.com (revyakin) wrote in
> <fa06e058.0308101713.9679884 at posting.google.com>::
>
> |I know fighting spam is like fighting global worming, but still..
> |50% of spam I get these days contains a random combination of letters
> |at the end of the subject line. Has anyone tried using that feature in
> |antispam filters?
>
> How do you detect "random" letters? You can only (programmatically)
> determine that a character sequence is "random" if it doesn't appear in
> some sort of dictionary, and even there you have the risk of false
> positives due to typos, acronyms etc.
Looking at successive letter pairs would go a long way. Of the
(26+space)**2 = 729 combinations, perhaps half occur in real words, so
a pair such as 'qx' is a giveaway. Using triples would allow common
three-letter acronyms to be included as legal.
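
A rough sketch of the letter-pair idea, in case it helps. The function
names and the tiny training sample are made up for illustration; a real
filter would build the legal set from a large body of genuine mail and
would have to pick its own cutoff for what counts as "too random".

```python
import string

LETTERS = set(string.ascii_lowercase)


def letter_ngrams(text, n=2):
    """Yield successive n-character runs over the 26 letters plus space.

    Anything that is not a letter is treated as a space, runs of spaces
    are collapsed, and the text is padded with one space on each side so
    that word-initial and word-final letters are counted too.
    """
    cleaned = "".join(c if c in LETTERS else " " for c in text.lower())
    words = cleaned.split()
    if not words:
        return
    padded = " " + " ".join(words) + " "
    for i in range(len(padded) - n + 1):
        yield padded[i:i + n]


def build_legal_ngrams(samples, n=2):
    """Collect every n-gram seen in the sample texts (hypothetical helper)."""
    legal = set()
    for sample in samples:
        legal.update(letter_ngrams(sample, n))
    return legal


def unseen_fraction(text, legal, n=2):
    """Return the share of n-grams in text that never occurred in the samples."""
    grams = list(letter_ngrams(text, n))
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in legal)
    return unseen / len(grams)


if __name__ == "__main__":
    # Toy sample only; far too small to be a realistic model of English.
    legal_pairs = build_legal_ngrams(
        ["make money fast", "free offer", "the quick brown fox"]
    )
    for subject in ("free money", "qxzjv"):
        print(subject, unseen_fraction(subject, legal_pairs))
```

Passing n=3 gives the triples variant: three-letter acronyms that appear
in the samples then count as legal along with ordinary words.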
Terry J. Reedy