python spam filter: random words?

Terry Reedy tjreedy at udel.edu
Mon Aug 11 12:19:52 EDT 2003


"Marc Wilson" <marc at cleopatra.co.uk> wrote in message
news:pa1fjvck8jidlj2ne5e63esmd2sk2ndl6v at 4ax.com...
> In comp.lang.python,  revyakin at yahoo.com (revyakin) (revyakin) wrote in
> <fa06e058.0308101713.9679884 at posting.google.com>::
>
> |I know fighting spam is like fighting global warming, but still..
> |50% of spam I get these days contains a random combination of letters
> |at the end of the subject line. Has anyone tried using that feature in
> |antispam filters?
>
> How do you detect "random" letters?  You can only (programmatically)
> determine that a character sequence is "random" if it doesn't appear in some
> sort of dictionary, and even there you have the risk of false positives due
> to typos, acronyms etc.

Looking at successive letter pairs would go a long way.  Out of the
(26+1)**2 = 729 possible pairs of letters and space, perhaps half occur
in real words (e.g., 'qx' is a giveaway).  Using triples would allow
common three-letter acronyms to be included as legal.
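A minimal Python sketch of the letter-pair idea, assuming you have a list of real words from which to learn which pairs occur. The tiny SAMPLE list, the 0.2 threshold, and all function names below are hypothetical illustrations, not tuned values or an existing filter's API. Each token is padded with a space at both ends so pairs at word boundaries are counted, matching the 26+space alphabet; passing n=3 switches to triples.

```python
import re


def letter_ngrams(word, n=2):
    """Overlapping n-letter sequences of a word, padded with a space at each end."""
    padded = " " + word.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_known_ngrams(words, n=2):
    """Collect every n-gram that occurs in a list of real words."""
    known = set()
    for word in words:
        known.update(letter_ngrams(word, n))
    return known


def looks_random(token, known, n=2, threshold=0.2):
    """Flag a token if more than `threshold` of its n-grams never occur in `known`.

    The 0.2 threshold is an illustrative assumption, not a tuned value.
    """
    grams = letter_ngrams(token, n)
    if not grams:
        return False
    unseen = sum(1 for g in grams if g not in known)
    return unseen / len(grams) > threshold


def random_tokens(subject, known, n=2, threshold=0.2):
    """Return the alphabetic tokens of a subject line that look random."""
    tokens = re.findall(r"[a-z]+", subject.lower())
    return [t for t in tokens if looks_random(t, known, n, threshold)]


# Hypothetical training data: a real filter would use a full dictionary.
SAMPLE = ("spam filters often look at the subject line of each message "
          "and score the words they find there").split()

known_pairs = build_known_ngrams(SAMPLE)
print(random_tokens("score the subject xqzvkj", known_pairs))  # ['xqzvkj']
```

With a real dictionary in place of SAMPLE, the fraction of unseen pairs is what separates 'qx'-style noise from an occasional typo, which usually introduces only one or two odd pairs. The threshold trades false positives against misses, and known acronyms can simply be added to the word list.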

Terry J. Reedy


More information about the Python-list mailing list