spam classification breaker

Robin Becker robin at
Thu Feb 5 23:46:52 CET 2004

In article <mailman.1245.1075996814.12720.python-list at>, Tim
Peters < at> writes
>If I'm a spammer trying to get my pitches seen by you, and you're using a
>personal Bayesian classifier, then I need to load my pitches with words that
>are very hammy to you.  If I don't have access to your personal training
>data (if I do, I already own your machine ...), then I need to *deduce*
>what's hammy to you.  One way to do that, as John Graham-Cumming noted
>here, is for me to send you thousands of messages with different piles of
>words, and note which ones did and didn't get caught by your filter.   Then
>I load my sales pitches with words from the ones that your filter didn't
>reject, and avoid words from ones your filter did reject.  In order to do
>that, I have to know which messages you did and didn't look at.  That's the
>purpose of the HTML "web bug"/"web beacon"s in the thousands of test
>messages.  (If your email client renders HTML pages, including fetching
>images off the net, a spammer can know when you've rendered their message,
>by, e.g., embedding your email address as a parameter in a URL that fetches
>a .jpg to display.)
.... Are you asserting that spammers don't have access to the pdf that
users are filtering? Each filter may be unique, but each can be biased.
Robin Becker
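
For concreteness, the probing attack described in the quoted text might be
sketched roughly as below. This is only a toy model, not how any real filter
or spammer works: the word table, the threshold, the tracker host
tracker.example, and the names spam_score, delivered, beacon_url and probe
are all made up for illustration. It also probes one word at a time next to
a fixed spammy "carrier" word, which is simpler than sending thousands of
mixed piles of words.

```python
import math
from urllib.parse import urlencode

# Hypothetical per-word spam probabilities learned from the victim's own
# mail. The attacker never sees this table; it only sees beacon hits.
WORD_SPAM_PROB = {
    "viagra": 0.99, "mortgage": 0.95, "unsubscribe": 0.90,
    "python": 0.02, "generator": 0.04, "traceback": 0.03,
}
UNKNOWN_PROB = 0.5  # assumed neutral score for words never seen in training


def spam_score(words):
    """Combine per-word probabilities by summing log-odds (naive Bayes style)."""
    log_odds = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word, UNKNOWN_PROB)
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))


def delivered(words, threshold=0.5):
    """True if the toy filter lets the message through to the inbox."""
    return spam_score(words) < threshold


def beacon_url(address, probe_id):
    """Image URL that tells the sender which probe was rendered, and by whom."""
    query = urlencode({"to": address, "probe": probe_id})
    return "http://tracker.example/pixel.jpg?" + query


def probe(candidates, carrier=("mortgage",)):
    """Return the candidate words that get a spammy carrier past the filter.

    In a real attack the sender would learn this only indirectly, from
    whether the beacon image of each probe message was ever fetched (and
    only if the recipient opens the message in an HTML-rendering client).
    Here we ask the toy filter directly.
    """
    hammy = []
    for word in candidates:
        if delivered(list(carrier) + [word]):
            hammy.append(word)
    return hammy


if __name__ == "__main__":
    print(beacon_url("user@example.com", 7))
    print(probe(["viagra", "python", "hello", "traceback"]))
```

In this toy setup only words that are strongly hammy to this particular
recipient outweigh the carrier; a neutral word such as "hello" does not,
which is why the attacker has to deduce the hammy words per recipient.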