spam classification breaker

Robin Becker robin at jessikat.fsnet.co.uk
Thu Feb 5 23:46:52 CET 2004


In article <mailman.1245.1075996814.12720.python-list at python.org>, Tim
Peters <tim.one at comcast.net> writes
..
....
>tomatically.
>
>If I'm a spammer trying to get my pitches seen by you, and you're using a
>personal Bayesian classifier, then I need to load my pitches with words that
>are very hammy to you.  If I don't have access to your personal training
>data (if I do, I already own your machine ...), then I need to *deduce*
>what's hammy to you.  One way to do that, as John Graham-Cumming noted
>here, is for me to send you thousands of messages with different piles of
>words, and note which ones did and didn't get caught by your filter.   Then
>I load my sales pitches with words from the ones that your filter didn't
>reject, and avoid words from ones your filter did reject.  In order to do
>that, I have to know which messages you did and didn't look at.  That's the
>purpose of the HTML "web bug"/"web beacon"s in the thousands of test
>messages.  (If your email client renders HTML pages, including fetching
>images off the net, a spammer can know when you've rendered their message,
>by, e.g., embedding your email address as a parameter in a URL that fetches
>a .jpg to display.)
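
The beacon mechanism described in the quoted text can be looked at from the
receiving side. What follows is only a minimal sketch, not anything from
SpamBayes or any real mail client: the names BeaconFinder and find_beacons
are hypothetical, and it assumes the beacon is a plain <img> fetched over
http(s) with the recipient's address appearing literally (possibly
percent-encoded) in the URL. A real beacon could just as easily use an
opaque ID, which this would not catch.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, unquote, urlsplit


class BeaconFinder(HTMLParser):
    """Collect remote <img> URLs that carry the recipient's address."""

    def __init__(self, recipient):
        super().__init__()
        self.recipient = recipient.lower()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # A valueless src attribute is reported as None, so fall back to "".
        src = dict(attrs).get("src") or ""
        parts = urlsplit(src)
        if parts.scheme not in ("http", "https"):
            return  # e.g. cid: images are attached, not fetched off the net
        # parse_qsl percent-decodes the query values for us.
        values = [value.lower() for _, value in parse_qsl(parts.query)]
        in_path = self.recipient in unquote(parts.path).lower()
        if self.recipient in values or in_path:
            self.suspects.append(src)


def find_beacons(html, recipient):
    """Return the image URLs in `html` that embed `recipient` (hypothetical helper)."""
    finder = BeaconFinder(recipient)
    finder.feed(html)
    finder.close()
    return finder.suspects
```

A client that refuses to fetch remote images at all makes this check
unnecessary; the sketch only shows what the spammer's test messages would
have to contain for the "did you look at it" signal to work.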
.... are you asserting that spammers don't have access to the pdf
(probability distribution) that users are filtering? Each filter may be
unique, but it can still be biased.
-- 
Robin Becker


