[Spambayes] Idea for defeating literary excerpts in spam

Schreurs, Blake BSchreurs at iss-md.com
Thu Jan 22 11:06:01 EST 2004


Hey Spambayes folks,
  I've been using Spambayes for quite a while now.  I have to congratulate
you on a job very well done!  Recently, more spam has been making it into my
inbox.  In all cases, this has been spam with either literary text or
random text at the bottom of the message.  I noticed a few trends that
might help you defeat this.
 
First, the word garbage is always at the END of the message.  Spammers want
to get their sales pitch in first (spam with garbage at the top would be
useless).
 
Second, the word density of the text at the end of the message is always
higher than that of the sales payload itself.
 
Third, the anti-Bayesian text is always separated from the sales payload by
one or more blank lines.
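
A minimal sketch of how these three observations could be checked, assuming
the message body is available as plain text.  "Word density" is not defined
above, so this sketch assumes it means average words per non-empty line, and
it assumes the separator is one or more blank lines.  The function names are
hypothetical and are not part of Spambayes.

```python
import re

def split_paragraphs(body):
    """Split a body into paragraphs separated by one or more blank lines."""
    parts = re.split(r"\n\s*\n", body.strip())
    return [p for p in parts if p.strip()]

def word_density(paragraph):
    """Average words per non-empty line (an assumed definition of density)."""
    lines = [ln for ln in paragraph.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(len(ln.split()) for ln in lines) / len(lines)

def looks_like_padding(body):
    """True if the LAST paragraph is denser than every paragraph before it."""
    paras = split_paragraphs(body)
    if len(paras) < 2:
        return False  # nothing to compare against
    return word_density(paras[-1]) > max(word_density(p) for p in paras[:-1])
```

A short pitch followed by a dense block of prose would return True here; a
message with a single paragraph would return False.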
 
With these ideas in mind, I think it may be possible to separate the payload
from the anti-Bayesian portions of the message.  For example, you could
score the entire message, and then score the same message without
the last significant paragraph.  It would be interesting to see how much the
two results differ.  For example, in this message, you'd ignore
my name and corporate information.  Not much of a loss, really.
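
A rough sketch of that comparison, assuming paragraphs are separated by blank
lines and that "last significant paragraph" can be approximated by simply the
last paragraph.  The `score` argument stands in for whatever classifier is in
use; `toy_score` and its word list are made up purely for illustration and
are not the Spambayes scoring method.

```python
import re

def drop_last_paragraph(body):
    """Return the body without its final blank-line-separated paragraph."""
    paras = [p for p in re.split(r"\n\s*\n", body.strip()) if p.strip()]
    if len(paras) < 2:
        return body  # only one paragraph, so keep it
    return "\n\n".join(paras[:-1])

def score_both_ways(body, score):
    """Score the full body and the trimmed body; also return the difference."""
    full = score(body)
    trimmed = score(drop_last_paragraph(body))
    return full, trimmed, trimmed - full

# Hypothetical stand-in scorer: fraction of words found in a tiny spam list.
SPAMMY = {"buy", "cheap", "viagra"}

def toy_score(text):
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in SPAMMY for w in words) / len(words)
```

A large positive difference would suggest that the trailing paragraph was
diluting the score, which is the effect the padding is meant to have.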
 
The only problem I see is messages with no text payload (those that rely on
an image for the payload).  I'm still not sure what to do about these.
 
Blake A. Schreurs
Webmaster & Systems Engineer
Information Systems Support, Inc.
13 Firstfield Road, Suite 100
Gaithersburg, Maryland 20878
(301) 896-0500 ext. 110
(301) 890-0760 (fax)
bschreurs at iss-md.com
 
 