[spambayes-bugs] [ spambayes-Feature Requests-922840 ] Score multipart/alternative separately

SourceForge.net noreply at sourceforge.net
Mon May 3 20:47:28 EDT 2004


Feature Requests item #922840, was opened at 2004-03-24 16:27
Message generated for change (Comment added) made by leobru
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=922840&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Leonid (leobru)
Assigned to: Nobody/Anonymous (nobody)
Summary: Score multipart/alternative separately

Initial Comment:
A growing amount of spam uses multipart/alternative content
in which the text/plain part is innocuous prose or similar
filler, while the text/html part carries the actual UCE
(unsolicited commercial email). My proposal is:

- compute separate scores as if the text/plain part was
empty, and as if the text/html part was empty

- to compute the final score, use min(plain_hamscore,
html_hamscore) and max(plain_spamscore, html_spamscore)
because any disparity is by itself a spam indicator.
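
The two steps above can be sketched roughly as follows. This is
only an illustration of the proposal, not existing SpamBayes code:
PartScore and combine_alternatives are hypothetical names, the
per-part ham and spam scores are assumed to lie in [0, 1], and the
final (spam - ham + 1) / 2 step is an assumption about how the two
scores would be folded into a single number.

```python
from typing import NamedTuple


class PartScore(NamedTuple):
    """Hypothetical per-part result: ham and spam evidence, each in [0, 1]."""
    ham: float
    spam: float


def combine_alternatives(plain: PartScore, html: PartScore) -> float:
    """Combine the scores of the two alternatives pessimistically.

    `plain` is the score computed as if the text/html part were empty,
    `html` the score computed as if the text/plain part were empty.
    """
    ham = min(plain.ham, html.ham)     # the least hammy part decides
    spam = max(plain.spam, html.spam)  # the most spammy part decides
    # Assumed final step: map the pair onto a single 0..1 spam score.
    return (spam - ham + 1.0) / 2.0


# Innocent-looking prose in text/plain, an advertisement in text/html.
decoy = combine_alternatives(PartScore(ham=1.0, spam=0.0),
                             PartScore(ham=0.0, spam=1.0))

# Both alternatives look like ordinary mail.
honest = combine_alternatives(PartScore(ham=1.0, spam=0.0),
                              PartScore(ham=1.0, spam=0.0))
```

With these made-up inputs the decoy message ends up at the spam end
of the scale, while a message whose two parts agree keeps its
ordinary score.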



----------------------------------------------------------------------

>Comment By: Leonid (leobru)
Date: 2004-05-03 17:47

Message:
I have observed a 75 KB spam message that scored exactly
0.50 because of this technique. The text/plain part was an
enormous list of space-separated random words that happened
to include enough "hammy" words in my database to fill the
default 150-word cutoff before the "spammy" words could
prevail. Unless countermeasures are taken, spammers will
learn the trick quickly.

----------------------------------------------------------------------



