[spambayes-bugs] [ spambayes-Patches-1532856 ] Compute size of embedded images

SourceForge.net noreply at sourceforge.net
Wed Aug 2 04:02:41 CEST 2006


Patches item #1532856, was opened at 2006-08-01 21:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=1532856&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Nobody/Anonymous (nobody)
Summary: Compute size of embedded images

Initial Comment:
Attached is a tokenizer patch that generates int(log2(size))
tokens for embedded images.  It seems clear we have to do more
about image-based spam.  This seems like a cheap trick, and at
least for my current corpus generates a fair number of spammy
clues:

token,nspam,nham,spam prob
image-size:2**6,4,1,0.5
image-size:2**7,4,1,0.5
image-size:2**5,1,0,0.844827586207
image-size:2**8,6,0,0.96511627907
image-size:2**9,3,0,0.934782608696
image-size:2**10,7,1,0.620791675168
image-size:2**11,9,0,0.97619047619
image-size:2**12,13,0,0.983271375465
image-size:2**13,14,0,0.984429065744
image-size:2**14,53,0,0.995790458372
image-size:2**15,19,1,0.813543282782
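
As a rough check on the spam prob column, the rows with nham = 0 can be
reproduced with a Robinson-style adjusted probability. This is a sketch
under assumptions, not code taken from the patch: the function name, the
strength s = 0.45, and the prior x = 0.5 are assumed here, and the corpus
totals are placeholders because the message does not give them. When nham
is 0 the totals cancel out, which is why only those rows are checked.

```python
def adjusted_spam_prob(nspam, nham, total_spam, total_ham, s=0.45, x=0.5):
    """Robinson-style adjusted spam probability for one token.

    s (strength of the prior) and x (prior probability for an unseen
    token) are assumed values, not taken from the message.
    """
    spam_ratio = nspam / total_spam
    ham_ratio = nham / total_ham
    # Raw estimate from the per-corpus frequencies.
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = nspam + nham
    # Pull the raw estimate toward x when there is little evidence.
    return (s * x + n * p) / (s + n)


# Placeholder corpus sizes; for nham == 0 they do not affect the result.
for nspam in (1, 6, 53):
    print(nspam, adjusted_spam_prob(nspam, 0, 100, 100))
```

With nham = 0 the raw estimate is 1, so the expression reduces to
(0.225 + nspam) / (0.45 + nspam), which agrees with the 2**5, 2**8, and
2**14 rows above.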

Of course, it may not improve discrimination when tested more
rigorously, but it might be worth a try.  I haven't done any NxN
testing.  I no longer have more training messages lying about
than are necessary for my day-to-day needs.

Note that the patch will apply to current sources with an offset or
two.  I have a couple other mods in my current source code that
I excised from the diffs.
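
For readers without the attachment, the idea can be sketched roughly as
follows. This is not the patch itself: the function name
image_size_tokens is hypothetical, and measuring the decoded payload of
each image/* MIME part is an assumption about what "size" means. Only the
int(log2(size)) bucketing and the image-size:2**N token format come from
the description above.

```python
import math
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def image_size_tokens(msg):
    """Yield one 'image-size:2**N' token per image part in msg.

    N is int(log2(size)), where size is the decoded payload length in
    bytes (an assumed definition of "size").
    """
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        payload = part.get_payload(decode=True)
        if not payload:
            # Empty or undecodable image part: no size to bucket.
            continue
        yield "image-size:2**%d" % int(math.log2(len(payload)))


# Small demonstration with a fake 5000-byte "image".
demo = MIMEMultipart()
demo.attach(MIMEText("hello"))
demo.attach(MIMEImage(b"\x00" * 5000, _subtype="gif"))
print(list(image_size_tokens(demo)))
```

Bucketing by powers of two keeps the number of distinct tokens small, so
images of similar size share a token even when their byte counts differ
slightly.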


----------------------------------------------------------------------
