[Spambayes] Inspecting images (was: SpamBayes to HandleEmbeddedImages)

Ken Gordon ksg at telusplanet.net
Wed Oct 26 17:26:02 CEST 2005


There's a lot more to SpamBayes than just evaluating content. Here's
the X-Spambayes-Evidence header from a recent spam. Apart from
'charset', very little of this has to do with the content, yet it was
correctly classified as spam.

> X-Spambayes-Evidence: 	'*H*': 0.00; '*S*': 1.00; 'received:192.168.1':  
> 0.10; 'subject:skip:B 10': 0.16; 'received:192.168': 0.20;  
> 'received:192': 0.21; 'url:www': 0.23; 'content-type:image/jpeg':  
> 0.34;  'to:addr:none': 0.38; 'header:Return-Path:1': 0.38;  
> 'header:MIME-Version:1': 0.61; 'url:': 0.64; 'x-mailer:none': 0.71;  
> 'to:no real name:2**0': 0.72; 'from:name:\x1b$b5z at nf`1{\x1b(b': 0.84;  
> 'message-id:@imx100522.ath.cx': 0.84; 'received:imx100522.ath.cx':  
> 0.84; 'url:fetish': 0.84; 'received:192.168.1.11': 0.91;  
> 'received:kick': 0.91; 'content-type:multipart/related': 0.92;  
> 'received:210.153': 0.93; 'received:ath.cx': 0.93; 'url:cc': 0.93;  
> 'virus:src="cid:': 0.95; 'content-type/type:multipart/alternative':  
> 0.96; 'received:cx': 0.97; 'email addr:yahoo.co.jp': 0.99; 'skip:\x1b  
> 80': 0.99; 'from:addr:yahoo.co.jp': 1.00; 'from:charset:iso-2022-jp':  
> 1.00; 'skip:\x1b 60': 1.00; 'skip:\x1b 30': 1.00; 'skip:\x1b 20':  
> 1.00; 'skip:\x1b 50': 1.00; 'subject:$': 1.00; 'received:210': 1.00;  
> 'charset:iso-2022-jp': 1.00; 'subject:\x1b$': 1.00;  
> 'subjectcharset:iso-2022-jp': 1.00

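For anyone who wants to check this against their own mail, here is a
rough Python sketch that pulls the token/score pairs out of an evidence
header and counts how many clues are plain words rather than prefixed
tokens. The EVIDENCE string is abridged from the header above. The
parse_evidence name, the regular expression, and the "no colon means
plain word" rule are my own assumptions for illustration, not part of
SpamBayes.

```python
import re

# Abridged from the X-Spambayes-Evidence header quoted above.
EVIDENCE = (
    "'*H*': 0.00; '*S*': 1.00; 'received:192.168.1': 0.10; "
    "'url:www': 0.23; 'content-type:image/jpeg': 0.34; "
    "'x-mailer:none': 0.71; 'url:fetish': 0.84; "
    "'received:ath.cx': 0.93; 'from:addr:yahoo.co.jp': 1.00; "
    "'charset:iso-2022-jp': 1.00"
)

# A quoted token, a colon, then a score such as 0.84 or 1.00.
PAIR = re.compile(r"'(.+?)':\s*([01]\.\d+)")


def parse_evidence(value):
    """Return (token, score) pairs found in an evidence header value."""
    return [(token, float(score)) for token, score in PAIR.findall(value)]


# Drop the two summary entries, which are scores rather than clues.
clues = [(t, p) for t, p in parse_evidence(EVIDENCE) if t not in ("*H*", "*S*")]

# Rough heuristic (an assumption): prefixed tokens contain a colon,
# plain body words do not.
plain_words = [t for t, _ in clues if ":" not in t]

print(len(clues), "clues,", len(plain_words), "plain words")
```

On this sample every clue carries a prefix, so the plain-word list
comes back empty.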


On 2005 Oct 25, at 8:37, <FreeMJ at HotPop.com> wrote:

> How?  Technically speaking, what could your SpamBayes installation be
> doing differently?  These are ALL ham words, so how is it that your
> e-mail could be classifying all of this as Spam?  If it is, I suspect
> you're losing a lot of legitimate e-mail with it.
>
> FMJ
>
> -----Original Message-----
> From: Ken Gordon [mailto:ksg at telusplanet.net]
> Sent: Monday, October 24, 2005 8:58 PM
> To: FreeMJ at HotPop.com
> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
> HandleEmbeddedImages)
>
> My installation of SpamBayes catches nearly all of these. I don't see
> one a month outside of the Spam folder.
>
> ---
> Ken Gordon
> (780) 628-2758
> http://www.wolfe-gordon.ca
> On 2005 Oct 24, at 20:18, <FreeMJ at HotPop.com> wrote:
>
>> Hi Tony,
>> The problem is, they keep changing the meaningless text at the bottom
>> of the e-mail all the time, to confuse the Spam filter.  They're
>> picking Hammy words.  And, as you can see, it's a highly effective
>> technique.  In other words, NONE of the "Tokens" should actually be
>> "Significant", it's the image that needs to be scored in this case.
>> Here's the spambayes clues for one of the e-mails:
>>
>> Combined Score: 3% (0.0330173)
>> Internal ham score (*H*): 0.999976
>> Internal spam score (*S*): 0.0660102
>>
>> # ham trained on: 14237
>> # spam trained on: 20138
>>
>> 150 Significant Tokens
>> token                               spamprob         #ham  #spam
>> 'sender:no real name:2**0'          0.0277535        2187     88
>> 'dismissed'                         0.0374933         314     17
>> 'raising'                           0.0417704         313     19
>> 'lives'                             0.0580962        1012     88
>> 'ill'                               0.0613924        1084    100
>> 'said'                              0.0677803        6498    668
>> 'two'                               0.08226          5200    659
>> 'put'                               0.0828439        2632    336
>> 'were'                              0.0845653        6094    796
>> 'recalled'                          0.0862187          92     12
>> 'town'                              0.0883783         600     82
>> 'being'                             0.0894639        4312    599
>> 'letter'                            0.093344         1595    232
>> 'unless'                            0.0960663         687    103
>> 'stephan'                           0.0968154          15      2
>> 'face'                              0.0986506        1397    216
>> 'who'                               0.0991493        8031   1250
>> 'knows'                             0.102049          574     92
>> 'anyone'                            0.104976         1828    303
>> 'them'                              0.106325         4690    789
>> 'think'                             0.107446         3584    610
>> 'keep'                              0.109385         2517    437
>> 'him'                               0.111552         2631    467
>> 'suspicions'                        0.113796           40      7
>> 'went'                              0.11401          1331    242
>> 'sound'                             0.116592          596    111
>> 'care'                              0.117491         1244    234
>> 'going'                             0.119623         3503    673
>> 'sort'                              0.119677          511     98
>> 'his'                               0.119861         5717   1101
>> 'remained'                          0.11998           271     52
>> 'heavily'                           0.123551          232     46
>> 'last'                              0.126157         5241   1070
>> 'subject:: '                        0.134951         9110   2010
>> 'voice'                             0.135891          644    143
>> 'walk'                              0.140296          339     78
>> 'everyone'                          0.140502         1225    283
>> 'whatever'                          0.141645          618    144
>> 'overdosed'                         0.142155           48     11
>> 'mother'                            0.144908          510    122
>> 'way'                               0.146154         3458    837
>> 'was'                               0.146612         8939   2172
>> 'would'                             0.146893         7679   1870
>> 'but'                               0.14865          8435   2083
>> 'past'                              0.155513         1932    503
>> 'duty'                              0.15756           326     86
>> 'been'                              0.158577         6937   1849
>> 'away'                              0.159247         1632    437
>> 'soon'                              0.16154          1021    278
>> 'header:In-Reply-To:1'              0.162139         1791    490
>> 'made'                              0.163602         3467    959
>> 'true'                              0.164161          566    157
>> 'too'                               0.164462         2199    612
>> 'then'                              0.167186         3519    999
>> 'road'                              0.169212          459    132
>> 'covington'                         0.170591           18      5
>> 'firmly'                            0.171729           69     20
>> 'received'                          0.172468         1646    485
>> 'yes'                               0.17276           275     81
>> 'other'                             0.174723         6686   2002
>> 'offered'                           0.177462          702    214
>> 'saw'                               0.178119          738    226
>> 'might'                             0.184601         2399    768
>> 'hotel'                             0.185114          203     65
>> 'thought'                           0.186457         1287    417
>> 'her'                               0.187192         2831    922
>> 'indeed'                            0.18721           191     62
>> 'lie'                               0.188538          165     54
>> 'filled'                            0.188682          329    108
>> 'assorted'                          0.198662           32     11
>> 'intent'                            0.199592          596    210
>> 'manner'                            0.200765          192     68
>> 'second'                            0.203991         1311    475
>> 'let'                               0.207891         1835    681
>> 'much'                              0.210328         3345   1260
>> 'back'                              0.211425         3207   1216
>> 'place'                             0.214507         1704    658
>> 'out'                               0.216398         6503   2540
>> 'little'                            0.218176         2273    897
>> 'within'                            0.218497         1940    767
>> 'occupied'                          0.218989           56     22
>> 'never'                             0.222876         2224    902
>> 'take'                              0.223351         4101   1668
>> 'subject:-'                         0.223886         2564   1046
>> 'find'                              0.224822         2482   1018
>> 'play'                              0.230279          518    219
>> 'skip:n 10'                         0.233772         2561   1105
>> 'eyes'                              0.234231          294    127
>> 'that'                              0.245614        11155   5137
>> 'thoughts'                          0.250399          193     91
>> 'observed'                          0.252899          109     52
>> 'not'                               0.253605         9451   4542
>> 'have'                              0.260054        10350   5145
>> 'myself'                            0.268888          281    146
>> 'with'                              0.272839        10712   5685
>> 'skip:r 10'                         0.274264         4752   2540
>> 'look'                              0.276317         1963   1060
>> 'can'                               0.286752         7254   4125
>> 'guided'                            0.29442            24     14
>> 'all'                               0.300499         8283   5033
>> 'resign'                            0.304561           39     24
>> 'contracts'                         0.313223          163    105
>> 'subject:Alert'                     0.322897           61     41
>> 'upon'                              0.326586          853    585
>> 'skip:i 10'                         0.332672         4717   3326
>> 'for'                               0.339583        12494   9087
>> 'topics'                            0.371008          114     95
>> 'the'                               0.371613        13338  11157
>> 'above'                             0.380529          678    589
>> 'header:Return-Path:1'              0.635635         6219  15346
>> 'consults'                          0.695316            3     10
>> 'comparative'                       0.728703           17     65
>> 'earnest'                           0.747547           24    101
>> 'friendship'                        0.796906            6     34
>> 'blush'                             0.797234           13     73
>> 'skip:7 70'                         0.805302            5     30
>> 'expedition'                        0.825248            9     61
>> 'from:addr:g.wcvbss'                0.844828            0      1
>> 'from:addr:netnitco.net'            0.844828            0      1
>> 'from:name:raymond goins'           0.844828            0      1
>> 'lensalizarin'                      0.844828            0      1
>> "m'scorset"                         0.844828            0      1
>> 'message-id:@icsp.net'              0.844828            0      1
>> 'ownthat'                           0.844828            0      1
>> 'prominents'                        0.844828            0      1
>> 'roadsthat'                         0.844828            0      1
>> 'sender:addr:athenet.net'           0.844828            0      1
>> 'sender:addr:h.nnq'                 0.844828            0      1
>> 'subject:< '                        0.844828            0      1
>> 'subject:Stiles'                    0.844828            0      1
>> 'totrue'                            0.844828            0      1
>> 'virus:src="cid:'                   0.888282          111   1250
>> 'congenial'                         0.905802            5     70
>> 'taters'                            0.907976            1     16
>> 'skip:7 90'                         0.908163            0      2
>> 'header:Received:2'                 0.914966          886  13487
>> 'diem'                              0.92631             3     56
>> 'subject:CBXC'                      0.949438            0      4
>> 'rotund'                            0.952904            1     33
>> 'blushingly'                        0.958716            0      5
>> 'refolding'                         0.969799            0      7
>> 'egress'                            0.970088            1     53
>> 'to:name:freemj'                    0.988432            0     19
>> 'septennial'                        0.990405            0     23
>> 'veal'                              0.993066            0     32
>> 'youll'                             0.993469            0     34
>> 'subject:Stock'                     0.99571             0     52
>> 'casteth'                           0.995868            0     54
>> 'cutlet'                            0.996894            0     72
>> 'to:addr:hotpop.com'                0.997792           23  14803
>>
>> Message Stream
>> Return-Path: <H.jykqli at valkyrie.net>
>> Received: from 38.113.3.52 (unknown [200.107.173.172])
>> 	by mx1.hotpop.com (Postfix) with SMTP
>> 	id 5B8A0E8304; Sun, 23 Oct 2005 23:49:29 +0000 (UTC)
>> Received: from spellbound.gape.jeffersonian.gauguin.es
>> ([200.107.173.172]
>> 	helo=scatterbrain.mail.elknet.net) by smtp9.bt.com with esmtp
>> 	id 0X162p-8865LL-80; Mon, 24 Oct 2005 01:48:41 +0100
>> Message-Id: <8927397790.37444460700 at icsp.net>
>> Sender: H.nnq at athenet.net
>> Date: Sun, 23 Oct 2005 20:42:41 -0400
>> In-Reply-To: Your message of "Sun, 23 Oct 2005 20:46:41 -0400."
>> 	<98802417987115.YV37184 at joel.renaissance.arden.net>
>> From: "Raymond Goins" <G.wcvbss at netnitco.net>
>> To: "Freemj" <freemj at hotpop.com>
>> Subject: Fwd: Stock - Alert-CBXC< Neil Stiles
>> MIME-Version: 1.0
>> Content-Type: multipart/related;
>>    boundary="--ZZR8PVzcRDTpf2Pu68MQiz"
>> X-HotPOP-Delivered-To: freemj at hotpop.com
>>
>>
>> negligiblestymie breakwatergrist m'scorset
>> 	
>>
>>
>> We went to the triumph comparative at egress diem then a mouldy sort  
>> of
>> establishment
>> have my place so I blushingly offered to resign it The septennial who
>> made as much of my going away as if I were going to China received me
>> as
>> an
>> was dismissed and other topics occupied us he remained so seldom
>> raising
>> his eyes unless to	
>> true Rosanne was suspicions arose within me that it was an ill  
>> assorted
>> friendship
>> that he never thought of being observed by anyone but was so intent
>> upon
>> her and upon his ownthat I received soon recalled me to myself and put
>> me in the road back to the hotel
>> I was so filled with the play and with the past for it was in a manner
>> Everyone who knows you consults with you and is guided by you Stephan
>> but on second thoughts I shall keep him to take care of me 	
>> and refolding the letter it would be insupportable to me to think of
>> I am in earnest at last so youll soon have to arrange our contracts  
>> and
>> to bind us firmly to them
>> been overdosed with taters I commanded him in my deepest voice to  
>> order
>> a veal cutlet and potatoes
>> Yes I am on an expedition of duty My mother lives a little way out of
>> town and the roadsthat I received soon recalled me to myself and put  
>> me
>> in the road back to the hotel
>> for I saw a faint blush in her face you would have let me find it out
>> for myself that would not lie too heavily upon her purse and to do my
>> duty in it whatever it might be
>> and the prominents walk and the congenial sound of the rotund casteth
>> hovering above them all
>> 7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg7MikU 
>> R
>> q
>> reWfg7M6dwtJ4t1Fxn
>> as he can look at me out of his two eyes Is he indeed said Mr  
>> Covington
>>
>> <HTML><HEAD>
>> <META http-equiv=Content-Type content="text/html;
>> charset=windows-1252">
>> <TITLE>lensalizarin impregnatecost</TITLE>
>> </HEAD>
>> <BODY>
>> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0">
>> <TR><TD><font></font><font></font>
>> <BR><STRONG></STRONG><IMG
>> SRC="cid:lTN1QnT11CtJIk8H6J5X7INGgMff2pS at prairieweb.com" border="0"
>> ALT="negligiblestymie breakwatergrist m'scorset">
>> <BR><STRONG></STRONG><font></font><FONT face="Verdana"
>> size=1><FONT></FONT></font></TD></TR><TR><TD><FONT
>> size=1><BR><BR><font></font><STRONG></STRONG>We went to the triumph
>> comparative at egress diem  then a mouldy sort of
>> establishment<BR>have my
>> place  so I blushingly offered to resign it
>> <STRONG></STRONG><STRONG></STRONG>The septennial  who made as much of
>> my
>> going away as if I were going to China  received me as an<BR>was
>> dismissed
>> and other topics occupied us  he remained so seldom raising his eyes
>> unless
>> to</FONT></TD></TR><TR><TD><FONT size=1>true Rosanne was  suspicions
>> arose
>> within me that it was an ill assorted friendship  <BR>that he never
>> thought
>> of being observed by anyone but was so intent upon her  and upon his
>> own<FONT SIZE=2></FONT><font></font>that I received  soon recalled me
>> to
>> myself  and put me in the road back to the hotel<BR>I was so filled
>> with the
>> play  and with the past   for it was  in a manner<BR>Everyone who
>> knows you
>> consults with you  and is guided by you  Stephan  <BR>but  on second
>> thoughts  I shall keep him to take care of me
>> </FONT></TD></TR><TR><TD><FONT size=1>and refolding the letter  it
>> would be
>> insupportable to me to think of  <BR>I am in earnest at last   so
>> youll soon
>> have to arrange our contracts  and to bind us firmly to
>> them<font></font><BR>been overdosed with taters  I commanded him  in  
>> my
>> deepest voice  to order a veal cutlet and potatoes<BR>Yes  I am on an
>> expedition of duty  My mother lives a little way out of town and the
>> roads<font></font><FONT SIZE=2></FONT>that I received  soon recalled
>> me to
>> myself  and put me in the road back to the hotel<BR>for I saw a faint
>> blush
>> in her face  you would have let me find it out for myself
>> <font></font>that
>> would not lie too heavily upon her purse and to do my duty in it
>> whatever
>> it might be  <BR>and the prominents walk  and the congenial sound of
>> the
>> rotund casteth hovering above them all
>> <BR>7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg7 
>> M
>> ikURq
>> reWfg7M6dwtJ4t1Fxn<BR>as he can look at me out of his two eyes Is he
>> indeed
>> said Mr  Covington  </FONT></TD></TR></TABLE>
>> </BODY>
>> </HTML>
>>
>> All Message Tokens
>> 187 unique tokens
>>
>> 'above'
>> 'all'
>> 'and'
>> 'anyone'
>> 'arose'
>> 'arrange'
>> 'assorted'
>> 'away'
>> 'back'
>> 'been'
>> 'being'
>> 'bind'
>> 'blush'
>> 'blushingly'
>> 'but'
>> 'can'
>> 'care'
>> 'casteth'
>> 'cc:none'
>> 'china'
>> 'commanded'
>> 'comparative'
>> 'congenial'
>> 'consults'
>> 'content-type:text/plain'
>> 'contracts'
>> 'covington'
>> 'cutlet'
>> 'deepest'
>> 'diem'
>> 'dismissed'
>> 'duty'
>> 'earnest'
>> 'egress'
>> 'everyone'
>> 'expedition'
>> 'eyes'
>> 'face'
>> 'faint'
>> 'filled'
>> 'find'
>> 'firmly'
>> 'for'
>> 'friendship'
>> 'from:addr:g.wcvbss'
>> 'from:addr:netnitco.net'
>> 'from:name:raymond goins'
>> 'going'
>> 'guided'
>> 'have'
>> 'header:Date:1'
>> 'header:From:1'
>> 'header:In-Reply-To:1'
>> 'header:MIME-Version:1'
>> 'header:Message-Id:1'
>> 'header:Received:2'
>> 'header:Return-Path:1'
>> 'header:Subject:1'
>> 'header:To:1'
>> 'heavily'
>> 'her'
>> 'him'
>> 'his'
>> 'hotel'
>> 'hovering'
>> 'ill'
>> 'indeed'
>> 'intent'
>> 'keep'
>> 'knows'
>> 'last'
>> 'lensalizarin'
>> 'let'
>> 'letter'
>> 'lie'
>> 'little'
>> 'lives'
>> 'look'
>> "m'scorset"
>> 'made'
>> 'manner'
>> 'message-id:@icsp.net'
>> 'might'
>> 'mother'
>> 'mouldy'
>> 'much'
>> 'myself'
>> 'never'
>> 'not'
>> 'observed'
>> 'occupied'
>> 'offered'
>> 'order'
>> 'other'
>> 'our'
>> 'out'
>> 'overdosed'
>> 'ownthat'
>> 'past'
>> 'place'
>> 'play'
>> 'potatoes'
>> 'prominents'
>> 'purse'
>> 'put'
>> 'raising'
>> 'recalled'
>> 'received'
>> 'refolding'
>> 'remained'
>> 'reply-to:none'
>> 'resign'
>> 'road'
>> 'roadsthat'
>> 'rosanne'
>> 'rotund'
>> 'said'
>> 'saw'
>> 'second'
>> 'seldom'
>> 'sender:addr:athenet.net'
>> 'sender:addr:h.nnq'
>> 'sender:no real name:2**0'
>> 'septennial'
>> 'shall'
>> 'skip:7 70'
>> 'skip:7 90'
>> 'skip:b 10'
>> 'skip:e 10'
>> 'skip:i 10'
>> 'skip:n 10'
>> 'skip:r 10'
>> 'soon'
>> 'sort'
>> 'sound'
>> 'stephan'
>> 'subject: '
>> 'subject: - '
>> 'subject:-'
>> 'subject:: '
>> 'subject:< '
>> 'subject:Alert'
>> 'subject:CBXC'
>> 'subject:Fwd'
>> 'subject:Neil'
>> 'subject:Stiles'
>> 'subject:Stock'
>> 'suspicions'
>> 'take'
>> 'taters'
>> 'that'
>> 'the'
>> 'them'
>> 'then'
>> 'think'
>> 'thought'
>> 'thoughts'
>> 'to:2**0'
>> 'to:addr:freemj'
>> 'to:addr:hotpop.com'
>> 'to:name:freemj'
>> 'too'
>> 'topics'
>> 'totrue'
>> 'town'
>> 'triumph'
>> 'true'
>> 'two'
>> 'unless'
>> 'upon'
>> 'veal'
>> 'virus:src="cid:'
>> 'voice'
>> 'walk'
>> 'was'
>> 'way'
>> 'went'
>> 'were'
>> 'whatever'
>> 'who'
>> 'with'
>> 'within'
>> 'would'
>> 'x-mailer:none'
>> 'yes'
>> 'you'
>> 'youll'
>>
>> -----Original Message-----
>> From: spambayes-bounces at python.org
>> [mailto:spambayes-bounces at python.org] On Behalf Of Tony Meyer
>> Sent: Sunday, October 23, 2005 9:43 PM
>> To: <FreeMJ at HotPop.com>
>> Cc: spambayes at python.org
>> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
>> HandleEmbeddedImages)
>>
>>> Something really needs to be done about this embedded image Spam.
>>> Honestly, SpamBayes appears to be ineffective against all these
>>> images,
>>
>> Can you post an example of a message that is incorrectly classified,
>> *with the spambayes clues* for the message?  The Outlook plug-in
>> provides this via the "Show Clues for this Message" item in the
>> SpamBayes menu.
>>
>> [...]
>>> I'm sure OCR isn't the only way, but the words are there in plain
>>> view.  It seems like the obvious way to resolve this.
>>
>> Obvious isn't always best.  One of the tenets here is "stupid beats
>> smart" - I think doing some sort of OCR on images would fall into the
>> "smart" category, and generating simple tokens from the images would
>> fall into the "stupid" category and be more successful.  Just my
>> opinion, of course, but that's what I'd test if I had time (perhaps
>> over the (southern hemisphere) summer...or maybe I can convince one
>> of my employers that this would be worth doing in paid time).
>>
>>> SpamBayes has been such a great program for me and my colleagues,
>>> family and friends.  I can only hope that the project sees fit to
>>> resolve this soon.
>>
>> It's not really a case of "seeing fit" - the issue is that the
>> developers are very short on time at the moment (contributions have
>> always been, and always will be, welcome) and, in addition, this is a
>> complex problem.
>>
>> =Tony.Meyer
>>
>> -- 
>> Please always include the list (spambayes at python.org) in your
>> replies (reply-all), and please don't send me personal mail about
>> SpamBayes.
>> http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
>>
>>
>> _______________________________________________
>> SpamBayes at python.org
>> http://mail.python.org/mailman/listinfo/spambayes
>> Check the FAQ before asking: http://spambayes.sf.net/faq.html
>>
>>
>
>

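The spamprob column in the clues table quoted above can be reproduced
from the #ham and #spam counts. A minimal sketch, assuming the default
unknown-word settings (strength 0.45, probability 0.5) were in use. The
function and constant names are mine, and this is a simplification of
what the classifier does rather than its actual code.

```python
NHAM, NSPAM = 14237, 20138  # "# ham trained on" / "# spam trained on" in the report
STRENGTH = 0.45             # assumed default unknown-word strength
PRIOR = 0.5                 # assumed default unknown-word probability


def token_spamprob(hamcount, spamcount, nham=NHAM, nspam=NSPAM):
    """Estimate a token's spam probability from its training counts."""
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    if hamratio + spamratio == 0:
        return PRIOR  # token never seen in training
    prob = spamratio / (hamratio + spamratio)
    n = hamcount + spamcount
    # Pull the raw estimate toward PRIOR; the fewer messages the token
    # was seen in, the stronger the pull.
    return (STRENGTH * PRIOR + n * prob) / (STRENGTH + n)


# Counts taken from the table quoted above.
for token, ham, spam in [
    ("said", 6498, 668),
    ("subject:CBXC", 0, 4),
    ("from:addr:netnitco.net", 0, 1),
]:
    print(f"{token!r:28} {token_spamprob(ham, spam):.6f}")
```

For 'from:addr:netnitco.net' (0 ham, 1 spam) this works out to
(0.45 * 0.5 + 1) / (0.45 + 1), about 0.8448, which matches the 0.844828
shown for every token seen once in spam and never in ham. That is why
the one-off gibberish words carry so little weight compared with the
common ham words.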

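The combined score in that report is tied to the two internal scores by
(S - H + 1) / 2. Below is a sketch of the chi-squared combining as I
understand it. It ignores the numerical edge cases a real
implementation has to handle for very long clue lists, and the function
names are mine.

```python
import math


def chi2q(x2, v):
    """Chance that a chi-squared value with v (even) degrees of freedom exceeds x2."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Return (H, S, score) for clue probabilities strictly between 0 and 1."""
    n = len(probs)
    if n == 0:
        return 0.0, 0.0, 0.5  # no clues at all: unsure
    # Sum logs instead of multiplying, so many small factors do not underflow.
    ln_s = sum(math.log(1.0 - p) for p in probs)
    ln_h = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_s, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_h, 2 * n)
    return h, s, (s - h + 1.0) / 2.0


# Final step only, using the H and S values from the report quoted above.
H, S = 0.999976, 0.0660102
print((S - H + 1.0) / 2.0)  # close to the reported 0.0330173
```

With H that close to 1, the handful of strong spam clues cannot move
the result: the 150 mostly hammy tokens have already decided it.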

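On Tony's point about generating simple tokens from the images rather
than doing OCR, something along these lines is what I would try first.
This is purely a hypothetical sketch: the token names and the size
bucketing are invented here and are not something SpamBayes does. It
uses only the standard library email package.

```python
from email.message import EmailMessage


def image_tokens(msg):
    """Yield crude, hypothetical tokens for each image part of a message."""
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        yield "image:type:" + part.get_content_type()
        payload = part.get_payload(decode=True) or b""
        if payload:
            # Bucket the decoded size by powers of two, so images that
            # differ by a few bytes still share a token.
            yield "image:size:2**%d" % (len(payload).bit_length() - 1)


# Build a small test message with a 1000-byte stand-in for a GIF.
msg = EmailMessage()
msg["Subject"] = "test"
msg.set_content("see attached")
msg.add_attachment(b"\x00" * 1000, maintype="image", subtype="gif",
                   filename="x.gif")

print(list(image_tokens(msg)))
```

Whether tokens like these would separate image spam from ordinary
mail with pictures is something only testing on real corpora can show.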