[Spambayes] spam designed to defeat Bayesian filters

papaDoc papaDoc at videotron.ca
Wed Nov 19 14:27:40 EST 2003


Hi,

This classifies as spam for me:
Spam probability: 77.55% (0.7754698219).

These are the tokens:
Word 	Probability 	Times in ham 	Times in spam
*H* 	0.445814 	- 	-
*S* 	0.996753 	- 	-
nov 	0.065217 	3 	0
vcr 	0.065217 	3 	0
before 	0.083754 	6 	1
skip:1 10 	0.083754 	6 	1
battery 	0.091837 	2 	0
controls 	0.091837 	2 	0
emergency 	0.091837 	2 	0
given 	0.091837 	2 	0
knowing 	0.091837 	2 	0
protection 	0.091837 	2 	0
selection 	0.091837 	2 	0
separately 	0.091837 	2 	0
switches 	0.091837 	2 	0
talking 	0.091837 	2 	0
teaching 	0.091837 	2 	0
telling 	0.091837 	2 	0
window 	0.091837 	2 	0
either 	0.097789 	5 	1
good 	0.117544 	4 	1
control 	0.147499 	3 	1
network 	0.147499 	3 	1
provides 	0.147499 	3 	1
tell 	0.147499 	3 	1
white 	0.147499 	3 	1
based 	0.149229 	5 	2
basic 	0.155172 	1 	0
considered 	0.155172 	1 	0
consists 	0.155172 	1 	0
containing 	0.155172 	1 	0
core 	0.155172 	1 	0
corrected 	0.155172 	1 	0
effects 	0.155172 	1 	0
elements 	0.155172 	1 	0
elevator 	0.155172 	1 	0
empty 	0.155172 	1 	0
enables 	0.155172 	1 	0
encouraged 	0.155172 	1 	0
from: 	0.155172 	1 	0
graph 	0.155172 	1 	0
graphics 	0.155172 	1 	0
ground 	0.155172 	1 	0
keyboard 	0.155172 	1 	0
kind 	0.155172 	1 	0
laboratory 	0.155172 	1 	0
language 	0.155172 	1 	0
necessarily 	0.155172 	1 	0
none 	0.155172 	1 	0
produced 	0.155172 	1 	0
proper 	0.155172 	1 	0
properties 	0.155172 	1 	0
protect 	0.155172 	1 	0
seem 	0.155172 	1 	0
seems 	0.155172 	1 	0
senior 	0.155172 	1 	0
separate 	0.155172 	1 	0
subject: 	0.155172 	1 	0
switching 	0.155172 	1 	0
target 	0.155172 	1 	0
task 	0.155172 	1 	0
telephone 	0.155172 	1 	0
whole 	0.155172 	1 	0
because 	0.184556 	7 	4
got 	0.195811 	5 	3
else 	0.198686 	2 	1
whether 	0.198686 	2 	1
version 	0.219899 	3 	2
going 	0.231109 	4 	3
skip:h 10 	0.231109 	4 	3
content-type:text/plain 	0.2363 	32 	27
balance 	0.844828 	0 	1
ball 	0.844828 	0 	1
bar 	0.844828 	0 	1
basis 	0.844828 	0 	1
become 	0.844828 	0 	1
becomes 	0.844828 	0 	1
consumption 	0.844828 	0 	1
continually 	0.844828 	0 	1
continued 	0.844828 	0 	1
continues 	0.844828 	0 	1
continuing 	0.844828 	0 	1
convention 	0.844828 	0 	1
conventions 	0.844828 	0 	1
convinced 	0.844828 	0 	1
corner 	0.844828 	0 	1
electronic 	0.844828 	0 	1
encourage 	0.844828 	0 	1
ended 	0.844828 	0 	1
government 	0.844828 	0 	1
grant 	0.844828 	0 	1
grants 	0.844828 	0 	1
graphic 	0.844828 	0 	1
greatly 	0.844828 	0 	1
green 	0.844828 	0 	1
gross 	0.844828 	0 	1
groups 	0.844828 	0 	1
growing 	0.844828 	0 	1
lands 	0.844828 	0 	1
message-id:invalid 	0.844828 	0 	1
natural 	0.844828 	0 	1
naturally 	0.844828 	0 	1
necessary 	0.844828 	0 	1
neither 	0.844828 	0 	1
normal 	0.844828 	0 	1
promise 	0.844828 	0 	1
promised 	0.844828 	0 	1
protected 	0.844828 	0 	1
prove 	0.844828 	0 	1
seeing 	0.844828 	0 	1
seek 	0.844828 	0 	1
sees 	0.844828 	0 	1
selected 	0.844828 	0 	1
self 	0.844828 	0 	1
series 	0.844828 	0 	1
skip:( 20 	0.844828 	0 	1
skip:[ 10 	0.844828 	0 	1
skip:b 40 	0.844828 	0 	1
tape 	0.844828 	0 	1
to:none 	0.844828 	0 	1
willing 	0.844828 	0 	1
wins 	0.844828 	0 	1
woman 	0.844828 	0 	1
women 	0.844828 	0 	1
contain 	0.908163 	0 	2
contained 	0.908163 	0 	2
end 	0.908163 	0 	2
grow 	0.908163 	0 	2
known 	0.908163 	0 	2
national 	0.908163 	0 	2
production 	0.908163 	0 	2
project 	0.908163 	0 	2
properly 	0.908163 	0 	2
provide. 	0.908163 	0 	2
provided 	0.908163 	0 	2
publication 	0.908163 	0 	2
sending 	0.908163 	0 	2
serious 	0.908163 	0 	2
women's 	0.908163 	0 	2
wondering 	0.908163 	0 	2
copy 	0.934783 	0 	3
giving 	0.934783 	0 	3
growth 	0.934783 	0 	3
nearly 	0.934783 	0 	3
needs 	0.934783 	0 	3
secure 	0.934783 	0 	3
skip:x 10 	0.934783 	0 	3
taking 	0.958716 	0 	5
wish 	0.958716 	0 	5
without 	0.969799 	0 	7
mail 	0.980349 	0 	11
url:biz 	0.99236 	0 	29
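The Probability column can be reproduced from the ham and spam counts. Below is a minimal sketch, assuming Gary Robinson's smoothed per-token estimate with prior strength s = 0.45 and unknown-word probability x = 0.5. Those two values are assumed here because they reproduce every row above in which one of the counts is zero. The function name and the training-set sizes `nham`/`nspam` are hypothetical inputs, since the post does not report them.

```python
def token_spamprob(ham_count: int, spam_count: int, nham: int, nspam: int,
                   s: float = 0.45, x: float = 0.5) -> float:
    """Smoothed spam probability for one token (Robinson-style estimate).

    nham/nspam are the numbers of trained ham/spam messages; s and x are
    assumed defaults, not read from any real configuration.
    """
    hamratio = ham_count / nham if nham else 0.0
    spamratio = spam_count / nspam if nspam else 0.0
    if hamratio + spamratio == 0.0:
        return x  # token never seen: fall back to the prior
    # Raw estimate, corrected for unequal ham/spam training-set sizes.
    prob = spamratio / (hamratio + spamratio)
    n = ham_count + spam_count
    # Pull rarely seen tokens toward x; with large n the raw estimate wins.
    return (s * x + n * prob) / (s + n)
```

Because the raw ratio is exactly 0 or 1 when one count is zero, those rows do not depend on `nham` and `nspam`: 0 ham / 1 spam gives 0.844828 and 2 ham / 0 spam gives 0.091837, as in the table. Rows with both counts nonzero, such as `before` (6 ham, 1 spam), also depend on the training-set sizes.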

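Several tokens in the list (`skip:1 10`, `skip:h 10`, `skip:b 40`, `skip:x 10`) are not words at all; they appear to stand in for long words. A rough sketch of that mapping follows, assuming that words of 3 to 12 characters are kept, shorter ones are dropped, and longer ones become `skip:` plus the first character and the length rounded down to a multiple of 10. The cutoffs are assumptions, `word_tokens` is a hypothetical name, and a real tokenizer may have further special cases (for example for embedded e-mail addresses) that are not shown.

```python
from typing import List


def word_tokens(word: str, max_word: int = 12) -> List[str]:
    """Hypothetical sketch: map one whitespace-separated word to tokens."""
    n = len(word)
    if n < 3:
        return []  # very short words are ignored
    if n <= max_word:
        return [word]  # ordinary words are kept as-is
    # Long words collapse to: first character + length rounded down to 10s.
    return ["skip:%c %d" % (word[0], n // 10 * 10)]
```

Under these assumptions `skip:b 40` would stand for a 40 to 49 character string beginning with `b`.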


Seth Goodman wrote:

>Attached is an email (along with resulting spam clues) that apparently was
>designed specifically to get past Bayesian filters.  I believe this was
>mentioned before as the "white on white" HTML problem.  The email has a
>large number of legitimate words, probably randomly picked from a
>dictionary, in a section where the font color is almost white on a white
>background.  There is a little snippet of HTML at the end that contains my
>email address.  I don't know what it does but I don't like the looks of it.
>The message appears blank, unless you look very closely and then look at the
>HTML source.  Not only does this message slip through the classifier as ham,
>but training on this message as spam would probably reduce the effectiveness
>of the classifier.
>
>My questions are:
>
>1) What is this thing?  Does it harvest addresses when rendered?
>
>2) Are there any approaches that have been discussed to ignore the "almost
>white" text during parsing?
>
>3) Is anything in the works for this exploit?
>
>--
>Seth Goodman
>
>  Humans:   please remove ".delete" to reply
>
>  Spambots: please disregard the above
>  
>
>
> ------------------------------------------------------------------------
>
> Subject:
> movie's
> From:
> "Dion Tiff" <gwenneth401 at hotmail.com>
> Date:
> Wed, 19 Nov 2003 03:12:07 -0600
> To:
> <sethg at goodmanassociates.com>
>
>
> hriie os hy vsicwj  k tnu mk  k vcr  uuhfw wawp ucu neqge oo cqstcpw
>jrsqldm qvkm ncy fim
>
>  
>
>
> ------------------------------------------------------------------------
>
> Subject:
> Spam Clues: movie's
>
>
>Spam Score: 0% (0)
>
>
>
>word                                spamprob         #ham  #spam
>'*H*'                               1                   -      -
>'*S*'                               0                   -      -
>'provides'                          0.00790861         28      0
>'project'                           0.00819672         27      0
>'basic'                             0.00884086         25      0
>'efforts'                           0.0100223          22      0
>'electronics'                       0.0100223          22      0
>'bar'                               0.0110024          20      0
>'nature'                            0.012894           17      0
>'switch'                            0.0136778          16      0
>'wind'                              0.0136778          16      0
>'neither'                           0.0145631          15      0
>'groups'                            0.0155709          14      0
>'necessarily'                       0.0167286          13      0
>'properly'                          0.0167286          13      0
>'bars'                              0.0180723          12      0
>'enable'                            0.0180723          12      0
>'noise'                             0.0180723          12      0
>'continuing'                        0.0196507          11      0
>'convinced'                         0.0196507          11      0
>'projects'                          0.0196507          11      0
>'technical'                         0.0209429          53      1
>'correct.'                          0.0215311          10      0
>'grateful'                          0.0215311          10      0
>'senior'                            0.0215311          10      0
>'tells'                             0.0215311          10      0
>'window'                            0.0215311          10      0
>'conventional'                      0.0238095           9      0
>'glass'                             0.0238095           9      0
>'go.'                               0.0238095           9      0
>'knows'                             0.0238095           9      0
>'nation'                            0.0238095           9      0
>'series'                            0.0238095           9      0
>'encourage'                         0.0266272           8      0
>'technique'                         0.0266272           8      0
>'wine'                              0.0266272           8      0
>'controlling'                       0.0302013           7      0
>'core'                              0.0302013           7      0
>'emphasis'                          0.0302013           7      0
>'grant'                             0.0302013           7      0
>'granted'                           0.0302013           7      0
>'nervous'                           0.0302013           7      0
>'networks'                          0.0302013           7      0
>'protected'                         0.0302013           7      0
>'selecting'                         0.0302013           7      0
>'talked'                            0.0302013           7      0
>'tape'                              0.0302013           7      0
>'win'                               0.0302013           7      0
>'wishes'                            0.0302013           7      0
>'conversation'                      0.0348837           6      0
>'election'                          0.0348837           6      0
>'encouraging'                       0.0348837           6      0
>'governor'                          0.0348837           6      0
>'grows'                             0.0348837           6      0
>'suspected'                         0.0348837           6      0
>'winter'                            0.0348837           6      0
>'night'                             0.0351794          31      1
>'bear'                              0.0412844           5      0
>'copied'                            0.0412844           5      0
>'corrected'                         0.0412844           5      0
>'effectively'                       0.0412844           5      0
>'encounter'                         0.0412844           5      0
>'encourages'                        0.0412844           5      0
>'ending'                            0.0412844           5      0
>'kill'                              0.0412844           5      0
>'label'                             0.0412844           5      0
>'promising'                         0.0412844           5      0
>'sends'                             0.0412844           5      0
>'talks'                             0.0412844           5      0
>'teach'                             0.0412844           5      0
>'winning'                           0.0412844           5      0
>'network'                           0.0474167          41      2
>'whether'                           0.0474167          41      2
>'beginning'                         0.048731           22      1
>'constraints'                       0.0505618           4      0
>'context'                           0.0505618           4      0
>'continually'                       0.0505618           4      0
>'contract.'                         0.0505618           4      0
>'contribution'                      0.0505618           4      0
>'convince'                          0.0505618           4      0
>'convincing'                        0.0505618           4      0
>'cope'                              0.0505618           4      0
>'correcting'                        0.0505618           4      0
>'elect'                             0.0505618           4      0
>'keeps'                             0.0505618           4      0
>'keys'                              0.0505618           4      0
>'kinds'                             0.0505618           4      0
>'knocked'                           0.0505618           4      0
>'promised'                          0.0505618           4      0
>'selects'                           0.0505618           4      0
>'switching'                         0.0505618           4      0
>'widespread'                        0.0505618           4      0
>'wondered'                          0.0505618           4      0
>'graphics'                          0.0532931          20      1
>'key'                               0.0635763          30      2
>'backs'                             0.0652174           3      0
>'backwards'                         0.0652174           3      0
>'basically'                         0.0652174           3      0
>'consists'                          0.0652174           3      0
>'elements'                          0.0652174           3      0
>'elsewhere'                         0.0652174           3      0
>'encouraged'                        0.0652174           3      0
>"government's"                      0.0652174           3      0
>"governor's"                        0.0652174           3      0
>'grew'                              0.0652174           3      0
>'lacks'                             0.0652174           3      0
>'ladies'                            0.0652174           3      0
>'names.'                            0.0652174           3      0
>'nearby'                            0.0652174           3      0
>'next.'                             0.0652174           3      0
>'noisy'                             0.0652174           3      0
>'programmers'                       0.0652174           3      0
>'proportion'                        0.0652174           3      0
>'protest'                           0.0652174           3      0
>'sells'                             0.0652174           3      0
>'separately'                        0.0652174           3      0
>'serial'                            0.0652174           3      0
>'teeth.'                            0.0652174           3      0
>'within.'                           0.0652174           3      0
>'telling'                           0.0655701          16      1
>'group'                             0.065624           42      3
>'systems'                           0.067776           28      2
>'greatly'                           0.0695772          15      1
>'none'                              0.0695772          15      1
>'telephone'                         0.0695772          15      1
>'sensitive'                         0.0741059          14      1
>'separate'                          0.0741059          14      1
>'glad'                              0.0780931          24      2
>'wide'                              0.0780931          24      2
>'basis'                             0.0792652          13      1
>'ground'                            0.0792652          13      1
>'consider'                          0.0797402          34      3
>'copy'                              0.0840554          52      5
>'seems'                             0.0842721          32      3
>'property'                          0.0845266          22      2
>'base'                              0.0851967          12      1
>'becomes'                           0.0851967          12      1
>'team'                              0.087111           50      5
>'suspended'                         0.0918367           2      0
>"system's"                          0.0918367           2      0
>'table.'                            0.0918367           2      0
>'tank'                              0.0918367           2      0
>'tea'                               0.0918367           2      0
>'teacher'                           0.0918367           2      0
>'teaching'                          0.0918367           2      0
>'whilst'                            0.0918367           2      0
>'whoever'                           0.0918367           2      0
>'wider'                             0.0918367           2      0
>'wild'                              0.0918367           2      0
>"window's"                          0.0918367           2      0
>'x-mailer:qualcomm windows eudora version 5.1' 0.925475   2     30
>'url:biz'                           0.995187            1    277
>  
>
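In both reports the final score agrees with (S - H + 1) / 2 computed from the *H* and *S* rows: (0.996753 - 0.445814 + 1) / 2 ≈ 0.77547 for papaDoc's message (reported as 77.55%), and (0 - 1 + 1) / 2 = 0 for Seth's. The sketch below shows how H and S can be derived from the clue probabilities, assuming a chi-squared (Fisher) combining scheme; the function names are hypothetical and the code is not taken from SpamBayes.

```python
import math
from typing import Sequence, Tuple


def chi2Q(x2: float, v: int) -> float:
    """Upper-tail probability P(chi-squared >= x2) for even degrees of freedom v."""
    assert v % 2 == 0, "only even degrees of freedom are supported"
    m = x2 / 2.0
    total = term = math.exp(-m)
    # Closed-form series for the chi-squared survival function with even v.
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine_hs(h: float, s: float) -> float:
    """Map the ham indicator H and the spam indicator S to one score in [0, 1]."""
    return (s - h + 1.0) / 2.0


def chi2_score(probs: Sequence[float]) -> Tuple[float, float, float]:
    """Return (score, H, S) for clue probabilities strictly between 0 and 1."""
    n = len(probs)
    # Summing logs avoids the underflow of multiplying many small numbers.
    # Spammy clues (p near 1) make sum_log_not_p very negative, raising S.
    sum_log_not_p = sum(math.log(1.0 - p) for p in probs)
    # Hammy clues (p near 0) make sum_log_p very negative, raising H.
    sum_log_p = sum(math.log(p) for p in probs)
    s = 1.0 - chi2Q(-2.0 * sum_log_not_p, 2 * n)
    h = 1.0 - chi2Q(-2.0 * sum_log_p, 2 * n)
    return combine_hs(h, s), h, s
```

With no clues, or with perfectly balanced ones, the score is 0.5. A message dominated by hammy clues drives H toward 1 and the score toward 0, which is consistent with Seth's message scoring 0% despite the strongly spammy `url:biz` clue.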

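Seth's description (dictionary words set in near-white text on a white background) suggests one possible counter-measure: separate near-white text from visible text before tokenizing. The following is a hypothetical sketch using Python's standard `html.parser`; it is not something SpamBayes is claimed to do. It assumes the color is given either as `<font color=...>` or as an inline `style="color: ..."` on `<font>`/`<span>` tags, written as #rrggbb or the name `white`. The 0xE0 threshold is an arbitrary choice, and CSS classes, other color formats, and non-white backgrounds are ignored.

```python
import re
from html.parser import HTMLParser

# Hypothetical cutoff: every RGB channel at or above this counts as near white.
NEAR_WHITE_THRESHOLD = 0xE0
HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})$")
# Matches "color:" but not "background-color:" inside a style attribute.
STYLE_COLOR = re.compile(r"(?<![-\w])color\s*:\s*(#?[0-9a-fA-F]{6}|white)", re.I)


def is_near_white(color: str, threshold: int = NEAR_WHITE_THRESHOLD) -> bool:
    """True for 'white' or a #rrggbb color whose channels are all >= threshold."""
    color = color.strip()
    if color.lower() == "white":
        return True
    m = HEX_COLOR.match(color)
    if not m:
        return False
    value = m.group(1)
    return all(int(value[i:i + 2], 16) >= threshold for i in (0, 2, 4))


class VisibleTextExtractor(HTMLParser):
    """Splits text into visible and near-white parts (font/span tags only)."""

    TRACKED = ("font", "span")

    def __init__(self) -> None:
        super().__init__()
        self._hidden_stack = []  # one flag per currently open tracked tag
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.TRACKED:
            return
        attrs = dict(attrs)
        color = attrs.get("color") or ""
        m = STYLE_COLOR.search(attrs.get("style") or "")
        if m:
            color = m.group(1)
        self._hidden_stack.append(is_near_white(color))

    def handle_endtag(self, tag):
        if tag in self.TRACKED and self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        # Text counts as hidden if any enclosing tracked tag is near white.
        target = self.hidden if any(self._hidden_stack) else self.visible
        target.append(data)
```

Dropping the `hidden` part before tokenizing would keep the random dictionary words out of the ham and spam counts, which addresses the concern that training on such a message could weaken the classifier.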


