<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:o = 

"urn:schemas-microsoft-com:office:office" xmlns:w = 

"urn:schemas-microsoft-com:office:word"><HEAD>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.2800.1400" name=GENERATOR>

<STYLE>@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }

P.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"

}

LI.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"

}

DIV.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"

}

H2 {

        FONT-SIZE: 18pt; MARGIN-LEFT: 0in; COLOR: green; MARGIN-RIGHT: 0in; FONT-FAMILY: "Times New Roman"; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto

}

A:link {

        COLOR: blue; TEXT-DECORATION: underline

}

SPAN.MsoHyperlink {

        COLOR: blue; TEXT-DECORATION: underline

}

A:visited {

        COLOR: purple; TEXT-DECORATION: underline

}

SPAN.MsoHyperlinkFollowed {

        COLOR: purple; TEXT-DECORATION: underline

}

CODE {

        FONT-FAMILY: "Courier New"

}

PRE {

        FONT-SIZE: 10pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Courier New"

}

TT {

        FONT-FAMILY: "Courier New"

}

SPAN.EmailStyle21 {

        mso-style-type: personal-compose

}

DIV.Section1 {

        page: Section1

}

</STYLE>

</HEAD>

<BODY lang=EN-US vLink=purple link=blue>

<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2><SPAN 

class=428053419-02072004>The most likely cause is the imbalance in your 
training data. SpamBayes is most accurate when it is trained on approximately 
the same number of ham messages as spam messages. A ratio of up to 5 to 1 or so 
is probably fine, but yours is currently about 44 to 1 in favor of spam (6168 
spam to 140 ham), which heavily biases all your results towards 
ham.</SPAN></FONT></DIV>

<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2><SPAN 

class=428053419-02072004></SPAN></FONT>&nbsp;</DIV>

<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2><SPAN 

class=428053419-02072004>For example, the token "christianity" appears 10 times 
in ham and 7 times in spam, roughly the same number of times. However, the spam 
probability of that token is only 0.028, because the statistics behind SpamBayes 
start from the percentage of messages of each type that contain the token, not 
from the raw counts. This token appears in 10 out of 140 ham messages, for a ham 
percentage of 7.14%, and in 7 out of 6168 spam messages, for a spam percentage 
of only 0.11%. The ham percentage is almost 63x larger than the spam 
percentage.</SPAN></FONT></DIV>
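<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>&nbsp;</FONT></DIV>
<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>As a rough 
sketch of that arithmetic in Python: the function name below is made up for 
illustration, and I am assuming the default values of the unknown_word_strength 
(0.45) and unknown_word_prob (0.5) options, so treat it as an approximation of 
what the classifier does rather than its actual code. With your training counts 
it gives about 0.0281 for "christianity", which matches the clue report. With 
the same token counts but equal numbers of ham and spam trained, it would give 
about 0.41.</FONT></DIV>
<PRE>
```python
# Sketch of the per-token spam probability described in the message.
# Assumption: 0.45 and 0.5 are the defaults for the unknown_word_strength
# and unknown_word_prob options; the function name is hypothetical.
UNKNOWN_WORD_STRENGTH = 0.45
UNKNOWN_WORD_PROB = 0.5


def token_spamprob(ham_count, spam_count, n_ham, n_spam):
    """Estimate a token's spam probability from message counts.

    ham_count / spam_count: trained messages of each type containing the token.
    n_ham / n_spam: total trained messages of each type.
    Assumes the token was seen at least once and both totals are nonzero.
    """
    # Work with the fraction of each corpus that contains the token,
    # not the raw counts; this is where the training imbalance enters.
    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)

    # Pull the raw estimate towards 0.5 when the token is rarely seen.
    n = ham_count + spam_count
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * raw) / (s + n)


if __name__ == "__main__":
    # Counts for 'christianity' from the clue report: 10 ham, 7 spam.
    print(f"ham percentage:  {10 / 140:.2%}")
    print(f"spam percentage: {7 / 6168:.2%}")
    print(f"ham/spam ratio:  {(10 / 140) / (7 / 6168):.1f}x")
    print(f"spamprob (140 ham, 6168 spam):  {token_spamprob(10, 7, 140, 6168):.7f}")
    print(f"spamprob (equal training size): {token_spamprob(10, 7, 6168, 6168):.2f}")
```
</PRE>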

<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2><SPAN 

class=428053419-02072004></SPAN></FONT>&nbsp;</DIV>

<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2><SPAN 

class=428053419-02072004>With an imbalance this large, your best bet is probably 

to delete your training data and train again from scratch. Try starting out 

without feeding SpamBayes any existing messages for initial training, and then 

train only on mistakes and unsures. If you see several spam messages in your 

unsure folder that look similar, try training on only one of them and deleting 

the rest to avoid training on too many spams.</SPAN></FONT></DIV>
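<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>&nbsp;</FONT></DIV>
<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>The "train 
only on mistakes and unsures" rule can be sketched like this. The function 
names are hypothetical, and I am assuming the default cutoffs of 0.20 for ham 
and 0.90 for spam; this is only an illustration of the decision, not how the 
Outlook plug-in is written.</FONT></DIV>
<PRE>
```python
# Sketch of a "train only on mistakes and unsures" decision.
# Assumptions: cutoffs of 0.20 (ham) and 0.90 (spam); hypothetical names.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score):
    """Map a combined score in [0, 1] to 'ham', 'unsure', or 'spam'."""
    if score >= SPAM_CUTOFF:
        return "spam"
    if HAM_CUTOFF > score:
        return "ham"
    return "unsure"


def should_train(score, actual):
    """Return True when the message deserves training.

    actual is the user's verdict, 'ham' or 'spam'. A message is worth
    training on if it was classified wrongly or landed in unsure;
    correctly classified messages are skipped to keep training small.
    """
    return classify(score) != actual


if __name__ == "__main__":
    # The message from this thread scored 3.16545e-05 but is spam: a mistake.
    print(should_train(3.16545e-05, "spam"))
    # A spam that already scores 0.99 needs no further training.
    print(should_train(0.99, "spam"))
    # Anything in the unsure range is trained, whichever type it is.
    print(should_train(0.5, "ham"))
```
</PRE>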

<DIV><FONT face=Verdana color=#0000ff size=2></FONT>&nbsp;</DIV>

<DIV align=left><FONT face=Verdana size=2>-- </FONT></DIV>

<DIV align=left><FONT face=Verdana size=2>Kenny Pitt</FONT></DIV>

<DIV><FONT face=Verdana color=#0000ff size=2></FONT>&nbsp;</DIV><FONT 

face=Verdana size=2></FONT><BR>

<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

<HR tabIndex=-1>

<FONT face=Tahoma size=2><B>From:</B> spambayes-dev-bounces@python.org 

[mailto:spambayes-dev-bounces@python.org] <B>On Behalf Of </B>G. Waleed 

Kavalec<BR><B>Sent:</B> Friday, July 02, 2004 1:51 PM<BR><B>To:</B> 

spambayes-dev@python.org<BR><B>Subject:</B> [spambayes-dev] Spam Clues: 

&lt;&gt;&lt; STOP! Looking for anti christianchristians<BR></FONT><BR></DIV>

<DIV></DIV>

<DIV class=Section1>

<H2><B><FONT face="Times New Roman" color=blue size=2><SPAN 

style="FONT-SIZE: 10pt; COLOR: blue">This thing won&#8217;t 

die.<o:p></o:p></SPAN></FONT></B></H2>

<H2><B><FONT face="Times New Roman" color=blue size=2><SPAN 

style="FONT-SIZE: 10pt; COLOR: blue">It doesn&#8217;t even go to 

&#8216;maybe&#8217;.<o:p></o:p></SPAN></FONT></B></H2>

<H2><B><FONT face="Times New Roman" color=blue size=2><SPAN 

style="FONT-SIZE: 10pt; COLOR: blue">&#8220;What&#8217;s up with 

that?&#8221;<o:p></o:p></SPAN></FONT></B></H2>


<H2><B><FONT face="Times New Roman" color=green size=5><SPAN 

style="FONT-SIZE: 18pt">Combined Score: 0% 

(3.16545e-005)<o:p></o:p></SPAN></FONT></B></H2>

<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN 

style="FONT-SIZE: 12pt">Internal ham score (</SPAN></FONT><TT><FONT 

face="Courier New" size=2><SPAN style="FONT-SIZE: 10pt">*H*</SPAN></FONT></TT>): 

1<BR>Internal spam score (<TT><FONT face="Courier New" size=2><SPAN 

style="FONT-SIZE: 10pt">*S*</SPAN></FONT></TT>): 6.3309e-005<BR><BR># ham 

trained on: 140<BR># spam trained on: 6168<o:p></o:p></P>

<H2><B><FONT face="Times New Roman" color=green size=5><SPAN 
style="FONT-SIZE: 18pt">150 Significant Tokens</SPAN></FONT></B></H2>
<PRE><B>token                          spamprob    #ham  #spam</B>
'religions'                    0.027636       9      5
'christianity'                 0.0281306     10      7
'jesus,'                       0.0281306     10      7
'religion,'                    0.0282139     12     10
</PRE></DIV></BODY></HTML>