<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v = 

"urn:schemas-microsoft-com:vml" xmlns:o = 

"urn:schemas-microsoft-com:office:office" xmlns:w = 

"urn:schemas-microsoft-com:office:word"><HEAD>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.5335.5" name=GENERATOR>

<STYLE>@font-face {

        font-family: Arial Unicode MS;

}

@font-face {

        font-family: @Arial Unicode MS;

}

@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }

P.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"

}

LI.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"

}

DIV.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"

}

H2 {

        FONT-SIZE: 18pt; MARGIN-LEFT: 0in; MARGIN-RIGHT: 0in; FONT-FAMILY: "Arial Unicode MS"; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto

}

A:link {

        COLOR: blue; TEXT-DECORATION: underline

}

SPAN.MsoHyperlink {

        COLOR: blue; TEXT-DECORATION: underline

}

A:visited {

        COLOR: purple; TEXT-DECORATION: underline

}

SPAN.MsoHyperlinkFollowed {

        COLOR: purple; TEXT-DECORATION: underline

}

SPAN.EmailStyle17 {

        COLOR: windowtext; FONT-FAMILY: Arial; mso-style-type: personal

}

SPAN.EmailStyle18 {

        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal

}

SPAN.EmailStyle19 {

        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal-reply

}

DIV.Section1 {

        page: Section1

}

</STYLE>

</HEAD>

<BODY lang=EN-US vLink=purple link=blue>

<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial 

color=#0000ff size=2>This is one for the training gurus. You can find a 

discussion of various training approaches on the SpamBayes wiki (<A 

href="http://www.entrian.com/sbwiki/TrainingIdeas">http://www.entrian.com/sbwiki/TrainingIdeas</A>).</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial 

color=#0000ff size=2></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial 

color=#0000ff size=2>That said, I'll put my oar in. In general, the 

recommendation of the gurus is along the lines of "don't worry, be happy": as 

long as you're getting satisfactory results, just use the training buttons to 

correct classification errors. The bottom line is the quality of the results 

you're getting; the suggestion to keep the ham:spam ratio close to 1 is a 

guideline that seems to help achieve that result. I follow that approach, and 

when I notice that I'm getting unsatisfactory results over a period of time, I 

just discard my training database and start over. SpamBayes learns very quickly, 

so I don't find it worthwhile to try to tune the database over 

time.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial 

color=#0000ff size=2></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial 

color=#0000ff size=2>Another thing to look at is the threshold scores for 

possible and certain spam. I've dropped my certain spam threshold somewhat as 

I've become more confident in my training data (it's now 0.70). This means fewer 

messages land in possible spam for me to train as spam, which reduces the ham:spam 

imbalance. I'm currently getting good results (&gt;95% correctly classified) 

with 53 ham and 171 spam trained on.</FONT></SPAN></DIV><BR>

<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

<HR tabIndex=-1>

<FONT face=Tahoma size=2><B>From:</B> spambayes-bounces@python.org 

[mailto:spambayes-bounces@python.org] <B>On Behalf Of </B>Gil 

Hurlbut<BR><B>Sent:</B> Monday, April 24, 2006 4:35 PM<BR><B>To:</B> 

spambayes@python.org<BR><B>Subject:</B> Re: [Spambayes] Incremental Training for 

ham in Outlook Plugin?<BR></FONT><BR></DIV>

<DIV class=Section1>

<P class=MsoNormal><FONT face=Arial size=2><SPAN 

style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">The question addresses the fact that 

SpamBayes is far better at classifying ham once it is trained than it is at 

keeping up with new spam. I find it necessary to remove many spam 

messages until I get to the point where the Manager has far more spam than ham. 

Until I hear a different recommendation, I&#8217;m going to get back to a balance by 

moving known ham to my Unsure folder and clicking &#8220;Recover from 

Spam&#8221; to do the incremental training.</SPAN></FONT></P>

</DIV></BODY></HTML>