<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.5335.5" name=GENERATOR>
<STYLE>@font-face {
        font-family: Arial Unicode MS;
}
@font-face {
        font-family: @Arial Unicode MS;
}
@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }
P.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
LI.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
DIV.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
H2 {
        FONT-SIZE: 18pt; MARGIN-LEFT: 0in; MARGIN-RIGHT: 0in; FONT-FAMILY: "Arial Unicode MS"; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto
}
A:link {
        COLOR: blue; TEXT-DECORATION: underline
}
SPAN.MsoHyperlink {
        COLOR: blue; TEXT-DECORATION: underline
}
A:visited {
        COLOR: purple; TEXT-DECORATION: underline
}
SPAN.MsoHyperlinkFollowed {
        COLOR: purple; TEXT-DECORATION: underline
}
SPAN.EmailStyle17 {
        COLOR: windowtext; FONT-FAMILY: Arial; mso-style-type: personal
}
SPAN.EmailStyle18 {
        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal
}
SPAN.EmailStyle19 {
        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal-reply
}
DIV.Section1 {
        page: Section1
}
</STYLE>
</HEAD>
<BODY lang=EN-US vLink=purple link=blue>
<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2>This is one for the training gurus. You can find a
discussion of various training approaches on the SpamBayes wiki (<A
href="http://www.entrian.com/sbwiki/TrainingIdeas">http://www.entrian.com/sbwiki/TrainingIdeas</A>).</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2>That said, I'll put my oar in. In general, the gurus'
recommendation is along the lines of "don't worry, be happy": as long as you're
getting satisfactory results, just use the training buttons to correct
classification errors. What matters is the quality of the results you're
getting; keeping the ham:spam ratio close to 1 is a guideline that seems to
help achieve that. I follow that approach, and when I notice unsatisfactory
results over a period of time, I just discard my training database and start
over. SpamBayes learns very quickly, so I don't find it worthwhile to try to
tune the database over time.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2>Another thing to look at is the threshold scores for
possible and certain spam. I've lowered my certain spam threshold somewhat as
I've become more confident in my training data (it's now 0.70). This means fewer
messages land in possible spam, so I train fewer of them as spam, which reduces
the ham:spam imbalance. I'm currently getting good results (&gt;95% correctly
classified) with 53 ham and 171 spam messages trained.</FONT></SPAN></DIV><BR>
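<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2>To make the threshold point concrete, here is a minimal
Python sketch of how two cutoffs split scores into ham, possible spam (unsure)
and certain spam, plus the ham:spam ratio for my current counts. The function
names and the 0.20 lower cutoff are my own assumptions for illustration; this is
not the SpamBayes API, and only the 0.70 figure and the 53/171 counts come from
my setup.</FONT></SPAN></DIV>
<PRE>
```python
# Illustrative sketch only: these helpers are hypothetical, not SpamBayes code.

def classify(score: float, unsure_cutoff: float = 0.20,
             spam_cutoff: float = 0.70) -> str:
    """Map a spam score between 0 and 1 to one of three buckets."""
    if score >= spam_cutoff:
        return "spam"      # certain spam
    if score >= unsure_cutoff:
        return "unsure"    # possible spam, needs a manual decision
    return "ham"


def ham_spam_ratio(n_ham: int, n_spam: int) -> float:
    """Ratio of trained ham to trained spam; 1.0 means balanced."""
    if n_spam == 0:
        raise ValueError("no spam trained yet")
    return n_ham / n_spam


if __name__ == "__main__":
    # The same 0.80 message is only "unsure" with a 0.90 cutoff,
    # but counts as certain spam once the cutoff is lowered to 0.70.
    print(classify(0.80, spam_cutoff=0.90))
    print(classify(0.80, spam_cutoff=0.70))
    # Ratio for 53 ham and 171 spam trained.
    print(round(ham_spam_ratio(53, 171), 2))
```
</PRE>
<DIV dir=ltr align=left><SPAN class=421270813-25042006><FONT face=Arial
color=#0000ff size=2>Lowering the upper cutoff moves borderline messages out of
the unsure bucket, so fewer of them end up being trained by hand as
spam.</FONT></SPAN></DIV>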
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> spambayes-bounces@python.org
[mailto:spambayes-bounces@python.org] <B>On Behalf Of </B>Gil
Hurlbut<BR><B>Sent:</B> Monday, April 24, 2006 4:35 PM<BR><B>To:</B>
spambayes@python.org<BR><B>Subject:</B> Re: [Spambayes] Incremental Training for
ham in Outlook Plugin?<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">The question addresses the fact that
SpamBayes is far better at classifying ham once it is trained than it is at
keeping up with classifying new spam. I find it necessary to remove many spam
messages until I get to the point where the Manager has far more spam than ham.
Until I hear a different recommendation, I&rsquo;m going to get back to a
balance by moving known ham to my Unsure folder and clicking on &ldquo;Recover
from Spam&rdquo; to do the incremental
training.<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial size=2><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P></DIV></BODY></HTML>