<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2523" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2>Your problem almost certainly lies
here:</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2><FONT face="Times New Roman" color=#000000 size=3># ham
trained on: 23319<BR># spam trained on: 370</FONT><BR></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2>Based on the imbalance in the number of messages that you
have trained, a single spam token will have approximately 63 times as much
influence on the overall score as a single ham token.</FONT></SPAN></DIV>
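<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2>As a rough sketch of where the 63x figure comes from: each
token count is divided by the number of messages trained in its class, so one
spam occurrence counts for 1/370 while one ham occurrence counts for 1/23319.
The Python below assumes the usual SpamBayes defaults of 0.45 for the
unknown-word strength and 0.5 for the unknown-word probability (neither is
stated in this thread). The function and constant names are illustrative only,
not the classifier's actual API. Under those assumptions it gives the spamprob
values listed in your clues for 'url:en' and 'subject:odd'.</FONT></SPAN></DIV>
<PRE>
```python
# Message counts quoted in the reply above.
N_HAM = 23319
N_SPAM = 370

# Assumed SpamBayes defaults; they are not stated anywhere in this thread.
UNKNOWN_WORD_STRENGTH = 0.45
UNKNOWN_WORD_PROB = 0.5


def spamprob(ham_count, spam_count, nham=N_HAM, nspam=N_SPAM):
    """Estimate a token's spam probability from its training counts.

    Hypothetical helper: expects at least one of the two counts to be nonzero.
    """
    # Normalise each count by how many messages of that class were trained.
    ham_ratio = ham_count / nham
    spam_ratio = spam_count / nspam
    raw = spam_ratio / (ham_ratio + spam_ratio)

    # Pull the raw estimate toward the unknown-word prior; the fewer times
    # the token has been seen, the stronger the pull.
    n = ham_count + spam_count
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * raw) / (s + n)


imbalance = N_HAM / N_SPAM
print(f"one spam occurrence weighs about {imbalance:.0f}x one ham occurrence")
print(f"'url:en' (2 ham, 5 spam): {spamprob(2, 5):.6f}")
print(f"'subject:odd' (1 ham, 0 spam): {spamprob(1, 0):.6f}")
print(f"'url:en' with 370 ham and 370 spam trained: {spamprob(2, 5, 370, 370):.6f}")
```
</PRE>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2>The last line reruns the 'url:en' counts as if ham and
spam had been trained in equal numbers, to show how much of that token's score
comes from the imbalance alone.</FONT></SPAN></DIV>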
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana
color=#0000ff size=2>For best results, you should train on roughly equal numbers
of spam and ham messages. A 5x to 10x imbalance is probably OK for most people,
but 63x is definitely pushing the limits. Your best bet is probably to delete
your training database and start over from scratch. If you train only by
using the toolbar buttons when messages are misclassified, instead of training
on a bunch of existing messages up front, you'll probably get better
results.</FONT></SPAN></DIV>
<DIV><FONT face=Verdana color=#0000ff size=2></FONT> </DIV>
<DIV align=left><FONT face=Verdana size=2>-- </FONT></DIV>
<DIV align=left><FONT face=Verdana size=2>Kenny Pitt</FONT></DIV>
<DIV><FONT face=Verdana color=#0000ff size=2></FONT>&nbsp;</DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> spambayes-bounces@python.org
[mailto:spambayes-bounces@python.org] <B>On Behalf Of </B>Mark
Vovchuk<BR><B>Sent:</B> Wednesday, October 13, 2004 3:18 PM<BR><B>To:</B>
spambayes@python.org<BR><B>Subject:</B> [Spambayes] Many users on domain coming
up as "possibly spam"<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><SPAN class=884531519-13102004><FONT face=Arial size=2>Including
myself. Many people in my organization are coming up as either spam or
possibly spam. I have been trying out SpamBayes as a way to get off of
another product, and this is the last hurdle that I cannot overcome. I have
had them keep moving each other's messages, and mine, out using the "recover"
button, but to no avail. This is the clues report that someone got for an
email I sent:</FONT></SPAN></DIV>
<DIV><SPAN class=884531519-13102004><FONT face=Arial
size=2></FONT></SPAN> </DIV><SPAN class=884531519-13102004>
<H2>Combined Score: 69% (0.686078)</H2>
<DIV>Internal ham score (<TT>*H*</TT>): 0.229281<BR>Internal spam score
(<TT>*S*</TT>): 0.601437<BR><BR># ham trained on: 23319<BR># spam trained on:
370<BR></DIV>
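<DIV>For reference, the combined score above is consistent with taking
(S - H + 1) / 2 from the two internal scores. That formula is an assumption
about the default score combining, checked here only against the numbers in
this report, and the function name is illustrative.</DIV>
<PRE>
```python
def combined_score(ham_score, spam_score):
    """Map the internal *H* and *S* scores onto a single 0..1 scale.

    Assumed formula, not taken from this thread: (S - H + 1) / 2.
    """
    return (spam_score - ham_score + 1.0) / 2.0


# Internal scores copied from the report above.
print(f"{combined_score(0.229281, 0.601437):.6f}")
```
</PRE>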
<H2>17 Significant Tokens</H2><PRE><STRONG>token                   spamprob  #ham   #spam
</STRONG>'subject:odd'           0.155172  1      0
'url:105957'            0.155172  1      0
'url:indymedia'         0.155172  1      0
'url:sandiego'          0.155172  1      0
'from:none'             0.3267    1559   12
'to:addr:rob'           0.334402  753    6
'message-id:invalid'    0.37662   1565   15
'reply-to:none'         0.397052  22874  239
'header:To:1'           0.608344  14607  360
'url:shtml'             0.694677  55     2
'url:org'               0.709459  619    24
'to:2**0'               0.744606  7133   330
'to:no real name:2**0'  0.804451  3722   243
'proto:http'            0.825724  3963   298
'url:10'                0.850336  21     2
'url:2004'              0.858892  9      1
'url:en'                0.963873  2      5
</PRE>
</SPAN></BODY></HTML>