<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.2900.2523" name=GENERATOR></HEAD>

<BODY>

<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 

color=#0000ff size=2>Your problem almost certainly lies 

here:</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 

color=#0000ff size=2></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 

color=#0000ff size=2><FONT face="Times New Roman" color=#000000 size=3># ham 

trained on: 23319<BR># spam trained on: 370</FONT><BR></FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 

color=#0000ff size=2>With 23319 ham messages and only 370 spam messages trained, 
each occurrence of a token in spam carries approximately 63 times as much weight 
(23319 / 370) in the overall score as an occurrence in ham.</FONT></SPAN></DIV>
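<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 
color=#0000ff size=2>To make the imbalance concrete, here is a minimal Python 
sketch of a Robinson-style per-token spam probability. The function name and the 
two constants (0.5 and 0.45) are assumptions, not taken from the SpamBayes 
source; they are used only because they reproduce the spamprob values in the 
clue report quoted below.</FONT></SPAN></DIV>
<PRE>
```python
# Assumed constants: prior probability for an unseen token and the weight
# given to that prior. They are illustrative, chosen to match the report.
UNKNOWN_WORD_PROB = 0.5
UNKNOWN_WORD_STRENGTH = 0.45


def token_spamprob(ham_count, spam_count, n_ham, n_spam):
    """Estimate how spammy a token is from its per-class training counts."""
    n = ham_count + spam_count
    if n == 0:
        # Never-seen token: fall back to the prior.
        return UNKNOWN_WORD_PROB
    # Normalise by class size, so one spam hit out of 370 messages outweighs
    # one ham hit out of 23319 messages by a factor of about 63.
    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)
    # Pull the raw estimate toward the prior when evidence is thin.
    strength = UNKNOWN_WORD_STRENGTH
    return (strength * UNKNOWN_WORD_PROB + n * raw) / (strength + n)


N_HAM, N_SPAM = 23319, 370
print(round(token_spamprob(1, 0, N_HAM, N_SPAM), 6))  # 'subject:odd' row
print(round(token_spamprob(2, 5, N_HAM, N_SPAM), 6))  # 'url:en' row
print(round(token_spamprob(1, 1, N_HAM, N_SPAM), 2))  # hypothetical token
```
</PRE>
<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 
color=#0000ff size=2>Under these assumptions, a hypothetical token seen once in 
ham and once in spam scores about 0.90 with this database, where a balanced 
database would give it exactly 0.5.</FONT></SPAN></DIV>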

<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 

color=#0000ff size=2></FONT></SPAN>&nbsp;</DIV>

<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 

color=#0000ff size=2>For best results, you should train on roughly equal numbers 

of spam and ham messages.&nbsp; 5x to 10x is probably OK for most people, but 

63x is definitely pushing the limits.&nbsp; Your best bet is probably to delete 

your training database and start over from scratch.&nbsp; If you train only by 

using the toolbar buttons when messages are misclassified instead of by training 

a bunch of existing messages up front then you'll probably get better 

results.</FONT></SPAN></DIV>
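<DIV dir=ltr align=left><SPAN class=078245619-13102004><FONT face=Verdana 
color=#0000ff size=2>As a rough way to check your own numbers against that rule 
of thumb, something like the following Python sketch would do. The function 
names and the default limit of 10 are hypothetical; the limit simply encodes the 
"5x to 10x is probably OK" guideline above and is not a SpamBayes 
setting.</FONT></SPAN></DIV>
<PRE>
```python
def training_imbalance(n_ham, n_spam):
    """Return how many times larger the bigger training class is."""
    if n_ham == 0 or n_spam == 0:
        raise ValueError("need at least one trained message of each kind")
    return max(n_ham, n_spam) / min(n_ham, n_spam)


def imbalance_advice(n_ham, n_spam, ok_limit=10.0):
    """Apply the rule of thumb; ok_limit is an assumed cutoff."""
    ratio = training_imbalance(n_ham, n_spam)
    if ratio > ok_limit:
        return "imbalanced: consider retraining"
    return "probably OK"


# Counts taken from the clue report in this thread.
print(round(training_imbalance(23319, 370)))
print(imbalance_advice(23319, 370))
```
</PRE>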

<DIV><FONT face=Verdana color=#0000ff size=2></FONT>&nbsp;</DIV>

<DIV align=left><FONT face=Verdana size=2>-- </FONT></DIV>

<DIV align=left><FONT face=Verdana size=2>Kenny Pitt</FONT></DIV>

<DIV><FONT face=Verdana color=#0000ff size=2></FONT>&nbsp;</DIV><BR>

<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

<HR tabIndex=-1>

<FONT face=Tahoma size=2><B>From:</B> spambayes-bounces@python.org 

[mailto:spambayes-bounces@python.org] <B>On Behalf Of </B>Mark 

Vovchuk<BR><B>Sent:</B> Wednesday, October 13, 2004 3:18 PM<BR><B>To:</B> 

spambayes@python.org<BR><B>Subject:</B> [Spambayes] Many users on domain coming 

up as "possibly spam"<BR></FONT><BR></DIV>

<DIV></DIV>

<DIV><SPAN class=884531519-13102004><FONT face=Arial size=2>Including 

myself.&nbsp; Many people in my organization are coming up as either spam or 

maybe spam.&nbsp; I have been trying out spambayes as a way to get off of 

another product, and this is the last hurdle that I cannot overcome.&nbsp; I have 
had them keep recovering each other's messages, and mine, with the "recover" 
button, but to no avail.&nbsp; This is the clues report that someone got for an 
email I sent:</FONT></SPAN></DIV>

<DIV><SPAN class=884531519-13102004><FONT face=Arial 

size=2></FONT></SPAN>&nbsp;</DIV><SPAN class=884531519-13102004>

<H2>Combined Score: 69% (0.686078)</H2>

<DIV>Internal ham score (<TT>*H*</TT>): 0.229281<BR>Internal spam score 

(<TT>*S*</TT>): 0.601437<BR><BR># ham trained on: 23319<BR># spam trained on: 

370<BR></DIV>
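<DIV>The three scores above are consistent with a simple relationship: the 
combined score equals (S - H + 1) / 2. The Python sketch below checks that 
arithmetic against the reported values. The function name is hypothetical, and 
the formula is inferred from these numbers rather than quoted from the SpamBayes 
source.</DIV>
<PRE>
```python
def combined_score(ham_score, spam_score):
    """Map the internal *H* and *S* scores onto a single 0..1 scale."""
    return (spam_score - ham_score + 1.0) / 2.0


# Values copied from the report above.
score = combined_score(ham_score=0.229281, spam_score=0.601437)
print(round(score, 6))
```
</PRE>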

<H2>17 Significant Tokens</H2><PRE><STRONG>token                               spamprob         #ham  #spam
</STRONG>'subject:odd'                       0.155172            1      0
'url:105957'                        0.155172            1      0
'url:indymedia'                     0.155172            1      0
'url:sandiego'                      0.155172            1      0
'from:none'                         0.3267           1559     12
'to:addr:rob'                       0.334402          753      6
'message-id:invalid'                0.37662          1565     15
'reply-to:none'                     0.397052        22874    239
'header:To:1'                       0.608344        14607    360
'url:shtml'                         0.694677           55      2
'url:org'                           0.709459          619     24
'to:2**0'                           0.744606         7133    330
'to:no real name:2**0'              0.804451         3722    243
'proto:http'                        0.825724         3963    298
'url:10'                            0.850336           21      2
'url:2004'                          0.858892            9      1
'url:en'                            0.963873            2      5
</PRE>

</SPAN></BODY></HTML>