[spambayes-dev] pgsql

Mike Miskulin mmm at projectumbra.org
Fri Mar 31 01:59:36 CEST 2006


About a month ago I posted on the regular list about trying to
get spambayes to work with mysql and postgresql.  I can report
that things are mostly, but not entirely, well.  I have been working
on a Win XP Home system.  First, I'd like to suggest that the
following, or something similar, be added to classifier.py at around
line 320:

        if (hamratio + spamratio) > 0:
            prob = spamratio / (hamratio + spamratio)
        else:
            prob = 0.5

After resetting the database I hit divide-by-zero errors at this
point a number of times.
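
To make the suggestion concrete, here is a minimal standalone sketch of
the guarded calculation.  It is not the actual classifier.py code: the
function name and its arguments are made up for illustration, and it
assumes that an empty training set is treated as size 1 so that the two
ratios are always defined.

```python
def token_spam_probability(spamcount, hamcount, nspam, nham):
    """Raw spam probability for one token (hypothetical helper).

    spamcount/hamcount: messages of each kind containing the token.
    nspam/nham: total messages of each kind trained so far.
    """
    # Assumption: treat an empty training set as size 1 so the
    # ratios below never divide by zero themselves.
    nspam = float(nspam or 1)
    nham = float(nham or 1)

    spamratio = spamcount / nspam
    hamratio = hamcount / nham

    total = hamratio + spamratio
    if total > 0:
        return spamratio / total
    # Both ratios are zero (e.g. right after a database reset):
    # there is no evidence either way, so return the neutral value.
    return 0.5
```

With all counts at zero the function returns 0.5 instead of raising
ZeroDivisionError, which is the case I kept running into.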

I have attached a few PDFs capturing errors that seem to be
typical while using postgresql.  (I only used mysql long enough
to determine that it functioned, so I can't say for sure whether
this would occur there too.)  I've not been able to track this
error down; it's a bit beyond me.  As you can see, it has happened
immediately after training a group of messages as well as in the
course of normal operation.  This error, however, does not appear
to be 'fatal'.  A simple stop/restart of the spambayes service
and things run again.

A more serious error, which is temporarily fatal, occurs when a
token ends up counted in more spam (or ham) messages than the
number of spam (or ham) messages trained.  Again, I've not tracked
down why it happens.  But I have noticed a few things:

1- the keyword that it most often happens to is 'Subject'
2- if it is multiple keywords, then it has happened after training
    on messages.
3- bouncing a message to the service (i.e. spambayes_ham at localhost)
    that has not been previously seen (i.e. to train quicker instead of
    waiting for new messages to arrive) seems to screw things up.
    Maybe not after the first message, but not long after.  Note I have
    also seen situations where the service stops accepting messages.

Now while this is annoying to have happen, unlike with the 'normal'
way of using spambayes, one can easily go into the database and
'fix' the problem - at least to the extent that you no longer have an
nspam > spam trained situation.  While adjustments to the database may
be arbitrary if you are off by more than 1, if you have a large number
of trained messages it shouldn't have a material effect.  And at least
one need not start completely over.
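
For what it's worth, the kind of manual 'fix' described above can be
sketched as follows.  This is only an illustration: it runs against an
in-memory SQLite database, and the table name (bayes), its columns
(word, nspam, nham) and the function name are assumptions made for the
example, so check them against your own database before trying
anything similar.  On postgresql the two-argument MIN() used here
would need to be LEAST().

```python
import sqlite3


def clamp_token_counts(conn, nspam_trained, nham_trained):
    """Cap per-token counts at the number of trained messages.

    Hypothetical helper; returns the number of rows that were changed.
    """
    cur = conn.cursor()
    # SQLite's two-argument MIN() is a scalar function; the WHERE
    # clause limits the update to rows that are actually out of range.
    cur.execute(
        "UPDATE bayes SET nspam = MIN(nspam, ?), nham = MIN(nham, ?) "
        "WHERE nspam > ? OR nham > ?",
        (nspam_trained, nham_trained, nspam_trained, nham_trained),
    )
    conn.commit()
    return cur.rowcount


# Demo with made-up data: 'subject' claims 12 spam, but only 10 trained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes (word TEXT PRIMARY KEY, nspam INTEGER, nham INTEGER)"
)
conn.executemany(
    "INSERT INTO bayes VALUES (?, ?, ?)",
    [("subject", 12, 5), ("hello", 3, 4)],
)
changed = clamp_token_counts(conn, nspam_trained=10, nham_trained=8)
```

Clamping to the trained totals is arbitrary in the same way as any
other hand edit, but it only touches the rows that are out of range.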

Well I hope this all is of some use to you guys.  If you have any
questions please email me directly as I did not subscribe to the
devel list.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: sb_error.pdf
Type: application/pdf
Size: 19892 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20060330/958b207c/attachment-0003.pdf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sb_error2.pdf
Type: application/pdf
Size: 17042 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20060330/958b207c/attachment-0004.pdf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sb_error3.pdf
Type: application/pdf
Size: 20272 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20060330/958b207c/attachment-0005.pdf 
