[Spambayes] Non filtering proxy

Tony Meyer tameyer at ihug.co.nz
Tue Oct 12 03:30:22 CEST 2004


[Lee]
>> Right, i've checked the headers and i'm getting an exception : [...]
>> File "spambayes\classifier.pyc", line 308, in probability 
>> .AssertionError

[Richie]
> Ah.  The good news is that that tells us what's wrong.  The 
> bad news is that your training database has become corrupted. 
> You need to stop SpamBayes, delete the file, which is called 
> something like "C:\Documents and Settings\Lee\Application 
> Data\SpamBayes\Proxy\statistics_database.db", restart SpamBayes
> and retrain it from scratch.
>
> We don't know what causes such corruption I'm afraid.  Before 
> you delete your database, someone else might come along with 
> more of an idea of how you might be able to recover it, but I 
> don't know of a way.

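It appears that this is the hamcount > total_ham error (a token's ham count
is larger than the total number of ham messages trained), not the
DB_RUNRECOVERY one.  It is only the latter from which there is no recovery.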

You can, if you really want to, fix the database so that it is usable again.
You'll need to:

 1. Install Python: http://www.python.org/download
 2. Get the SpamBayes 1.0 source.
 3. Use the sb_dbexpimp.py script to convert the database to CSV.
 4. Open that in (e.g.) Excel, and correct the entries at the top, which
    record the total number of ham and spam seen, so that each is at least
    as large as any count in the corresponding ham/spam column for each
    token.
 5. Convert the fixed db back to bsddb format with the sb_dbexpimp.py
    script.
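
If you would rather not edit the CSV by hand, step 4 can be sketched in a few
lines of Python.  This is only a rough illustration, not part of SpamBayes.
It assumes the exported file has the two totals (ham, spam) on its first row
and one "token, ham count, spam count" row per token after that, so check
your export before relying on it.  The names repair_totals and repair_csv are
made up for this example, and it is written for a current Python 3 rather
than the Python of the SpamBayes 1.0 era.

```python
import csv


def repair_totals(rows):
    """Return a copy of rows with the totals row raised where needed.

    Assumed layout (check against your own export):
      rows[0]  -> [total_ham, total_spam]
      rows[1:] -> [token, ham_count, spam_count]
    """
    header, tokens = rows[0], rows[1:]
    total_ham, total_spam = int(header[0]), int(header[1])

    # Largest per-token counts seen in each column (0 if there are no tokens).
    max_ham = max((int(row[1]) for row in tokens), default=0)
    max_spam = max((int(row[2]) for row in tokens), default=0)

    # Only ever raise a total; a total that is already large enough is kept.
    fixed = [str(max(total_ham, max_ham)), str(max(total_spam, max_spam))]
    return [fixed] + [list(row) for row in tokens]


def repair_csv(src_path, dst_path):
    """Read the exported CSV, fix the totals row, and write a new file."""
    with open(src_path, newline="") as src:
        rows = [row for row in csv.reader(src) if row]  # skip blank lines
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(repair_totals(rows))
```

Writing to a separate output file (rather than overwriting the export) keeps
the original around in case the assumed layout turns out to be wrong.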

However, if you have this problem, then the database may have other
problems, and fiddling about with the numbers manually like this isn't
recommended.  Given that it takes hardly any messages to get good results
(in general), retraining (as Richie suggested) is probably the best idea.

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.


