I seem to have stumbled upon a (persistent) spamcount > nspam bug
Hi all,

after getting these messages:

-------------------------------8<---------------------
fetchmail: reading message <mailserver>:19 of 19 (4326 octets) ....
Traceback (most recent call last):
  File "/usr/bin/sb_filter.py", line 257, in ?
    main()
  File "/usr/bin/sb_filter.py", line 248, in main
    action(msg)
  File "/usr/bin/sb_filter.py", line 180, in filter
    return self.h.filter(msg)
  File "/usr/lib/.../spambayes/hammie.py", line 109, in filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/lib/.../spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/lib/.../spambayes/classifier.py", line 190, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/lib/.../spambayes/classifier.py", line 493, in _getclues
    tup = self._worddistanceget(word)
  File "/usr/lib/.../spambayes/classifier.py", line 508, in _worddistanceget
    prob = self.probability(record)
  File "/usr/lib/.../spambayes/classifier.py", line 311, in probability
    assert spamcount <= nspam
AssertionError
-------------------------------8<---------------------
NB ... = python2.4/site-packages

After browsing (and googling) the Internet, I found that this error seems to point to a corrupt DB file. But even after recreating the DB file from scratch, *and* after updating to spambayes-1.1a1 and recreating the DB file again, the problem persists.

SpamBayes is running on Fedora Core 5 (Intel Pentium). I used the following commands to recreate the DB file:

-------------------------------8<---------------------
$ # Empty hammie DB
$ echo -n > /usr/local/share/spambayes/hammie.db
$ # Retrain
$ sb_mboxtrain.py -f -d /usr/local/share/spambayes/hammie.db \
    -g $HOME/EMail/Inbox/inbox \
    -g $HOME/EMail/Inbox/NeoNixie \
    -g $HOME/EMail/Inbox/SpareTimeGizmo \
    -g $HOME/EMail/Inbox/Evolution \
    -s $HOME/EMail/spam/spam
-------------------------------8<---------------------

NB: I specified the -f option to force retraining of already-trained messages.
Can any of you shed some light on this issue?

MTIA, cu l8r,
Edgar.
-- 
       \|||/
       (o o)      Just curious...
----ooO-(_)-Ooo-------------------------------------------------------
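The assertion in the traceback guards a simple invariant. SpamBayes-style training counts each distinct token at most once per message, so a healthy database can never record a token in more spam messages than the total number of spam messages trained (nspam). The sketch below is a simplified, hypothetical per-token probability in the style of Robinson's adjustment; the function name and the defaults strength=0.45 and unknown_prob=0.5 are assumptions for illustration, not the actual code of SpamBayes's `probability()`. It shows why a record with spamcount > nspam is nonsense: the spam ratio would exceed 1.

```python
# Simplified, hypothetical per-token spam probability (Robinson-style).
# NOT SpamBayes's classifier code; names and defaults are assumptions.

def token_spamprob(spamcount, hamcount, nspam, nham,
                   strength=0.45, unknown_prob=0.5):
    """Return an adjusted spam probability for a single token.

    Training counts each distinct token once per message, so a sane
    database satisfies spamcount <= nspam and hamcount <= nham.
    """
    # The same kind of sanity check that raised AssertionError above.
    assert spamcount <= nspam, "token seen in more spam than was trained"
    assert hamcount <= nham, "token seen in more ham than was trained"

    # Fraction of trained spam/ham messages that contained the token;
    # given the assertions, both ratios lie in [0, 1].
    spamratio = float(spamcount) / nspam if nspam else 0.0
    hamratio = float(hamcount) / nham if nham else 0.0

    if spamratio + hamratio == 0:
        return unknown_prob  # never-seen token: fall back to the prior

    prob = spamratio / (spamratio + hamratio)

    # Pull the estimate toward unknown_prob when evidence is thin.
    n = spamcount + hamcount
    return (strength * unknown_prob + n * prob) / (strength + n)
```

A token seen equally often in ham and spam scores 0.5; a corrupt record such as spamcount=11 with nspam=10 trips the assertion instead of producing a probability.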
Edgar> File "/usr/lib/.../spambayes/classifier.py", line 311, in probability
Edgar> assert spamcount <= nspam
Edgar> AssertionError

Edgar,

Database corruption generally rears its ugly head when you have two tools trying to manipulate the database at once. You use mboxtrain to create it. What other tools in the SpamBayes set do you use? What database format are you using?

I'm guessing you have yet to switch to zodb. It seems to offer much better resilience against these sorts of problems.

Skip
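One way to rule out the "two tools at once" scenario is to make every tool that touches the database take the same lock first. A minimal sketch, assuming a Unix host and that the cron jobs and the fetchmail hook are changed to call sb_filter.py and sb_mboxtrain.py through this wrapper; the wrapper, its name, and the lock-file path are hypothetical and not part of SpamBayes. It holds an exclusive advisory `flock` for as long as the wrapped command runs.

```python
# Hypothetical wrapper: serialize access to the SpamBayes database so a
# training run and a filtering run never touch it at the same time.
# The lock-file path is an assumption; any path shared by all jobs works.
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/spambayes-db.lock"


def run_locked(cmd, lock_path=LOCK_PATH):
    """Run cmd (a list of arguments) while holding an exclusive lock.

    The lock is advisory: it only protects the database if every job
    that opens it goes through this wrapper.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # stdin is inherited, so a message piped to the wrapper
            # still reaches sb_filter.py.
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_locked(sys.argv[1:]))
```

Used as, say, `run_locked.py sb_filter.py` in the mail pipeline and `run_locked.py sb_mboxtrain.py -d ...` in cron, a filter run that arrives during training simply waits instead of reading a half-written database.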
Hi Skip,

On 09/11/2006 04:24:44 PM, skip@pobox.com wrote:
> Edgar> File "/usr/lib/.../spambayes/classifier.py", line 311, in probability
> Edgar> assert spamcount <= nspam
> Edgar> AssertionError
>
> Edgar,
>
> Database corruption generally rears its ugly head when you have two tools trying to manipulate the database at once. You use mboxtrain to create it.
I use sb_mboxtrain.py to train the system, and sb_filter.py to filter/score the messages.
> What other tools in the SpamBayes set do you use?
Apart from the ones mentioned above, none. I've tried to use sb_dbimpexp.py: with 1.0rc2.2 it crashes; with 1.1a1 it produces a CSV file.
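Since the 1.1a1 export does produce a CSV file, that file can be searched for the records that trip the assertion. A minimal sketch, assuming (check this against your own export) that the first row holds the ham and spam message totals and every later row holds a token followed by its ham count and spam count; the function name is hypothetical.

```python
# Sketch: scan a CSV export of the token database for records that break
# spamcount <= nspam or hamcount <= nham.
# ASSUMED layout (verify against your own file): first row "nham,nspam",
# every later row "token,hamcount,spamcount".
import csv
import io


def find_bad_tokens(csv_file):
    """Return (nham, nspam, bad), where bad lists the offending rows."""
    reader = csv.reader(csv_file)
    nham, nspam = (int(x) for x in next(reader))
    bad = []
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        token, hamcount, spamcount = row[0], int(row[1]), int(row[2])
        if spamcount > nspam or hamcount > nham:
            bad.append((token, hamcount, spamcount))
    return nham, nspam, bad


if __name__ == "__main__":
    # Tiny in-memory sample in the assumed layout.
    sample = io.StringIO("10,5\nfree,1,4\ncasino,0,7\nmeeting,9,0\n")
    print(find_bad_tokens(sample))
```

For a real export, open it with `open(path, newline="")` and pass the file object in. An empty result suggests the stored counts are consistent; any reported token is a record the classifier's sanity check would reject.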
> What database format are you using?
The default DB format: Berkeley DB.
> I'm guessing you have yet to switch to zodb. It seems to offer much better resilience against these sorts of problems.
How can I switch?

Thanks,
Edgar.
Edgar> I use sb_mboxtrain.py to train the system. And use sb_filter.py to
Edgar> filter/score the messages.

Is it possible that sb_filter.py and sb_mboxtrain.py run at the same time? If so, that would be enough to cause problems, especially using Berkeley DB.

>> I'm guessing you have yet to switch to zodb. It seems to offer much
>> better resilience against these sorts of problems.

Edgar> How can I switch?

Download and install ZODB, change your config file to refer to it:

    [Storage]
    persistent_use_database: zodb
    ...

then retrain from scratch.

I don't recall what version you're running, but zodb is the default starting with 1.1a2. Depending on how much older the version you're running is, you may or may not find support for ZODB in the SpamBayes tool set. I recommend you update to 1.1a3.

Skip
Hi Skip,

On 09/11/2006 09:42:24 PM, skip@pobox.com wrote:
> Edgar> I use sb_mboxtrain.py to train the system. And use sb_filter.py to
> Edgar> filter/score the messages.
>
> Is it possible that sb_filter.py and sb_mboxtrain.py run at the same time? If so, that would be enough to cause problems, especially using Berkeley DB.
No, they (almost) never run at the same time. sb_filter.py runs every hour on the hour, and sb_mboxtrain.py runs once a day at 02:15. I've never seen sb_filter.py run for more than a minute.
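The "(almost)" can be made concrete with a little arithmetic. A minimal sketch, assuming the schedule described here (sb_filter.py at minute 0 of every hour, finishing within a minute; sb_mboxtrain.py daily at 02:15); the function name is hypothetical, and the training duration is an input because the thread does not say how long a training run takes.

```python
# Sketch: which hourly filter runs overlap a training run of a given
# length? Schedule is taken from the message above; the training
# duration passed in is a hypothetical input.
from datetime import datetime, timedelta

FILTER_MAX = timedelta(minutes=1)  # longest observed sb_filter.py run


def overlapping_filter_runs(train_start, train_duration):
    """Return start times of hourly filter runs that overlap training."""
    train_end = train_start + train_duration
    # Start at the top of the training hour; earlier runs have already
    # finished, since FILTER_MAX is well under an hour.
    run = train_start.replace(minute=0, second=0, microsecond=0)
    hits = []
    while run < train_end:
        # Half-open intervals [run, run + FILTER_MAX) and
        # [train_start, train_end) overlap if each starts before the
        # other ends; run < train_end is already guaranteed here.
        if run + FILTER_MAX > train_start:
            hits.append(run)
        run += timedelta(hours=1)
    return hits
```

With training starting at 02:15, the 02:00 filter run is long finished, so the two only collide if training is still running at 03:00, i.e. if it takes more than 45 minutes; a 50-minute run overlaps the 03:00 filter run, a 30-minute run overlaps nothing.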
> >> I'm guessing you have yet to switch to zodb. It seems to offer much
> >> better resilience against these sorts of problems.
>
> Edgar> How can I switch?
>
> Download and install ZODB, change your config file to refer to it:
>
>     [Storage]
>     persistent_use_database: zodb
>     ...
OK, will do this.
> then retrain from scratch.
>
> I don't recall what version you're running, but zodb is the default starting with 1.1a2. Depending on how much older the version you're
I'm currently running 1.1a1.
> running is, you may or may not find support for ZODB in the SpamBayes tool set. I recommend you update to 1.1a3.
OK, will do this also.

Thanks, cu l8r,
Edgar.
Participants (2):
- Edgar Matzinger
- skip@pobox.com