[spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayesmessage.py, 1.39, 1.40 storage.py, 1.35, 1.36

Mark Hammond mhammond at skippinet.com.au
Wed Oct 8 01:07:49 EDT 2003


[Tim]
> After the last round of worm spew subsided, I switched all my (3) Outlook
> classifiers back to using bsddb3 and (now) Python 2.3.2.  I still haven't
> seen any database corruption, and I get about 800 emails each day now.  In
> addition, I've been doing a lot of "real development work" on Win98SE lately
> (well, *trying* to do real work <wink>), and Outlook typically suffers from
> forced OS reboots several times each day.  That is, we're not only not
> shutting down cleanly then, we're not getting to do *any* shutdown cleanup
> then.  Sometimes Outlook 2K itself takes 10 minutes to come up again after a
> reboot (that's got nothing to do with spambayes, btw -- it's always behaved
> this way), but the Berkeley db never complains.

FWIW, my experience is similar.  I use Win2k so never lose the OS itself,
but *regularly* kick Outlook in the nuts via the "Task Manager".  I've never
seen this error, or any other bsddb-related error either.

I haven't followed this closely enough, but is it possible it depends on the
specific Sleepycat version (and therefore indirectly on the specific Python
version or source)?
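
One way to compare installations would be to have each one report the Berkeley DB library version its Python binding is linked against.  The following is only a sketch: the helper name berkeley_db_version is made up for illustration, and it assumes either the bsddb3 package or the old standard-library bsddb module is importable; it returns None when neither is.

```python
import importlib


def berkeley_db_version():
    """Return the linked Berkeley DB version as a tuple, or None.

    Hypothetical helper: tries the bsddb3 package first, then the
    legacy standard-library bsddb module (Python 2 only).
    """
    for package in ("bsddb3", "bsddb"):
        try:
            db = importlib.import_module(package + ".db")
        except ImportError:
            # This binding is not installed here; try the next one.
            continue
        # db.version() reports (major, minor, patch) of the C library.
        return db.version()
    return None


if __name__ == "__main__":
    print("Berkeley DB version:", berkeley_db_version())
```

Collecting this value alongside the Python version from people who do and do not see the error would show whether the two correlate.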

> C:\WINDOWS\Application Data\SpamBayes>dir *.db
>
> DEFAUL~1 DB      2,621,440  10-07-03 10:31p default_bayes_database.db
> DEFAUL~2 DB         98,304  10-07-03 10:31p default_message_database.db
>          2 file(s)      2,719,744 bytes

Mine are almost exactly twice that size, so still in the same league.
However, as Tony says, this is *mainly* for non-Outlook users, so I expect
their training patterns to be different.  E.g., I believe we can now train on
Outlook Express files - but presumably this training will process *every*
message, rather than single folders.  I don't know enough about the proxy,
but I suspect you may be on the right track that the "average" db size for
Outlook users is radically different to other users.

Digging a little more, bug
https://sourceforge.net/tracker/index.php?func=detail&aid=807217&group_id=61702&atid=498103
is nice enough to have a log from an Outlook session with this error - of
note:

Bayes database initialized with 297 spam and 21252 good messages
*** - message database has 21269 messages - bayes has 21549 - something is screwey

That implies a large database is being used, in that case at least.
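
The arithmetic behind that warning can be reproduced from the numbers in the log.  This is a minimal sketch, not the actual SpamBayes check: the function name count_mismatch is hypothetical, and it simply assumes the "bayes has" figure is the sum of the spam and good counts, which matches the log (297 + 21252 = 21549).

```python
def count_mismatch(nspam, nham, nmessages):
    """Return how many more messages the bayes database has been
    trained on than the message database knows about.

    Hypothetical helper for illustration; zero means the two
    databases agree.
    """
    trained = nspam + nham  # total the classifier has been trained on
    return trained - nmessages


# Figures taken from the log excerpt in the bug report.
diff = count_mismatch(nspam=297, nham=21252, nmessages=21269)
print("bayes has", 297 + 21252, "- difference:", diff)  # difference: 280
```

So in that session the two databases disagree by 280 messages out of roughly 21,500.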

Mark.



