[Spambayes] Corrupt database

Tony Meyer tameyer at ihug.co.nz
Fri Jan 30 22:20:24 EST 2004


> Thanks for your reply! I didn't originally realise this was an FAQ. 
> Sorry for that.

No worries.

> I've switched to using a pickle in the meantime, and I must 
> say that I do not really notice any changes in speed. Is there
> a good reason why the pickle isn't the default? That would seem more 
> user-friendly to me.

Well, bsddb should be the better option for the sort of db use that
SpamBayes needs.  I believe that speed changes could be quite noticeable in
certain situations (slower machines, larger databases, and so on).  OTOH,
making the pickle the default is one of the options that the developers have
discussed.  We'd really like to get this solved, though :)
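
To illustrate the trade-off, here is a rough sketch only.  It is not the
actual SpamBayes storage code: the class names are made up, and Python's
standard dbm module stands in for bsddb.  The pickle version holds the whole
database in memory and rewrites the whole file on save; the dbm version reads
and writes one key at a time.

```python
import dbm
import os
import pickle


class PickleStore:
    """Hypothetical store: the whole dict lives in memory, saved in one go."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Loading reads the entire database, however large it is.
            with open(path, "rb") as f:
                self.data = pickle.load(f)

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def save(self):
        # Saving rewrites the entire file, even if only one key changed.
        with open(self.path, "wb") as f:
            pickle.dump(self.data, f)


class DbmStore:
    """Hypothetical store: each key is read and written individually."""

    def __init__(self, path):
        # "c" opens the database, creating it if it does not exist yet.
        self.db = dbm.open(path, "c")

    def set(self, key, value):
        self.db[key] = pickle.dumps(value)

    def get(self, key, default=None):
        try:
            return pickle.loads(self.db[key])
        except KeyError:
            return default

    def close(self):
        self.db.close()
```

With a small database the two behave much the same, which would fit with
not noticing a speed difference; the cost of loading and rewriting the whole
pickle only grows as the database does.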

> Is it possible to add more logging to the database access?

Not as far as I am aware.  The main problem is that we're really not all
that sure where the problem is occurring.  It doesn't really help that none
of the developers (IIRC) are bsddb experts :)

> Is there any way I could help in debugging this? I'm not a python 
> programmer, but have ample experience in perl and java.

If there's any way that you are able to reproduce the problem, then knowing
what it was would be extremely helpful (if we know what causes it, it should
be simple to solve).  Otherwise, just suggestions of possible causes could
be helpful (for example, we suspect that one cause might be that users close
the browser window while training, so the next release will handle that
better).

> And I would
> like to do something to contribute to this great project :)

A phrase we love to see :)  If it's not with this, there are lots of other
ways to help, including ones that don't need Python programming (although if
you know perl & java then the python code should be pretty simple to get).
The FAQ and wiki (the wiki is more up-to-date, I think) have stuff about how
people can help :)

> I'll give this a try. Would it be of interest for you to know 
> which one was corrupted?

Yes, it would.  The message-info one has typically been the most
troublesome, and has also had the most improvements made to it (although,
IIRC, these have all been since the last release...).  It's not always
that one, though.

> It has been an uncommon experience with this version (1.0a7), 
> although it happened quite often with an earlier version (1.0a5 I think), 
> especially when reviewing classified messages. I'm not sure, 
> but I think you guys fixed a lot of the issues already.

Yes, there have been quite a few fixes already, and more will appear in the
next release.

> I wouldn't mind investing some time into this issue (since 
> it's an faq, more people might benefit from it), but could you
> give me some information how to get started?
> As it stands I have no clue as to why it happened.

Nor do we, which is the problem, really.  Most of our suspicions have been
that changes are being made to the database, and then SpamBayes is quitting
without saving the database.  We *should* be saving after all changes (now)
though, so that shouldn't be a problem anymore.
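
For the pickle case, "save after every change" could look something like
the sketch below.  This is a generic technique and an assumption on my part,
not what SpamBayes actually does: the function names and the (spam, ham)
count layout are hypothetical.  The idea is to write to a temporary file and
then rename it over the real one, so that quitting halfway through a save
leaves the old file intact rather than a half-written one.

```python
import os
import pickle
import tempfile


def save_atomically(data, path):
    """Write data to a temp file, then rename it over path (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the new file in as a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def train(data, path, token, is_spam):
    """Update the counts for one token and save straight away."""
    spam, ham = data.get(token, (0, 0))
    data[token] = (spam + 1, ham) if is_spam else (spam, ham + 1)
    save_atomically(data, path)
```

Saving the whole pickle after every change gets more expensive as the
database grows, which is part of why a per-key database is attractive.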

Sorry I can't give more direction here - I'd certainly love to be able to
point you at a list of things to try to break it!  Some of the discussion in
the spambayes-dev list might be of interest (and might include things I've
forgotten).  If you google for '"spambayes-dev" run recovery' or something
like that you should find most of the stuff.

Apologies for the delay in replying; 'real work' has been busy recently :)

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.