[spambayes-dev] Re: Pickle vs DB inconsistencies
Tim Peters
tim.one at comcast.net
Tue Jun 24 23:22:53 EDT 2003
[Tony Meyer]
> Ok, I think I have this figured out now.
>
> The DBDictClassifier currently tries to be efficient by not storing
> "singleton" words (i.e. words that have only appeared once) in the
> wordinfo cache, but saving them directly to the database. This is all
> fine, except that they are *not* saved to the database until store()
> is called. This means that between a call to _wordinfoset() and a
> call to store() the counts are unreliable.
You're saying that if d is an open Shelf object, then after

    d[string] = whatever

the value of the access expression

    d[string]

is unreliable unless a

    d.sync()

call intervenes? That's scary, if so -- or a bug. The "whatever" thingies
we're storing are not mutable objects (they're immutable tuples), so the
caution about *mutating* Shelf values in a Shelf opened with the default
writeback=False doesn't apply in spambayes.
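A quick way to check the claim is a minimal sketch like the following (file
names invented here): write an immutable tuple to a freshly opened shelf and
read it straight back, with no sync() in between.

```python
import os
import shelve
import tempfile

# Check: with writeback=False (the default) and immutable values such as
# tuples, a freshly written value should be readable immediately, with no
# intervening sync() call.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo_shelf")  # hypothetical scratch file

d = shelve.open(path)     # writeback=False by default
d["spam"] = (3, 7)        # store an immutable tuple, as spambayes does
fetched = d["spam"]       # read it back before any sync()
d.close()

print(fetched)            # (3, 7) -- the write is visible immediately
```

If that ever printed something other than (3, 7), it would be a shelve bug
rather than anything spambayes-specific.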
> To get around this, we need to either sync the db in the _wordinfoset
> function (seems to be expensive), or cache the words after all, or
> something else.
>
> Anyway, this is how it seems to me - I could be wrong! If Mark or
> someone more familiar with this stuff could look at it, that would be
> great.
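For the "cache the words after all" option, a hypothetical sketch (class and
method names invented here, not the actual DBDictClassifier code) would be a
write-through in-memory dict consulted before the database, so reads between
_wordinfoset() and store() stay consistent without syncing on every write.

```python
# Sketch of a word store that never bypasses its cache, even for singletons.
# Pending writes live in self.cache until store() flushes them to the db.

class CachedWordStore:
    def __init__(self, db):
        self.db = db                 # the underlying persistent mapping
        self.cache = {}              # pending writes, flushed by store()

    def wordinfoset(self, word, record):
        self.cache[word] = record    # always cache, even singleton words

    def wordinfoget(self, word):
        if word in self.cache:       # pending writes win over stale db state
            return self.cache[word]
        return self.db.get(word)

    def store(self):
        self.db.update(self.cache)   # flush pending writes to the database
        self.cache.clear()

db = {}
s = CachedWordStore(db)
s.wordinfoset("spam", (1, 0))
print(s.wordinfoget("spam"))   # (1, 0) -- visible before store()
s.store()
print(db["spam"])              # (1, 0) -- persisted after store()
```

The cost is the memory for one record per written word between store() calls,
which is what the singleton special-case was trying to avoid in the first place.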
I haven't used this part of the code in real life. Other questions that pop
up:
+ Why does _wordinfoset() start with

    if record and ...

? For example, how could record==None possibly arise?
+ If a word is deleted, what's stopping _wordinfoget() from sucking it
out of the database anyway? That is, I believe the except clause in
_wordinfoget() should start with:

    if self.changed_words.get(word) is WORD_DELETED:
        return None
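In miniature, the guard looks like this (a simplified sketch, not the real
DBDictClassifier; the db and class scaffolding here are invented for
illustration): deletions are recorded in changed_words under a WORD_DELETED
sentinel, and _wordinfoget() honors that sentinel before falling back to the
database, so a deleted word can't be resurrected from stale db state.

```python
WORD_DELETED = object()   # sentinel marking a pending deletion

class Classifier:
    def __init__(self, db):
        self.db = db                 # underlying persistent mapping
        self.changed_words = {}      # word -> pending-change marker

    def _wordinfodel(self, word):
        self.changed_words[word] = WORD_DELETED

    def _wordinfoget(self, word):
        # Honor a pending delete before consulting the database.
        if self.changed_words.get(word) is WORD_DELETED:
            return None
        return self.db.get(word)

db = {"spam": (4, 2)}
c = Classifier(db)
print(c._wordinfoget("spam"))   # (4, 2)
c._wordinfodel("spam")
print(c._wordinfoget("spam"))   # None -- not resurrected from the database
```

Without that check, the database copy of "spam" would still be returned after
the delete, until the deletion is actually flushed.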