[spambayes-dev] Re: Pickle vs DB inconsistencies

Tim Peters tim.one at comcast.net
Tue Jun 24 23:22:53 EDT 2003


[Tony Meyer]
> Ok, I think I have this figured out now.
>
> The DBDictClassifier currently tries to be efficient by not storing
> "singleton" words (i.e. words that have only appeared once) in the
> wordinfo cache, instead saving them directly to the database.  This is all
> fine, except that they are *not* saved to the database until store()
> is called.  This means that between a call to _wordinfoset() and a
> call to store() the counts are unreliable.

You're saying that if d is an open Shelf object, then after

    d[string] = whatever

the value of the access expression

    d[string]

is unreliable unless a

    d.sync()

call intervenes?  That's scary, if so -- or a bug.  The "whatever" thingies
we're storing are not mutable objects (they're immutable tuples), so the
caution about *mutating* Shelf values in a Shelf opened with the default
writeback=False doesn't apply in spambayes.
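
For concreteness, the pattern in question looks like this against the
standard shelve module (a standalone sketch; the filename and the tuple
value are made up, not spambayes data):

    import shelve

    # writeback defaults to False, as in the spambayes usage above
    d = shelve.open("/tmp/wordinfo_demo")

    d["python"] = (3, 7)   # an immutable tuple, like a word record
    print(d["python"])     # the question: is this read already reliable...
    d.sync()               # ...or only after an explicit sync()?
    print(d["python"])

    d.close()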

> To get around this, we need to either sync the db in the _wordinfoset
> function (seems to be expensive), or cache the words after all, or
> something else.
>
> Anyway, this is how it seems to me - I could be wrong!  If Mark or
> someone more familiar with this stuff could look at it, that would be
> great.
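
For reference, the sync-in-_wordinfoset option would boil down to
something like the following hypothetical helper (not the real
DBDictClassifier method; the object it would sync in practice may
differ):

    def wordinfoset_with_sync(db, word, record):
        # Hypothetical helper only: write the record straight to the
        # open shelf and flush it, so a later read sees it on disk even
        # before store() is called.  db is assumed to be an open
        # shelve.Shelf.
        db[word] = record   # shelve pickles the value itself
        db.sync()           # the part that "seems to be expensive"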

I haven't used this part of the code in real life.  Other questions that pop
up:

+ Why does _wordinfoset() start with

    if record and ...

  ?  For example, how could record==None possibly arise?

+ If a word is deleted, what's stopping _wordinfoget() from sucking it
  out of the database anyway?  That is, I believe the except clause in
  _wordinfoget() should start with:

      if self.changed_words.get(word) is WORD_DELETED:
          return None
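
Spelled out against a tiny stand-in class (the attribute names just
follow the discussion here; the real DBDictClassifier differs in
detail), that guard would sit like so:

    import shelve

    WORD_DELETED = "deleted"    # hypothetical sentinel for a pending delete

    class SketchClassifier:
        """Hypothetical stand-in for DBDictClassifier, for illustration."""

        def __init__(self, dbname):
            self.db = shelve.open(dbname)   # on-disk word records
            self.wordinfo = {}              # in-memory cache
            self.changed_words = {}         # word -> pending change marker

        def _wordinfoget(self, word):
            try:
                return self.wordinfo[word]  # in-memory cache first
            except KeyError:
                # A word deleted since the last store() may still sit in
                # the on-disk database; this guard keeps it from being
                # resurrected from there.
                if self.changed_words.get(word) is WORD_DELETED:
                    return None
                return self.db.get(word)    # fall back to the database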



