[spambayes-dev] A new and altogether different bsddb breakage

Tim Peters tim.one at comcast.net
Mon Dec 15 11:00:13 EST 2003


[Richie Hindle]
> I think we're using different versions of bsddb - your code fails for
> me:
>
> >>> d = bsddb.hashopen("/src/tests/spambayes/hammie.db")
> >>> len(d)
> 52331
> >>> len([k for k in d if d.get(k, None) is None])
> Traceback (most recent call last):
>   File "<pyshell#4>", line 1, in -toplevel-
>     len([k for k in d if d.get(k, None) is None])
>   File "C:\Python23\lib\bsddb\__init__.py", line 86, in __getitem__
>     return self.db[key]
> TypeError: Integer keys only allowed for Recno and Queue DB's
>
> I think this is because GET_ITER is creating a list-style iterator
> rather than a dict-style one.  bsddb objects don't look much like
> dictionaries:
>
> >>> len([k for k in d.keys() if d.get(k, None) is None])
> Traceback (most recent call last):
>   File "<pyshell#11>", line 1, in -toplevel-
>     len([k for k in d.keys() if d.get(k, None) is None])
> AttributeError: _DBWithCursor instance has no attribute 'get'

Not here:

>>> PATH = "/WINDOWS/Application Data/SpamBayes/default_bayes_database.db"
>>> import bsddb
>>> d = bsddb.hashopen(PATH, 'r')
>>> len([k for k in d.keys() if d.get(k, None) is None])
0
>>>
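
For what it's worth, the integer-key TypeError in the first traceback is consistent with how Python iterates over an object that has `__getitem__` but no `__iter__`: it falls back to the sequence protocol and tries the keys 0, 1, 2, and so on. Here is a minimal sketch of that fallback. `KeyedStore` is a hypothetical stand-in written for illustration, not bsddb's actual code, and whether a given bsddb version defines `__iter__` is something I have not verified.

```python
class KeyedStore:
    """Hypothetical stand-in for a mapping-like wrapper that defines
    __getitem__, __len__ and keys() but no __iter__ (an assumption made
    for illustration; this is not bsddb's code)."""

    def __init__(self, data):
        self._data = dict(data)

    def __len__(self):
        return len(self._data)

    def keys(self):
        return list(self._data.keys())

    def __getitem__(self, key):
        if not isinstance(key, str):
            # Mimic a store that accepts only string keys.
            raise TypeError("string keys only, got %r" % (key,))
        return self._data[key]


store = KeyedStore({"spam": "1", "ham": "2"})

# With no __iter__, iter() falls back to the sequence protocol and
# calls __getitem__(0), __getitem__(1), ... until IndexError.
try:
    for k in store:
        pass
    iteration_error = None
except TypeError as exc:
    iteration_error = str(exc)

# Going through keys() sidesteps the fallback entirely.
keys_via_method = sorted(store.keys())
```

If that is what is happening, iterating over `d.keys()` rather than `d` itself should avoid the error on any version, which matches the second session quoted above getting past the iteration and failing only on `get`.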

> I have Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit
> (Intel)] on win32.  Assuming that's a red herring,

I wouldn't assume that -- it may be the whole ball of wax.  I'm using
exactly the same, *except* I'm using 2.3.3c1 (also on Windows), and a number
of bsddb3 fixes have been checked in since Python 2.3.  It would help if you
tried 2.3.3c1.  If your symptoms above persist, then we've got a Major
Mystery to sort out (e.g., maybe you -- or I -- aren't getting the version of
bsddb the Windows installer intended us to get).


> here's an equivalent that works for me:
>
> >>> def get(d, k, default):
>         try:
>                 return d[k]
>         except KeyError:
>                 return default
>
> >>> len([k for k in d.keys() if get(d, k, None) is None])
> 305
>
> So yes, the underlying database is screwed.  But one token less
> screwed than last time - lovely.  (I now get 305 when going through
> shelve as well.)  I've done some training in between, which must have
> jiggled things around.

...

> I'm certainly underwhelmed by bsddb in single-file mode.  One day I
> want to make spambayes use full transaction mode - that really ought
> to work. (Does anyone know of any simple Python code I can steal that
> uses bsddb in full-on multi-everything DBEnv mode?  The pybsddb docs
> just link to the SleepyCat C API docs, which aren't very
> approachable.)

Best I can suggest is studying Python's substantial bsddb3 test suite.  ZODB
has modules to build ZODB's transaction model on top of a Berkeley database,
but I don't think I'd call that simple.  I'm not a bsddb guy, though, so
those are just random things I've seen.



