[spambayes-dev] A new and altogether different bsddb breakage

Richie Hindle richie at entrian.com
Mon Dec 15 04:00:19 EST 2003


[Richie]
> >>> print [db[k] for k in db]
> KeyError: 'pics'

[Tim]
> Ouch.  What do you get if you open the database directly, instead of
> indirecting thru a shelf?  I'm just trying to make sure it's really the
> database that's hosed.

I think we're using different versions of bsddb - your code fails for me:

>>> d = bsddb.hashopen("/src/tests/spambayes/hammie.db")
>>> len(d)
52331
>>> len([k for k in d if d.get(k, None) is None])
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in -toplevel-
    len([k for k in d if d.get(k, None) is None])
  File "C:\Python23\lib\bsddb\__init__.py", line 86, in __getitem__
    return self.db[key]
TypeError: Integer keys only allowed for Recno and Queue DB's

I think this is because GET_ITER is creating a list-style iterator rather
than a dict-style one, so __getitem__ gets called with integers 0, 1, 2...
instead of the real keys.  bsddb objects don't look much like dictionaries
in other ways either:

>>> len([k for k in d.keys() if d.get(k, None) is None])
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in -toplevel-
    len([k for k in d.keys() if d.get(k, None) is None])
AttributeError: _DBWithCursor instance has no attribute 'get'
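
For anyone puzzled by the integer-key TypeError further up: it is consistent
with Python's fallback iteration protocol, where an object that defines
__getitem__ but no __iter__ gets indexed with 0, 1, 2... until IndexError.
Here is a minimal sketch of that behaviour in modern Python rather than 2.3.
LegacyStore is a hypothetical stand-in written for illustration, not bsddb's
actual class, and the error message is simply copied from the traceback.

```python
class LegacyStore:
    """Hypothetical stand-in: defines __getitem__ and __len__, but no __iter__."""

    def __init__(self, data):
        self._data = dict(data)
        self.seen_keys = []  # records every key handed to __getitem__

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        self.seen_keys.append(key)
        if isinstance(key, int):
            # Mimics the message in the traceback; the real check lives in bsddb.
            raise TypeError("Integer keys only allowed for Recno and Queue DB's")
        return self._data[key]

    def keys(self):
        return list(self._data.keys())


store = LegacyStore({"pics": 1, "spam": 2})

error = None
try:
    for k in store:  # no __iter__, so Python tries store[0], store[1], ...
        pass
except TypeError as exc:
    error = str(exc)

# Going through keys() sidesteps the fallback and yields the real string keys.
real_keys = sorted(store.keys())
```

After this runs, store.seen_keys holds the single integer 0 that the for
loop tried, which is why iterating over d.keys() behaves differently from
iterating over d itself.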

I have Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)]
on win32.  Assuming that's a red herring, here's an equivalent that works
for me:

>>> def get(d, k, default):
...     try:
...         return d[k]
...     except KeyError:
...         return default

>>> len([k for k in d.keys() if get(d, k, None) is None])
305

So yes, the underlying database is screwed.  But one token less screwed
than last time - lovely.  (I now get 305 when going through shelve as
well.)  I've done some training in between, which must have jiggled things
around.
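
The check above generalises to any mapping-like object whose key list and
stored data can disagree.  A rough sketch follows, in modern Python;
safe_get, unreadable_keys and DamagedStore are all hypothetical names made
up for illustration, and DamagedStore only imitates the symptom seen here
(a key that is listed but raises KeyError on lookup).  It uses a private
sentinel instead of None so that a legitimately stored None would not be
counted as damage.

```python
def safe_get(mapping, key, default=None):
    """Return mapping[key], or default when the lookup raises KeyError."""
    try:
        return mapping[key]
    except KeyError:
        return default


def unreadable_keys(mapping):
    """List the keys that keys() reports but that cannot be fetched."""
    missing = object()  # sentinel, so stored None values are not miscounted
    return [k for k in mapping.keys() if safe_get(mapping, k, missing) is missing]


class DamagedStore:
    """Hypothetical stand-in for a store whose key list and data disagree."""

    def __init__(self, data, broken):
        self._data = dict(data)
        self._broken = set(broken)

    def keys(self):
        return list(self._data)

    def __getitem__(self, key):
        if key in self._broken:
            raise KeyError(key)  # listed by keys(), yet not retrievable
        return self._data[key]


store = DamagedStore({"pics": 1, "spam": 2, "ham": 3}, broken=["pics"])
bad = unreadable_keys(store)
```

Against a real database, len(unreadable_keys(d)) would play the role of the
305 count above.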

[Tim]
> Gotta say, I'm half ready to declare
> that ZODB is the only database anyone should ever use (the bugs in that are
> long fixed <wink>).

I'm certainly underwhelmed by bsddb in single-file mode.  One day I want
to make spambayes use full transaction mode - that really ought to work.
(Does anyone know of any simple Python code I can steal that uses bsddb in
full-on multi-everything DBEnv mode?  The pybsddb docs just link to the
Sleepycat C API docs, which aren't very approachable.)

-- 
Richie Hindle
richie at entrian.com



