[spambayes-dev] A new and altogether different bsddb breakage

Richie Hindle richie at entrian.com
Thu Dec 11 12:38:33 EST 2003


In response to Skip's question about hapax ratios, I ran his script and
received an error.  I boiled the problem down to this:

>>> print [db[k] for k in db]
Traceback (most recent call last):
  File "hapaxes.py", line 3, in ?
    print [db[k] for k in db]
  File "C:\Python23\lib\shelve.py", line 118, in __getitem__
    f = StringIO(self.dict[key])
  File "C:\Python23\lib\bsddb\__init__.py", line 86, in __getitem__
    return self.db[key]
KeyError: 'pics'

Excuse me?  Er, so how many of these things are there?

>>> len([k for k in db if db.get(k, None) is None])
306
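The count above treats any key whose value comes back as None as broken. In general that would also catch a key whose stored value really is None, because shelve can pickle None. Below is a minimal sketch of a stricter check. It assumes only that the database behaves like a mapping whose iteration and lookup can disagree, and it counts a key as bad only when lookup raises KeyError. `unreadable_keys` and `InconsistentStore` are hypothetical names made up for illustration; they are not part of shelve or bsddb. The sketch is written for a current Python 3, not the Python 2.3 used in the session above.

```python
def unreadable_keys(db):
    """Return keys that iterating over db yields but db[k] rejects with KeyError.

    Unlike ``db.get(k, None) is None``, this does not mistake a key whose
    stored value happens to be None for a missing key.
    """
    bad = []
    for k in db:
        try:
            db[k]
        except KeyError:
            bad.append(k)
    return bad


class InconsistentStore(dict):
    """Hypothetical stand-in for a damaged database: iteration also yields
    'phantom' keys that __getitem__ cannot find."""

    def __init__(self, data, phantom):
        super().__init__(data)
        self.phantom = list(phantom)

    def __iter__(self):
        yield from super().__iter__()
        yield from self.phantom


db = InconsistentStore({'spam': 1, 'ham': None}, ['pics'])
print(unreadable_keys(db))                                # only the phantom key
print(len([k for k in db if db.get(k, None) is None]))    # also counts 'ham'
```

On the toy store, the get-based count reports two keys while `unreadable_keys` reports only the one that really fails.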

And what do they look like?

>>> from pprint import pprint as p
>>> p([k for i, k in enumerate(db) if db.get(k, None) is None and i % 50 == 0])
['magnetism',
 'url:mlqnuvs',
 'from:addr:wi872u',
 'autograph.',
 'url:ff-programs',
 'motels,']

So they have nothing obvious in common.  Looking through the full list,
they clearly don't all come from one message: some are obviously ham
clues and some are obviously spam clues.
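One rough way to test "nothing in common" is to tally the bad keys by the text before their first colon, since some tokens carry a prefix such as `url:` or `from:addr:` and plain words carry none. Below is a minimal sketch, assuming the first colon is a good enough proxy for the kind of token. `prefix_counts` is a hypothetical helper, not a spambayes function. The sample list is the six keys shown above.

```python
from collections import Counter


def prefix_counts(keys):
    """Tally keys by the text before their first ':' ('(none)' if there is no colon)."""
    counts = Counter()
    for k in keys:
        prefix, sep, _ = k.partition(':')
        counts[prefix if sep else '(none)'] += 1
    return counts


sample = ['magnetism', 'url:mlqnuvs', 'from:addr:wi872u',
          'autograph.', 'url:ff-programs', 'motels,']
print(prefix_counts(sample).most_common())
```

Running it over the full list of 306 keys, rather than this sample, would show whether any one token kind dominates.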

I'm probably winging my way towards a DBRunRecovery error, unless someone
can explain what's going on.

-- 
Richie Hindle
richie at entrian.com