[spambayes-dev] A new and altogether different bsddb breakage
Richie Hindle
richie at entrian.com
Thu Dec 11 12:38:33 EST 2003
In response to Skip's question about hapax ratios, I ran his script and
received an error. I boiled the problem down to this:
>>> print [db[k] for k in db]
Traceback (most recent call last):
  File "hapaxes.py", line 3, in ?
    print [db[k] for k in db]
  File "C:\Python23\lib\shelve.py", line 118, in __getitem__
    f = StringIO(self.dict[key])
  File "C:\Python23\lib\bsddb\__init__.py", line 86, in __getitem__
    return self.db[key]
KeyError: 'pics'
Excuse me? Er, so how many of these things are there?
>>> len([k for k in db if db.get(k, None) is None])
306
And what do they look like?
>>> from pprint import pprint as p
>>> p([k for i, k in enumerate(db) if db.get(k, None) is None and i % 50 == 0])
['magnetism',
 'url:mlqnuvs',
 'from:addr:wi872u',
 'autograph.',
 'url:ff-programs',
 'motels,']
So they have nothing obvious in common. Looking through the full list,
it's clear that they don't all come from one message: some are
obviously ham clues and some are obviously spam clues.
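
For anyone who wants to run the same check on their own database, here is a
minimal sketch of it as a reusable helper. It is not part of hapaxes.py or
SpamBayes: find_dangling_keys and FlakyMapping are hypothetical names, and
FlakyMapping is only a stand-in that imitates the symptom (keys that show up
during iteration but raise KeyError on lookup) so the helper can be tried
without a damaged shelf. Unlike the db.get(k, None) test above, it catches
KeyError directly, so a key whose stored value really is None is not
miscounted.

```python
def find_dangling_keys(db):
    """Return the keys that iterating over db yields but db[key] cannot fetch."""
    dangling = []
    # Snapshot the keys first so lookups cannot disturb the iteration.
    for key in list(db):
        try:
            db[key]
        except KeyError:
            dangling.append(key)
    return dangling


class FlakyMapping(dict):
    """Hypothetical stand-in for the broken shelf, for demonstration only."""

    def __init__(self, data, broken):
        dict.__init__(self, data)
        self._broken = set(broken)

    def __getitem__(self, key):
        # Imitate the symptom: the key is listed, but the lookup fails.
        if key in self._broken:
            raise KeyError(key)
        return dict.__getitem__(self, key)


demo = FlakyMapping({"pics": 1, "magnetism": 2, "hello": 3},
                    broken=["pics", "magnetism"])
dangling = sorted(find_dangling_keys(demo))
```

Against a real shelf you would pass the opened db object instead of demo; the
helper assumes only that the object supports iteration over keys and
db[key] lookup.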
I'm probably winging my way towards a DBRunRecoveryError, unless someone
can explain what's going on?
--
Richie Hindle
richie at entrian.com