[Spambayes] Shelve database corruption?
skip at pobox.com
Wed May 19 14:40:28 EDT 2004
Lars> I'm using Python 2.2.2; could this be a compatibility thing?
Sure could be. I wrote the compatcsv.py code, and I don't know how much
testing it's gotten beyond what I gave it. My main goal was to allow Python
2.2 users to use sb_dbexpimp.py after switching the storage format to csv.
There were deficiencies in the encoding scheme sb_dbexpimp.py used.
Lars> As far as I can tell, in Python 2.2.2 file objects don't have a
Lars> next() method.
That may well be an issue. Most or all of my testing was done by forcing use
of compatcsv.py even when csv was available. I typically use Python from
CVS (aka 2.4a0 at the moment), so I would not have run into 2.2-style file
objects.
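The traceback below shows compatcsv's reader calling self.fp.readline(), which every Python version's file objects support, unlike file.next(). A minimal sketch of an iterator that relies only on readline() could look like this. The class name ReadlineIterator is hypothetical and is not part of compatcsv.py. It also assumes, as is true for Python file objects, that readline() returns an empty string only at end of file.

```python
import io

class ReadlineIterator:
    """Hypothetical helper: iterate over lines of any object with readline().

    Python 2.2 file objects have no next() method, but readline() exists
    everywhere, so it is the portable primitive to build on.
    """

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        return self

    def next(self):
        line = self.fp.readline()
        if not line:              # '' means end of file; '\n' is a blank line
            raise StopIteration
        return line

    __next__ = next               # Python 3 name for the same method


# Example: io.StringIO stands in for a real file object here.
lines = list(ReadlineIterator(io.StringIO("a,b\nc,d\n")))
```

Checking for the empty string before parsing also avoids handing '' to a line parser once the input is exhausted.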
Lars> Changing that made it run a bit further, but I still got problems:
Lars> [root at pavarotti spambayes-1.0rc1]# sb_dbexpimp.py -i -d hammie.db -f hammie.db.export
Lars> parse error: 2
Lars> Traceback (most recent call last):
Lars> File "/usr/bin/sb_dbexpimp.py", line 266, in ?
Lars> runImport(dbFN, useDBM, newDBM, flatFN)
Lars> File "/usr/bin/sb_dbexpimp.py", line 183, in runImport
Lars> (nham, nspam) = rdr.next()
Lars> File "/usr/lib/python2.2/site-packages/spambayes/compatcsv.py", line 23, in next
Lars> return self.parse_line(self.fp.readline())
Lars> File "/usr/bin/sb_dbexpimp.py", line 170, in runImport
Lars> OSError: [Errno 2] No such file or directory: 'hammie.db.dir'
That's probably a holdover from some previous incarnation where .dir files
were getting created (possibly in an environment with dbm or gdbm instead of
bsddb as the usual database file format).
Lars> Removing OSError so it catches all exceptions has no
Lars> effect. Removing the whole try/except plus the unlink call makes
Lars> the block above deleting the .dat file throw an OSError!?!
The .dir/.dat thing really smells like dbm files to me.
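One way to check the dbm hunch is the standard library's dbm.whichdb(), which reports which dbm flavour created a file. Cleanup code can also skip files that don't exist instead of failing on them. The sketch below uses modern Python 3. The helper names remove_db_files and demo are hypothetical, and so is the suffix list, which is an assumption about the files the common backends leave behind. This is not the actual sb_dbexpimp.py code.

```python
import dbm
import dbm.dumb
import os
import tempfile

# Assumed suffixes the various dbm flavours may leave beside the base name:
# dbm.dumb uses .dat/.dir (plus a .bak backup), classic ndbm uses .pag/.dir,
# and bsddb/gdbm stores use the bare name (some emulations add .db).
DB_SUFFIXES = ("", ".db", ".dir", ".dat", ".pag", ".bak")

def remove_db_files(path):
    """Hypothetical helper: delete whichever files of the database exist.

    Missing files are skipped rather than raising, so a store without a
    .dir file does not produce "OSError: [Errno 2] No such file".
    """
    removed = []
    for suffix in DB_SUFFIXES:
        name = path + suffix
        try:
            os.unlink(name)
        except FileNotFoundError:
            continue
        removed.append(name)
    return removed

def demo():
    """Create a dbm.dumb store, identify it with whichdb(), then remove it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hammie.db")
        db = dbm.dumb.open(path, "c")
        db["spam"] = "1"
        db.close()
        flavour = dbm.whichdb(path)       # 'dbm.dumb' for a .dat/.dir pair
        removed = remove_db_files(path)
        after = dbm.whichdb(path)         # None once no files remain
    return flavour, [os.path.basename(n) for n in removed], after
```

If whichdb() names a dbm flavour rather than a Berkeley DB file, the .dir/.dat files are coming from a different backend than the one the import code expects.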
Lars> Apparently it's unhappy with hammie.db.export, which suggests that
Lars> sb_dbexpimp.py produces CSV files that it can't read back in
Lars> again. I discovered that line 66 of compatcsv.py has a bug that means
Lars> it's never worked:
Lars> line = line[len(field)+len(match.group(2))]
Lars> should be:
Lars> line = line[len(field)+len(match.group(2)):]
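The missing colon turns a slice into an index. line[n] returns a single character instead of the rest of the line, so every field after the first is truncated to one character. The sketch below shows the field-consuming loop with the corrected slice. FIELD_RE and split_fields are hypothetical stand-ins. compatcsv.py's real regular expression is not quoted in this thread, and this sketch handles only unquoted fields.

```python
import re

# Hypothetical pattern (not compatcsv.py's actual regex): group(1) is an
# unquoted field, group(2) the comma after it, or '' at end of line.
FIELD_RE = re.compile(r"([^,\n]*)(,?)")

def split_fields(line):
    """Split an unquoted CSV line by repeatedly consuming field + separator."""
    fields = []
    line = line.rstrip("\n")
    while True:
        match = FIELD_RE.match(line)      # always matches; groups may be empty
        field = match.group(1)
        fields.append(field)
        if not match.group(2):            # no trailing comma: last field
            return fields
        # Corrected statement: slice off what was consumed.  The buggy form
        # line[len(field)+len(match.group(2))] yields one character instead,
        # e.g. "12,34"[3] == "3", silently dropping the "4".
        line = line[len(field) + len(match.group(2)):]
```

For example, split_fields("12,34\n") gives ["12", "34"], while the unfixed statement would have produced ["12", "3"].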
I'll take a look at this. Can you submit a bug report on SF and assign it
to me (montanaro)?
Lars> Then I got into trouble with compatcsv.py assuming the file was
Lars> UTF-8, and I haven't been able to fiddle more with it. It does
Lars> look like this code doesn't run on Python 2.2 at all. I'll have to
Lars> consider installing 2.3 or spending more time on fixing it.
Assuming you have a C compiler, you should be able to just snatch the csv.py
and _csv.c files from a 2.3 installation and compile them against your
Python 2.2 build environment. The goal all along was that the code in
_csv.c should work with 2.2.