[Spambayes] Shelve database corruption?

Skip Montanaro skip at pobox.com
Wed May 19 14:40:28 EDT 2004


    Lars> I'm using Python 2.2.2; could this be a compatibility thing?

Sure could be.  I wrote the compatcsv.py code.  I don't know how much
testing it's gotten beyond what I gave it.  My main goal was to allow Python
2.2 users to use sb_dbexpimp.py after switching the storage format to csv.
There were deficiencies in the encoding scheme sb_dbexpimp.py used
before.

    Lars> As far as I can tell, in Python 2.2.2 file objects don't have a
    Lars> next() method. 

That may well be an issue.  Most/all of my testing was done by forcing use
of compatcsv.py even when csv was available.  I typically use Python from
CVS (aka 2.4a0 at the moment), so I would not have had 2.2-style file
objects.
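To illustrate the file-object difference, here is a minimal sketch of fetching the next line using only readline(), which 2.2-style file objects do provide. The helper name read_next_line is hypothetical and is not taken from compatcsv.py; the sample data is made up, and io.StringIO stands in for a real file so the sketch runs on a current Python.

```python
import io

def read_next_line(fp):
    """Return the next line from fp, raising StopIteration at end of file.

    Relies only on readline(), so it does not need fp.next().
    (Hypothetical helper for illustration, not the compatcsv.py code.)
    """
    line = fp.readline()
    if not line:
        # readline() returns an empty string only at end of file
        raise StopIteration
    return line

# Made-up sample data standing in for an export file
fp = io.StringIO("1,2\nfoo,3,4\n")
first = read_next_line(fp)
second = read_next_line(fp)
```

A reader class built this way can offer its own next() method without depending on the underlying file object being an iterator.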

    Lars> Changing that made it run a bit further, but I still got problems:

    Lars> [root@pavarotti spambayes-1.0rc1]# sb_dbexpimp.py -i -d hammie.db -f hammie.db.export
    Lars> parse error: 2
    Lars> Traceback (most recent call last):
    Lars>   File "/usr/bin/sb_dbexpimp.py", line 266, in ?
    Lars>     runImport(dbFN, useDBM, newDBM, flatFN)
    Lars>   File "/usr/bin/sb_dbexpimp.py", line 183, in runImport
    Lars>     (nham, nspam) = rdr.next()
    Lars>   File "/usr/lib/python2.2/site-packages/spambayes/compatcsv.py", line 23, in next
    Lars>     return self.parse_line(self.fp.readline())
    Lars>   File "/usr/bin/sb_dbexpimp.py", line 170, in runImport
    Lars>     os.unlink(dbFN+".dir")
    Lars> OSError: [Errno 2] No such file or directory: 'hammie.db.dir'

That unlink of the .dir file is probably a holdover from some previous
incarnation where .dir files were getting created (possibly in an
environment with dbm or gdbm instead of bsddb as the usual database file
format).

    Lars> Removing OSError so it catches all exceptions has no
    Lars> effect. Removing the whole try/except plus the unlink call makes
    Lars> the block above deleting the .dat file throw an OSError!?!

The .dir/.dat thing really smells like dbm files to me.
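One way to make that kind of cleanup tolerant of files that were never created is to ignore only the "no such file" error and let anything else propagate. This is a sketch, not the code in sb_dbexpimp.py: the helper name remove_if_present is hypothetical, the temporary directory is there only to make the example self-contained, and the "except ... as" syntax is for current Python (2.2 would need the older "except OSError, e" form).

```python
import errno
import os
import tempfile

def remove_if_present(path):
    """Delete path; return True if it was removed, False if it did not exist.

    Any OSError other than ENOENT is re-raised.
    (Hypothetical helper for illustration.)
    """
    try:
        os.unlink(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return False
    return True

# Simulate a database that left a .dat file behind but no .dir file
tmpdir = tempfile.mkdtemp()
base = os.path.join(tmpdir, "hammie.db")
open(base + ".dat", "w").close()
results = [remove_if_present(base + ext) for ext in (".dat", ".dir")]
os.rmdir(tmpdir)
```

Here results records which of the two files actually existed, so the cleanup works the same whether the database was dbm-style or not.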

    Lars> Apparently it's unhappy with hammie.db.export, so apparently
    Lars> sb_dbexpimp.py produces CSV files that it can't read back in
    Lars> again. I discovered that line 66 of compatcsv.py has a bug that means
    Lars> it's never worked:

    Lars>                 line = line[len(field)+len(match.group(2))]

    Lars> should be:

    Lars>                 line = line[len(field)+len(match.group(2)):]

I'll take a look at this.  Can you submit a bug report on SF and assign it
to me (montanaro)?
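For anyone following along, the difference between the two lines Lars quotes is indexing versus slicing: without the colon the expression returns a single character rather than the remainder of the line. A small sketch with made-up data follows; the variable sep stands in for match.group(2), which is assumed here to be the separator that follows the field.

```python
# Made-up input; not taken from an actual export file
line = 'abc,"d,e",f'
field = "abc"
sep = ","  # assumed stand-in for match.group(2)

offset = len(field) + len(sep)

buggy = line[offset]    # indexing: yields one character
fixed = line[offset:]   # slicing: yields the rest of the line
```

With indexing, the parser would be left with a single character to work on after the first field, which fits the report that the export could not be read back in.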

    Lars> Then I got into trouble with compatcsv.py assuming the file was
    Lars> UTF-8, and I haven't been able to fiddle more with it. It does
    Lars> look like this code doesn't run on Python 2.2 at all. I'll have to
    Lars> consider installing 2.3 or spending more time on fixing it.

Assuming you have a C compiler, you should be able to just grab the csv.py
and _csv.c files from a 2.3 installation and compile them against your
Python 2.2 build environment.  The goal all along was that the code in
_csv.c should work with 2.2.

Skip


