[Python-bugs-list] [ python-Bugs-491888 ] whichdb lies about db type

noreply@sourceforge.net noreply@sourceforge.net
Mon, 12 Aug 2002 14:06:40 -0700


Bugs item #491888, was opened at 2001-12-11 21:22
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=491888&group_id=5470

Category: Python Library
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Richard Jones (richard)
Assigned to: Martin v. Löwis (loewis)
Summary: whichdb lies about db type

Initial Comment:
>>> import dbm
>>> d = dbm.open('foo', 'n')
>>> d['a'] = 'b'
>>> d.close()
>>> import whichdb
>>> whichdb.whichdb('foo.db')
'dbhash'

I'm currently testing for the existence of "foo.db" 
instead of "foo", and hard-coding my routines to use dbm 
if there is a "foo.db" file (since all other db 
modules that I've tested do not append ".db").

Might it also be possible to have anydbm perform a 
whichdb check in its open function, so that older 
databases are usable with newer, more feature-full 
installations that might include "better" dbm 
backends?



----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-08-12 16:06

Message:
Logged In: YES 
user_id=44345

If the user opened the file with

    db = anydbm.open("foo", "c")

*and* the dbm module happened to be selected by anydbm *and* 
dbmmodule.so happened to be linked with BerkDB, the file created will 
be named "foo.db" and will actually be a BerkDB hash file (whose 
version depends on the version of the library installed).  If the user later 
asks whichdb.whichdb what type of file "foo" is, it now correctly 
responds "dbm" with my latest change.  If, on the other hand, the user asks 
whichdb.whichdb what type of file "foo.db" is, it should now respond 
"dbhash".   This is what my recent patch to the whichdb module fixed.  
It would be incorrect to try to open "foo.db" with the dbm module.
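
The distinction between the two names can be illustrated with a simplified
sketch. This is not the actual whichdb patch: guess_db_type is a
hypothetical helper written for current Python 3, and it assumes that the
Berkeley DB hash magic number 0x00061561 is stored at byte offset 12 of
the file. The real module also tries to open the file with dbm before
answering.

```python
import os
import struct

# Assumption: Berkeley DB hash files carry this magic number at offset 12.
BSD_HASH_MAGIC = 0x00061561


def guess_db_type(name):
    """Hypothetical, simplified stand-in for whichdb.whichdb()."""
    # A dbm module linked with BerkDB stores database "foo" in "foo.db",
    # so a request for the base name is answered with "dbm".
    if os.path.exists(name + ".db"):
        return "dbm"
    # Otherwise inspect the named file itself.
    try:
        with open(name, "rb") as f:
            header = f.read(16)
    except OSError:
        return None  # the file cannot be opened
    if len(header) < 16:
        return ""  # too short to identify
    # Accept the magic number in either byte order.
    for fmt in ("<L", ">L"):
        if struct.unpack(fmt, header[12:16])[0] == BSD_HASH_MAGIC:
            return "dbhash"
    return ""  # unrecognised format
```

Under these assumptions, asking about "foo" reports "dbm" while asking
about "foo.db" reports "dbhash", which mirrors the behaviour described
above.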

If a bsddb.error exception is raised, it's almost certainly because the user 
upgraded the BerkDB library, but didn't run the tools provided by 
Sleepycat to upgrade his or her preexisting files.  I don't see how there's 
a Python problem here that needs solving.  It's simply pilot error.  The 
best we can do, I think, is improve the message associated with the 
exception which the module raises.  (Something like "invalid file format" 
instead of simply "invalid argument".)

In my previous note I made a mistake.  Instead of

    He should have called
        dbhash.open('foo', 'r')
    as he later demonstrated.

The function call should have been "dbm.open".


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-08-11 16:13

Message:
Logged In: YES 
user_id=21627

Skip, I think you misunderstand the complaint. It's not
about the way in which an error message is given, but that
the error message is given at all.

The file is a dbm file, and the dbm module is capable of
opening it, so no error should be reported at all.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-25 10:03

Message:
Logged In: YES 
user_id=44345

Martin's comment in bug 584409 reminded me that I have a patched 
whichdb module which should cure this problem.  (At the moment my 
dbm module is linked with gdbm, not BerkDB, however, so while I've 
tested this in the past, I can't provide you with an interactive 
demonstration at the moment.)  Note that Richard was forced to do 
something for which whichdb was not designed. I believe with this 
patch he should be able to once again ask for simply "foo" and not 
wonder what extensions the underlying db package adds to the files.

I still don't think version information would help here.  Richard's tests 
are flawed.  Berkeley DB only adds ".db" to the end of the file when 
using the dbm-compatibility API.  He should have called

    dbhash.open('foo', 'r')

as he later demonstrated.  While somewhat mystifying, the bsddb.error 
is more or less correct.  We should probably trap that and raise a "file 
not found" error or just try a stat() call if the db file is to be opened for 
reading.
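
A minimal sketch of the stat()-style check, written for current Python 3.
The helper open_existing is hypothetical, and the opener argument stands in
for a function such as dbhash.open; it is passed in because the sketch
assumes nothing about which db modules are installed.

```python
import errno
import os


def open_existing(path, opener, flag="r"):
    """Call opener(path, flag), failing early if a read-only open must fail.

    `opener` is a placeholder for a function such as dbhash.open.
    """
    # Opening for reading never creates the file, so a missing file is
    # certain to fail. Report that directly instead of letting the
    # underlying library raise a less helpful error.
    if flag == "r" and not os.path.exists(path):
        raise FileNotFoundError(errno.ENOENT, "No such database file", path)
    return opener(path, flag)
```

The check only covers the missing-file case; a file that exists but is in
a format the library cannot read would still raise the library's own error.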

Assigning to Martin for consideration.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-13 03:43

Message:
Logged In: YES 
user_id=21627

I see. The problem appears to be that your BSDDB
installation, which implements hash version 7, does not
simultaneously support hash version 5 anymore. This
primarily is a problem in the Sleepycat version shipped with
your system (for not supporting old databases), and in glibc
(for not incorporating a newer bsd db). Python can work
around this problem, at best - there might always be DBHASH
files that none of the DB implementations on a system can open.

bsddb should expose version information, like DB_HASHVERSION
and DB_HASHOLDVER (the current and the minimum hash
version). Unfortunately, db_185.h, as used by bsddb.c, does
not provide these constants, and db_185.h cannot be used
simultaneously with db.h. db_185.h exposes a HASHVERSION
constant, but that seems to stay at 2 regardless of the file
version that the compatibility API uses.
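
Until the library exposes such constants, an application can at least read
the version recorded in the file itself. The following is a sketch for
current Python 3 under a stated assumption about the file layout: a 32-bit
magic number 0x00061561 at byte offset 12, followed by a 32-bit version at
offset 16, which is the layout that file(1) appears to report on. The
helper name read_hash_version is hypothetical.

```python
import struct

# Assumption: magic number identifying a Berkeley DB hash file.
BSD_HASH_MAGIC = 0x00061561


def read_hash_version(path):
    """Return the on-disk hash version, or None if the magic does not match.

    Assumed layout: 32-bit magic at byte offset 12, then a 32-bit
    version number at byte offset 16.
    """
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20:
        return None
    # The file may have been written in either byte order.
    for order in ("<", ">"):
        magic, version = struct.unpack(order + "LL", header[12:20])
        if magic == BSD_HASH_MAGIC:
            return version
    return None
```

This tells the application which version a file uses, but not which
versions the installed library can open; that half of the question still
needs the constants discussed above.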

The right solution seems to be to drop support for the DB1 API
and mandate a DB2-or-better db.h. I'd personally recommend
integrating pybsddb.sf.net into Python 2.3, adding
portability to BSDDB 2 if necessary (it could be a
build-time decision to build either source module as bsddb).

For the moment, I cannot recommend a good work-around; I see
two options:
- find out magically (by looking at db.h) what hash versions
dbhash will support, then check the version of the hash
file, and refuse to use dbhash if the version won't be
supported. Since this requires magic, such code should not
be added to Python, but left to the application.
- catch bsddb.error on dbhash.open, and retry with dbm.open.
 This is a heuristic which also shouldn't be added to
Python, but which may be acceptable to the application.
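
The second option could be sketched at the application level roughly as
follows, written for current Python 3. The helper open_with_fallback is
hypothetical; the openers and the exception types are parameters because
the sketch assumes nothing about which db modules are installed.

```python
def open_with_fallback(path, openers, flag="r", errors=(Exception,)):
    """Try each opener in turn and return the first database that opens.

    `openers` is a sequence of callables such as (dbhash.open, dbm.open).
    `errors` is the exception type, or tuple of types, that signals
    "this backend cannot read the file", for example bsddb.error.
    """
    last_error = None
    for opener in openers:
        try:
            return opener(path, flag)
        except errors as exc:
            last_error = exc  # remember why this backend refused the file
    if last_error is None:
        raise ValueError("no openers given")
    # Every backend failed: re-raise the most recent failure.
    raise last_error
```

Because the dbm module appends ".db" itself, the dbm retry for a file
named "foo.db" would have to be given the base name "foo"; a small wrapper
around the opener can handle that.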

----------------------------------------------------------------------

Comment By: Richard Jones (richard)
Date: 2001-12-13 01:05

Message:
Logged In: YES 
user_id=6405

Sorry about the anydbm/whichdb confusion - reading the 
source a little closer would have avoided my confusion.

Regardless, there is still a problem that on my system, 
dbm files are reported as dbhash, and dbhash can't open 
the dbm files...

[richard@co3044991-a tmp]$ python
Python 2.1.1 (#1, Aug 30 2001, 17:36:05) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.61mdk)] on 
linux-i386
Type "copyright", "credits" or "license" for more 
information.
>>> import dbm
>>> dbm.open('foo','n')
<dbm object at 0x812a0f0>
>>> import dbhash
>>> dbhash.open('bar', 'n')
<bsddb object at 0x812b870>
>>> 
>>> import whichdb
>>> whichdb.whichdb('foo.db')
'dbhash'
>>> whichdb.whichdb('bar')
'dbhash'
>>> dbhash.open('foo.db', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.1/dbhash.py", line 16, in open
    return bsddb.hashopen(file, flag, mode)
bsddb.error: (-30990, 'Unknown error 4294936306')
>>> dbhash.open('bar', 'r')
<bsddb object at 0x812ef48>
>>> 
[richard@co3044991-a tmp]$ file foo.db
foo.db: Berkeley DB (Hash, version 5, native byte-order)
[richard@co3044991-a tmp]$ file bar
bar: Berkeley DB (Hash, version 7, native byte-order)



----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-12 17:28

Message:
Logged In: YES 
user_id=21627

I fail to see the problem altogether. What system are you
on? Why do you think dbm does not create dbhash files? It is
not just that the magic says they are BSDDB DB_HASH files,
they really are of that kind?

Also, which of the APIs (dbm, dbhash) do you consider
"better"? I'd say that dbhash is better, since it builds
upon bsddb. So whichdb, and anydbm, do use the "better" dbm
backend already?

Where is the bug?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-11 22:28

Message:
Logged In: YES 
user_id=6380

Hm. anydbm *does* use whichdb. The problem seems to be that
the dbm file really *does* look like a BSD hash -- the Unix
file(1) command has the same problem.

But I'm not sure I understand your question. Do you have a
particular patch in mind?
