What do people think about including bsddb3 in Python 2.3, along with deprecating the existing bsddb module? You'll find the package at

http://pybsddb.sourceforge.net/

It would come as a bsddb3 package, which is interface-compatible with the current bsddb module. Various submodules give access to more advanced features.

The main rationale for dropping bsddb is that it still relies on the db_185.h interface, which will be phased out sooner or later. The existence of this interface, in turn, results in problems with anydbm:

There are multiple versions of the database file format in use, and any BSDDB installation can only handle so many of these versions. Now, on Linux, it is common that bsddb3 is installed, but that glibc offers bsddb2 simultaneously. For anydbm to analyse this situation properly, it would need some of the more advanced bsddb facilities.

While this is the rationale for dropping the existing bsddb module sooner or later, there are, of course, numerous advantages in exposing the additional BSDDB features, like concurrency, transactions, and cursors.

Any opinions?

Regards, Martin
Sounds like a good plan, but we should make sure it can all be re-released under the PSF license. For the Zope Corp. portions of the code I promise that's no problem :-) -- but there are so many other contributors that it's getting a little tangled...

--Guido van Rossum (home page: http://www.python.org/~guido/)
"MvL" == Martin v Loewis
writes:
    MvL> What do people think about including bsddb3 in Python 2.3,
    MvL> along with deprecating the existing bsddb module? You'll find
    MvL> the package at
    MvL> http://pybsddb.sourceforge.net/

+1, for several reasons.

- Robin's done a great job with the module. It feels quite solid and reliable. I've used it quite a bit working on Berkeley storage for ZODB/Zope.

- Berkeley support in 2.2 is broken -- at least the setup.py rules are. On my stock, but stocked Mandrake 8.1 system, bsddbmodule never links right and the standard setup.py always deletes it because of link problems. Fixing this is on My List, although I'd prefer to work with pybsddb.

- I've talked to the Sleepycat guys, and if we wanted to, we could provide the Berkeley libraries with our distros with no licensing problems. Using Berkeley through the pybsddb binding is perfectly legal for any programs using them through Python.

- It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and bsddb4?) modules which compile against the older libraries so databases written with any version could be accessed in Python. Maybe that's not exactly the right way to do it, but I don't think Python should be limited to just one version of Berkeley db. I've no idea what the default ought to be -- there's no clear winner.

    MvL> It would come as a bsddb3 package, which acts
    MvL> interface-compatible with the current bsddb module. Various
    MvL> submodules give access to more advanced features.

I often "import bsddb3 as bsddb".

    MvL> The main rationale for dropping bsddb is that it still relies
    MvL> on the db_185.h interface, which will be phased out sooner or
    MvL> later. Existence of this interface, in turn, results in
    MvL> problems with anydbm:

As mentioned above, I can see reasons for wanting to access any version of Berkeley db.

-Barry
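The "import bsddb3 as bsddb" idiom extends naturally to a fallback import: try the newer binding first and fall back to the older module if it is missing. What follows is only a minimal sketch. The helper name import_first is hypothetical and belongs to neither package, and the demo uses the standard-library module json as a stand-in because neither binding may be installed where this runs.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully.

    Hypothetical helper for illustration only; raises ImportError if
    none of the named modules can be imported.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


if __name__ == "__main__":
    # Intended use would be: bsddb = import_first("bsddb3", "bsddb")
    # "json" is only a stand-in so the demo runs without either binding.
    module = import_first("no_such_binding_for_demo", "json")
    print(module.__name__)
```

Code written this way keeps using the name bsddb regardless of which binding was actually found, which is the property the interface-compatible package is meant to provide.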
"GvR" == Guido van Rossum
writes:
    GvR> Sounds like a good plan, but we should make sure it can all
    GvR> be re-released under the PSF license. For the Zope
    GvR> Corp. portions of the code I promise that's no problem :-) --
    GvR> but there are so many other contributors that it's getting a
    GvR> little tangled...

I /think/ we're just talking mostly about Robin Dunn and Andrew Kuchling. From the description on the page, I can't quite tell whether any of Gregory P. Smith's original code remains.

i'm-sure-andrew-won't-mind-either-ly y'rs,
-Barry
> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and bsddb4?) modules which compile against the older libraries so databases written with any version could be accessed in Python. Maybe that's not exactly the right way to do it, but I don't think Python should be limited to just one version of Berkeley db. I've no idea what the default ought to be -- there's no clear winner.

I'm not sure how that would work, though. Are you thinking of different code bases for the modules, or just compiling the same module multiple times? If the latter, how do you deal with features that are available only in later versions? E.g. I doubt that the current _db.c compiles with bsddb2 (not sure it even compiles with 3.0; it may be that 3.1 is required as a minimum).

This *could* be solved with lots of #ifdefs in _db.c, but that sounds difficult to get right (who has so many versions installed to actually test that?).

Also, I think it is rare that multiple versions are installed on a single system: I doubt BSDDB even supports simultaneous installation of multiple header file sets, on Unix. So even while you can have multiple versions of the shared library installed, compiling it for use with these libraries may be tricky.

About the only case where I know about different systems is on Linux, where glibc incorporates a version of BSDDB2, so you might find database files of that version that the more recent BSDDB3 cannot open anymore. For any other scenario, users are to blame for forgetting to update their database files when updating the libraries.

Regards, Martin

    >> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and
    >> bsddb4?) modules which compile against the older libraries so
    >> databases written with any version could be accessed in Python.

    Martin> I'm not sure how that would work, though.

Agreed. I think trying to use multiple versions of libdb-generated files simultaneously is a disaster waiting to happen. It's unfortunate that the folks at Sleepycat haven't been able to provide a more consistent data format, but I understand that stuff is internal details and can change. They have been pretty good about providing update tools.

What would be useful is if whatever bsddb module is installed could be more intelligent about file version errors. Instead of reporting something inscrutable like

    >>> db = bsddb.hashopen("tour.db")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    bsddb.error: (-30990, 'Unknown error 4294936306')

I'd like it to realize that it was asked to open an old format file and give a useful error message like:

    bsddb.error: (-30990, 'Attempt to open old format file - see db_upgrade(1)')

Sleepycat's tools can do this in the face of old files:

    % file tour.db
    tour.db: Berkeley DB (Hash, version 5, native byte-order)
    % db_dump tour.db > tour.txt
    db_dump: tour.db: hash version 5 requires a version upgrade
    db_dump: open: tour.db: DB_OLDVERSION: Database requires a version upgrade
    % db_upgrade tour.db
    % file tour.db
    tour.db: Berkeley DB (Hash, version 7, native byte-order)
    % db_dump tour.db > tour.txt

    Martin> Also, I think it is rare that multiple versions are installed on
    Martin> a single system: I doubt BSDDB even supports simultaneous
    Martin> installation of multiple header file sets, on Unix.

Actually, RedHat & Mandrake do. This leads to as many problems as it solves. Take a look at the code in setup.py:

    dblib = []
    if self.compiler.find_library_file(lib_dirs, 'db-3.2'):
        dblib = ['db-3.2']
    elif self.compiler.find_library_file(lib_dirs, 'db-3.1'):
        dblib = ['db-3.1']
    elif self.compiler.find_library_file(lib_dirs, 'db3'):
        dblib = ['db3']
    elif self.compiler.find_library_file(lib_dirs, 'db2'):
        dblib = ['db2']
    elif self.compiler.find_library_file(lib_dirs, 'db1'):
        dblib = ['db1']
    elif self.compiler.find_library_file(lib_dirs, 'db'):
        dblib = ['db']

    db185_incs = find_file('db_185.h', inc_dirs,
                           ['/usr/include/db3', '/usr/include/db2'])
    db_inc = find_file('db.h', inc_dirs, ['/usr/include/db1'])

And it's still not correct, as Barry indicated yesterday. For example, suppose that even though db3 is installed on your system you want to only manipulate db2 databases (perhaps for compatibility with another machine). You're stuck and have to edit setup.py or use Modules/Setup to build bsddb.

    Martin> So even while you can have multiple versions of the shared
    Martin> library installed, compiling it for use with these libraries may
    Martin> be tricky.

Got that right... ;-)

    Martin> For any other scenario, users are to blame for forgetting to
    Martin> update their database files when updating the libraries.

In the presence of anydbm, it's not obvious that users should know what file format their underlying databases are.

Skip
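A clearer "old format file" error can be approximated without the library at all, by reading the file header much as file(1) appears to. This is a minimal sketch, not what bsddb or pybsddb actually does. It assumes the Berkeley DB 2+ metadata layout (magic number at byte offset 12, format version at offset 16) and the hash and btree magic values 0x061561 and 0x053162. The function names and the supported_hash_version parameter are hypothetical.

```python
import struct

# Magic numbers for two Berkeley DB access methods. The header layout used
# below (magic at offset 12, version at offset 16) is an assumption of this
# sketch and does not cover the older 1.85 format.
MAGICS = {0x061561: "Hash", 0x053162: "Btree"}


def sniff_db_header(path):
    """Return (access_method, version, byteorder), or None if unrecognized."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20:
        return None
    # The file may have been written on a machine of either byte order.
    for fmt, label in (("<", "little-endian"), (">", "big-endian")):
        magic, version = struct.unpack(fmt + "II", header[12:20])
        if magic in MAGICS:
            return MAGICS[magic], version, label
    return None


def check_openable(path, supported_hash_version):
    """Raise a readable error if a hash file predates the supported version."""
    info = sniff_db_header(path)
    if info is None:
        raise ValueError("%s: not a recognized Berkeley DB file" % path)
    method, version, _ = info
    if method == "Hash" and version < supported_hash_version:
        raise ValueError(
            "%s: hash version %d is an old format file - see db_upgrade(1)"
            % (path, version)
        )
    return info
```

With the tour.db example above, a call such as check_openable("tour.db", 7) would be expected to fail with the db_upgrade(1) hint before the upgrade and succeed afterwards, provided the layout assumption holds for that file.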
"SM" == Skip Montanaro
writes:

    >> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3
    >> (and bsddb4?) modules which compile against the older libraries
    >> so databases written with any version could be accessed in
    >> Python.

    Martin> I'm not sure how that would work, though.

    SM> Agreed.

Oops. I thought I had read that pybsddb could be compiled against older APIs. But on a re-read of the pages, that's obviously wrong, so forget this dumb idea.

    SM> What would be useful is if whatever bsddb module is installed
    SM> could be more intelligent about file version errors.

+1

    Martin> Also, I think it is rare that multiple versions are
    Martin> installed on a single system: I doubt BSDDB even supports
    Martin> simultaneous installation of multiple header file sets, on
    Martin> Unix.

    SM> Actually, RedHat & Mandrake do. This leads to as many
    SM> problems as it solves.

Indeed, this is broken on Mandrake. I was trying to get Postfix and Python to at least agree on the BDB version they were going to use and it wasn't until I installed pybsddb from source, and rebuilt Postfix against the separately downloaded Berkeley 3.3.11 libs/API that I got it all to work.

    SM> Take a look at the code in setup.py:

BTW, I think this is a large part of the problem when building Py2.2 on Mandrake 8.1. Maybe these lines in the setup are /too/ smart? I seem to remember having no problems w/ Py2.1.1.

But that's excusable I suppose since pybsddb's setup.py has its own problems! It should at least recognize a default from-source install of Sleepycat's libs w/o lots of cryptic command line options. And getting "python setup.py clean -a" to work right would be a bonus. :)

Also note that pybsddb should now (or soon) work with Berkeley DB 4 so calling it bsddb3 isn't right either. I don't think there's a db format change from BDB 3 -> BDB 4. bsddb-ng? :)

Okay, I'm rambling. Let's add pybsddb (under a better name) and keep bsddbmodule around and /try/ to fix some of the worst installation problems. The state of Berkeley DB on various distros doesn't make our lives easy here, but let's not add to the problems, if at all possible.

I'm willing to help out with all this. We should also get buy-in from Robin since we also don't want to fork development or have to keep the two in sync.

-Barry
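One way to keep the library detection from being "too smart" is to express the search order as data rather than an if/elif chain, and to let the person building Python force a specific library. This is a minimal sketch and not the actual setup.py code. The function choose_db_library, its forced parameter, and the is_available callback (standing in for self.compiler.find_library_file) are all hypothetical.

```python
# Candidate library names, in the same preference order as the setup.py
# excerpt quoted in this thread.
DB_CANDIDATES = ["db-3.2", "db-3.1", "db3", "db2", "db1", "db"]


def choose_db_library(is_available, forced=None, candidates=DB_CANDIDATES):
    """Pick a Berkeley DB library name, or None if nothing is found.

    is_available: callable taking a library name and returning True/False;
                  a stand-in for the compiler's library lookup.
    forced:       optional name the builder insists on, e.g. "db2".
    """
    if forced is not None:
        # An explicit request is never silently replaced by another version.
        if not is_available(forced):
            raise LookupError("requested library %r was not found" % forced)
        return forced
    for name in candidates:
        if is_available(name):
            return name
    return None


if __name__ == "__main__":
    installed = {"db3", "db2"}  # pretend both versions are on the system
    print(choose_db_library(installed.__contains__))                # db3
    print(choose_db_library(installed.__contains__, forced="db2"))  # db2
```

The forced argument covers the case raised earlier in the thread, where db3 is installed but only db2 databases need to be handled, without editing the build script.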
On Mon, Jan 07, 2002 at 11:10:00PM -0500, Barry A. Warsaw wrote:
> "GvR" == Guido van Rossum writes:
>
>     GvR> Sounds like a good plan, but we should make sure it can all
>     GvR> be re-released under the PSF license. For the Zope
>     GvR> Corp. portions of the code I promise that's no problem :-) --
>     GvR> but there are so many other contributors that it's getting a
>     GvR> little tangled...
>
> I /think/ we're just talking mostly about Robin Dunn and Andrew Kuchling. From the description on the page, I can't quite tell whether any of Gregory P. Smith's original code remains.
>
> i'm-sure-andrew-won't-mind-either-ly y'rs,
> -Barry

Consider any of my pybsddb/bsddb3 code that remains [some does i'm sure] placed under whatever open source license is needed, (PSF license, etc). (I prefer the code to be used, not bickered about :).

-g
--
Gregory P. Smith
participants (5): barry@zope.com, Gregory P. Smith, Guido van Rossum, Martin v. Loewis, Skip Montanaro