[Spambayes] The database question that would not die
Skip Montanaro
skip at pobox.com
Tue Dec 3 18:10:28 2002
richie> Are there any platforms on which, when you ask anydbm to create
richie> a database, it uses version 1.85 of the underlying Berkeley DB
richie> library to do that?
Yes, unfortunately the Python Windows installer is distributed with Berkeley
DB 1.85. On other platforms it's a hit-or-miss proposition. I don't
believe any Linux vendors ship with db1 as the default anymore, but I could
easily be disabused of that notion. I don't know about the commercial Unix
vendors.
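
One way to answer that question on a given machine is to create a scratch database and ask Python which module claims the file. What follows is only a sketch of the probing technique, and it assumes a current Python, where anydbm and whichdb live on as the dbm package and dbm.whichdb(). That package no longer ships a Berkeley DB backend, so the demo falls back on dbm.dumb, which is always present. The names backend_of and demo are made up for illustration.

```python
import dbm
import dbm.dumb
import os
import tempfile


def backend_of(path):
    """Return the name of the dbm module that appears to own `path`.

    dbm.whichdb returns a module name such as 'dbm.gnu' or 'dbm.dumb',
    '' if the format is not recognised, or None if the file cannot be
    opened at all.
    """
    return dbm.whichdb(path)


def demo():
    # Create a throwaway database with the pure-Python backend, then ask
    # whichdb what it thinks the resulting files are.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example")
        db = dbm.dumb.open(path, "c")
        db[b"word"] = b"1"
        db.close()
        return backend_of(path)


if __name__ == "__main__":
    print(demo())
```

Pointing backend_of() at an existing file such as hammie.db would report whichever module Python believes wrote it on that platform.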
Has anyone considered Sleepycat's caveats about using 1.85? The relevant
page is here:
http://www.sleepycat.com/historic.html
The q/a about 1.85 is:

    Are there known problems with the 1.85 and 1.86 versions?

    Yes. Specifically, we recommend that you avoid the following operations
    when using versions 1.85 and 1.86:

      * Btree cursor (seq and put using a cursor) operations.
      * Large numbers of btree duplicates (specifically, avoid migrating
        duplicate keys to internal pages).
      * Large numbers of btree deletes (you should periodically dump and
        rebuild the database if you delete large numbers of records).
      * Overwriting or deleting overflow hash key/data pairs (pairs with items
        larger than the page size).
      * Intermixing hash cursor operations with deletes.

    In addition:

      * As there was no locking support in version 1.85, you cannot perform
        concurrent read/write operations in the database.
      * As there was no logging or transaction support in version 1.85, you
        must re-create your database whenever abnormal application termination
        occurs (e.g., either the application or the system crashes) as the
        database may have been left in a corrupted state.

    Finally, you should not upgrade your GNU gcc or Solaris
    compiler. Optimizations in versions of gcc 2 that were in alpha test in
    the summer of 1997, and a version of the standard Solaris WorkShop
    Compiler that was in beta test in the fall of 1997, trigger bugs in
    versions 1.85 and 1.86 that will cause sporadic core dumps.

It seems to me the most important issues for us are the last two bullets in
the first section and the last bullet in the second section. How close can
we come to avoiding them? I don't think we should have any overflow hash
key/data pairs. The largest item in my current hammie.db file is only 108
bytes. Does the code do things like

    foo = db.next()
    if someprop(foo):
        del db[foo[0]]

? If not that may not be a problem either. The "abnormal termination" bit
bothers me some, based on historical prejudices about Windows'
(in)stability. I imagine others can speak to that.
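
For the first two concerns, here is a rough sketch of how one might check them. It is not what the Spambayes code does. It assumes only that db behaves like a mapping with keys(), item lookup and item deletion, and the helper names largest_item_size and delete_matching are hypothetical. The first helper measures the biggest key/data pair, which can then be compared against the page size. The second collects the keys in one pass and deletes in a second pass, so no delete happens while a traversal is still in progress.

```python
def largest_item_size(db):
    """Return the largest len(key) + len(value) over all records, or 0."""
    return max((len(k) + len(db[k]) for k in db.keys()), default=0)


def delete_matching(db, should_delete):
    """Delete every record whose (key, value) satisfies should_delete.

    Keys are gathered first and removed afterwards, so deletes are never
    intermixed with the traversal. Returns the number of records removed.
    """
    doomed = [k for k in db.keys() if should_delete(k, db[k])]
    for k in doomed:
        del db[k]
    return len(doomed)


if __name__ == "__main__":
    # A plain dict stands in for the real database here.
    db = {b"spam": b"12", b"ham": b"3", b"eggs": b"0"}
    print(largest_item_size(db))
    print(delete_matching(db, lambda k, v: v == b"0"))
```

The two-pass delete costs a list of keys in memory, which seems a fair trade for staying clear of the cursor-plus-delete combination the Sleepycat page warns about.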
Skip