[ python-Bugs-857909 ] bsddb craps out sporadically

Thu Dec 8 10:33:15 CET 2005

Bugs item #857909, was opened at 2003-12-10 14:41
Message generated for change (Comment added) made by brandonh
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=857909&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: Python 2.3
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Predrag Miocinovic (predragm)
Assigned to: Gregory P. Smith (greg)
Summary: bsddb craps out sporadically

Initial Comment:
I get the following from Python 2.3.2 with BerkeleyDB 3.3.11
running on Linux RH7.3:
------------------------
Traceback (most recent call last):
  File "/raid/ANITA-lite/gse/unpackd.py", line 702, in ?
    PacketObject.shelve()
  File "/raid/ANITA-lite/gse/unpackd.py", line 78, in shelve
    wvShelf[shelfKey] = self
  File "/usr/local/lib/python2.3/shelve.py", line 130, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/local/lib/python2.3/bsddb/__init__.py", line 120, in __setitem__
    self.db[key] = value
bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery') in  ignored
Exception bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery') in  ignored
----------------------------------
The server reporting this is running at relatively
heavy load and the error occurs several times per day
(this call occurs roughly 100,000 times per day, but only 42
times per any given shelve instance). It reminds me of
bug report #775414, but this is a non-threaded
application.
That said, another process is accessing the same
shelve, but I've implemented a lockout system which
should make sure they don't have simultaneous access.
The lockout seems to work fine. 
The same application is running on a different machine using
Python 2.3.2 with BerkeleyDB 4.0.14 on Linux RH9 and the
same error occurred once (to my knowledge), but with
"30987" replaced by "30981" in the traceback above, if
it makes any difference.
Finally, a third system, python2.3.2 with BerkeleyDB
4.0.14 on linux RH9 (but quite a bit faster, and thus
lighter load) runs w/o reporting this problem so far. 

I don't have a convenient code snippet to exemplify the
problem, but I don't do anything more than open (or
re-open) a shelve and write a single python object
instance to it per opening. If necessary, I can provide
the code in question. 

----------------------------------------------------------------------

Comment By: Brandon Hechinger (brandonh)
Date: 2005-12-08 01:33

Message:
Logged In: YES 
user_id=226421

We also get this error, though not using Python, but C.  I'm
not sure why people are so eager to dismiss it as an issue
here, however, for it might be something your Python is
doing with the Berkeley DB interface which could be improved.

In our case, there is a similarity -- the site accesses the
database(s) at relatively high frequency, and we use our own
locking system to prevent any conflict (allowing multiple
readers and exclusive writers -- writers not so much as
generating a path to the database, let alone opening it,
until they obtain the separate lock handled by our software).

Periodically one of the databases will have an error when
reading a key, and this error will remain until the database
is repaired.  The error return code is -30987.

It's not 100% conclusive whether it happens primarily on
frequently accessed databases and, if it does, it is not
clear whether the high volume of access causes the error or
merely increases the likelihood of encountering it.
In any case, our locking mechanisms (we've tried more than
one) do lock prior to the database being opened at all, and
are handled in a multi-reader single-writer way.

Again, it's not clear if it's a Berkeley DB problem, or a
problem with the *way* we are accessing/using Berkeley DB.
Until this is known, I don't think it should be so quickly
written off as not a Python issue -- even if a bit of
the Python project's resources went into finding a
Berkeley DB problem, would it result in such a bad world?  :)

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2004-06-16 15:50

Message:
Logged In: YES 
user_id=413

DB_RUNRECOVERY errors are a Sleepycat BerkeleyDB internal
error and don't have anything to do with the Python library
wrapper.  For help in tracking them down I suggest using the
latest BerkeleyDB version and asking, with example code, on the
BerkeleyDB newsgroups or asking Sleepycat themselves (they
don't bite, they're friendly).

Closing this bug as it's not a Python or extension module bug.

If you're looking for a multiprocess aware BerkeleyDB shelve
support, that should be a feature request (ideally with an
example implementation :).

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2003-12-22 16:04

Message:
Logged In: YES 
user_id=250749

I can sympathise with your POV, but shelve has a 
documented restriction that it is not supported for multiuser 
use without specific external support - that is, multiple 
readers are OK, but writing requires exclusive access to the 
shelve database.

As you are using it in such an environment, it is up to you to 
guarantee the required safety.  The error being reported is 
highly likely to be a consequence of your locking scheme 
being inadequate for use with the BerkeleyDB environment, at 
least on that system, and my suggestion that you take this 
up in a BerkeleyDB forum was directed at you getting 
sufficient information to improve your locking scheme to avoid 
the problem you see.
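
One way such an external scheme might look, as a minimal sketch and
not either poster's actual lockout code: take an advisory
fcntl.flock lock on a separate lock file before the shelf is opened
at all (shared for readers, exclusive for writers) and release it
only after the shelf has been closed. The locked_shelf helper and the
".lock" suffix are hypothetical names. The code assumes current
Python 3 on a POSIX system, not the Python 2.3 modules discussed in
this thread, and advisory locks only help if every process that
touches the shelf goes through them.

```python
import fcntl
import shelve
from contextlib import contextmanager


@contextmanager
def locked_shelf(path, writable=False):
    """Open a shelf while holding an advisory lock on a sidecar file.

    Hypothetical helper: readers share the lock, writers hold it
    exclusively, and the shelf is closed before the lock is released.
    """
    lock_mode = fcntl.LOCK_EX if writable else fcntl.LOCK_SH
    # The lock lives in its own file so the database file is never
    # opened by a process that has not been granted the lock yet.
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, lock_mode)  # blocks until granted
        try:
            shelf = shelve.open(path, flag="c" if writable else "r")
            try:
                yield shelf
            finally:
                shelf.close()  # flush to disk before unlocking
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A writer would use "with locked_shelf(path, writable=True) as shelf:"
and assign to shelf[key] inside the block; a reader omits the
writable argument. Whether flock is honoured on a given filesystem
(network mounts in particular) is something to verify on the target
system.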

I think you are a little optimistic expecting the shelve module 
(let alone the anydbm module) to cope with exceptions arising 
from use outside its documented restrictions - and BerkeleyDB 
supports lots of capability beyond the scope of the 
functionality used by shelve and anydbm and the exceptions 
to go with that.

If you care about the shelve storage format, you can force 
the type of storage by creating an empty database of the 
format of your choice, provided that that format is supported 
by anydbm.  With a bit of care, you should be able to convert 
a shelve from one format to another, within the anydbm 
format support restriction.

While it might be nice to have some format control, anydbm's 
purpose is to hide the database format/interface. If you care 
about the format, you're expected to use the desired 
interface module directly.
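
As an illustration of using an interface module directly, here is a
minimal sketch in current Python 3 (where anydbm has become the dbm
package). It opens a database with one specific backend and wraps it
in shelve.Shelf, so no backend auto-selection takes place. dbm.dumb
is chosen only because it ships with every Python build; the
open_dumb_shelf name is hypothetical.

```python
import dbm.dumb
import shelve


def open_dumb_shelf(path):
    """Return a shelf stored in dbm.dumb format.

    The backend is picked explicitly here instead of being
    auto-selected by the generic dbm.open() call.
    """
    # "c" opens the database read-write, creating it if needed.
    return shelve.Shelf(dbm.dumb.open(path, "c"))
```

The returned object behaves like any other shelf: assign to
shelf[key], then call shelf.close(), which also closes the
underlying database.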

----------------------------------------------------------------------

Comment By: Predrag Miocinovic (predragm)
Date: 2003-12-21 20:48

Message:
Logged In: YES 
user_id=860222

I find the last comment somewhat unsatisfactory. While this
appears to be a BerkeleyDB issue (and w/o going into details
why the exception gets thrown), it's strange that the Shelve
module doesn't handle this more gracefully. Since the
concept of Shelve is hiding implementation details from the
application, having to catch BerkeleyDB exceptions for
simple shelf operations is a bit over the top. If I move to
another system, using a different underlying DB (as given by
anydbm), will I have to catch a new set of exceptions all over
again?
Shelve (or anydbm) should either provide the ability to select
the underlying DB implementation to use, or it should handle all
DB implementation aspects so that it is truly transparent
to the end user.
Just my $0.02.
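
For readers on current Python 3, a minimal sketch of catching
backend errors without naming the backend: the dbm package exposes
dbm.error, a tuple of the exception classes its supported backends
can raise, so a single except clause covers whichever one is in use.
ShelfUnavailable, store and load are hypothetical names, and this
shows the modern dbm package, not the Python 2.3 behaviour reported
above.

```python
import dbm
import shelve


class ShelfUnavailable(Exception):
    """Hypothetical application-level error for any backend failure."""


def store(path, key, obj):
    """Write one object, translating backend errors into ShelfUnavailable."""
    try:
        with shelve.open(path) as shelf:
            shelf[key] = obj
    except dbm.error as exc:  # dbm.error is a tuple of exception classes
        raise ShelfUnavailable(f"{path}: {exc}") from exc


def load(path, key):
    """Read one object; a missing key still raises KeyError as usual."""
    try:
        with shelve.open(path, flag="r") as shelf:
            return shelf[key]
    except dbm.error as exc:
        raise ShelfUnavailable(f"{path}: {exc}") from exc
```

This only changes which exception the application sees. It does not
repair a database that the backend reports as needing recovery.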

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2003-12-21 03:50

Message:
Logged In: YES 
user_id=250749

As far as I can make out, what you're seeing is a BerkeleyDB
issue, and bsddb is just reporting what BDB is telling it.

DB_RUNRECOVERY (-30987 on DB 3.3, -30981 on DB 4.0) is
documented as (quoted from DB4.0 HTML docs):
"There exists a class of errors that Berkeley DB considers
fatal to an entire Berkeley DB environment. An example of
this type of error is a corrupted database or a log write
failure because the disk is out of free space. The only way
to recover from these failures is to have all threads of
control exit the Berkeley DB environment, run recovery of
the environment, and re-enter Berkeley DB."

Therefore I think you should follow this up in a
BerkeleyDB forum.

----------------------------------------------------------------------
