[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat

Mon Jan 5 09:36:40 EST 2004

Tony Meyer wrote:
> [Tony]
>> Maybe this is one of the causes for the RUNRECOVERY errors - the user
>> doesn't close sb_server properly, so the db isn't properly
>> closed/saved.
> 
> [Tim]
>> I think you may be on to something here!
> 
> [Richie]
>> Sadly not.  sb_server saves the db after every train as well,
>> out of paranoia.
> 
> [snip]
> 
> Looks like it's back to some other cause of the RunRecovery errors,
> then. That you're able to do 250,000 messages without one, though,
> suggests that it's not something that 'just happens' as part of
> regular bsddb usage (unless it's something that happens every x days,
> or something hideous like that).
> 
> Hopefully someone else can provoke the hammer.py script to fail; I'm
> out of ideas.

I wrote a slightly different test script to pound BerkeleyDB directly
without going through any SpamBayes code.  It opens the database using
hashopen() and shelve.Shelf() just like SpamBayes does.  It then sits in
a loop, picks one of 10 fixed keys at random, sets that key's value to a
tuple containing a process id string passed on the command line and the
current date/time, and then does a sync().  Using only 10 keys is
intended to produce as much contention as possible.  After the
update/sync, 10% of the time it will close and re-open the database, and
10% of the time it will re-open the database *without* closing.  It's
pretty simplistic, but I've attached a copy in case anyone wants to
review it or try it out.
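
For anyone without a bsddb build handy, the shape of that loop can be
sketched on a current Python roughly as follows.  This is only an
illustration, not the attached script: dbm.dumb stands in for
bsddb.hashopen() (an assumption, so it will not reproduce a BerkeleyDB
fault), the iteration count is bounded, and the names open_shelf and
hammer are made up for the example.

```python
import dbm.dumb
import random
import shelve
from datetime import datetime

# The same ten keys as the attached script: 'a', 'bb', ..., 'jjjjjjjjjj'.
KEYS = tuple(chr(ord('a') + i) * (i + 1) for i in range(10))


def open_shelf(path):
    # Assumption: dbm.dumb replaces bsddb.hashopen(), which modern
    # Python no longer ships.
    return shelve.Shelf(dbm.dumb.open(path, 'c'))


def hammer(path, process_id, iterations, rng):
    """Run a bounded version of the update/sync loop.

    Returns (close_count, open_count).
    """
    db = open_shelf(path)
    close_count = 0
    open_count = 0
    for _ in range(iterations):
        r = rng.randint(0, 9)
        db[KEYS[r]] = (process_id, datetime.now())
        db.sync()
        if r == 0:
            # 10% of the time: close before re-opening.
            db.close()
            close_count += 1
        if r in (0, 9):
            # r == 9 (another 10%): re-open without an explicit close.
            db = open_shelf(path)
            open_count += 1
    db.close()
    return close_count, open_count
```

Passing a seeded random.Random as rng makes a run repeatable, which
helps when comparing the close/open counts between runs.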

I ran the script in 5 simultaneous processes over the weekend.  Each
process reached approx 400,000 iterations, but then all 5 processes
crashed (with a Windows fault, not a traceback) and the database now
appears to be corrupt.  When I try to restart the test script, I get a
"memory could not be read" fault.  Oddly, it sometimes gets through 5 or
10 iterations before dying, so maybe only certain records are corrupt.

I'm going to investigate the state of the database file further using
the Berkeley utilities.  I'll report back if I uncover anything
interesting.
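
For anyone who wants to run the same kind of check on their own file, a
minimal sketch of calling the Berkeley DB db_verify utility from Python
follows.  It assumes db_verify is on the PATH under that name (some
installs use a versioned name) and that its version matches the library
that wrote the file; verify_db is a made-up helper name.

```python
import shutil
import subprocess


def verify_db(db_path, tool="db_verify"):
    """Return True if the verifier exits with status 0, False if it
    reports a problem, or None if the tool is not on the PATH."""
    exe = shutil.which(tool)
    if exe is None:
        return None
    result = subprocess.run([exe, db_path], capture_output=True, text=True)
    return result.returncode == 0
```

A False result only says the utility reported a problem; it does not
say which records are affected.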

-- 
Kenny Pitt
-------------- next part --------------
"""Usage: %(program)s PROCESSID DBNAME

Where:
    PROCESSID
        A string that identifies which process is hammering the database.
        This string is included in the value tuple written to each record to
        identify which process last updated that record.
    DBNAME
        The filename of the BerkeleyDB database file to hammer.
"""

import sys
import os
import bsddb
import shelve
import random
from datetime import datetime

program = os.path.basename(sys.argv[0])
randkeys = (
    'a',
    'bb',
    'ccc',
    'dddd',
    'eeeee',
    'ffffff',
    'ggggggg',
    'hhhhhhhh',
    'iiiiiiiii',
    'jjjjjjjjjj')

def usage(code, msg=''):
    """Print usage message and sys.exit(code)."""
    if msg:
        print >> sys.stderr, msg
        print >> sys.stderr
    print >> sys.stderr, __doc__ % globals()
    sys.exit(code)

def main():
    if len(sys.argv) != 3:
        usage(1, "Incorrect number of parameters")

    process_id = sys.argv[1]
    db_name = sys.argv[2]

    print process_id, db_name

    dbm = bsddb.hashopen(db_name)
    db = shelve.Shelf(dbm)

    iterCount = 0
    closeCount = 0
    openCount = 0

    while True:
        r = random.randint(0, 9)
        key = randkeys[r]
        val = (process_id, datetime.now())
        db[key] = val
        db.sync()

        # If r is 0, close and re-open the database.  If r is 9, re-open the
        # database without closing it.
        if r == 0:
            # close first; the re-open happens in the shared branch below
            db.close()
            closeCount += 1
        if (r == 0) or (r == 9):
            # when r is 9 the old handle is dropped without an explicit close
            dbm = bsddb.hashopen(db_name)
            db = shelve.Shelf(dbm)
            openCount += 1

        iterCount += 1
        print("%s: iter=%d, close=%d, open=%d" % (process_id, iterCount, closeCount, openCount))

if __name__ == "__main__":
    main()