[spambayes-dev] Speedup for full retrain when using DB dict

Skip Montanaro skip at pobox.com
Wed Sep 3 17:21:47 EDT 2003


My earlier message (which, because of the mail load on mail.python.org, you
will probably get after this one) indicated that I had a patch which might
speed up full retrains when using a shelve database.  I'm happy to say it
works well for me.  The test I ran essentially executed

    rm hammie.db
    hammie.py -d -p hammie.db -g newham.clean -s newspam.clean

between calls to the Unix date(1) program.  The above two files contained a
total of 15720 messages.  The full retrain time dropped from about 33
minutes to about 20 minutes.  The speedup comes from not writing to the
shelve until the training is completed.  The context diff is attached.
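
For anyone who wants the general idea without opening the diff, here is a
minimal sketch of deferring shelve writes.  It is not the code in
storage.diff: the function name train_deferred and the bare token counts are
made up for illustration, and the real classifier stores richer records.

```python
import shelve
from collections import Counter


def train_deferred(db_path, messages):
    """Accumulate token counts in memory, then write the shelve once.

    Hypothetical helper, not part of spambayes: `messages` is any
    iterable of token lists, and the shelve maps token -> count.
    """
    counts = Counter()
    for tokens in messages:
        # In-memory only; the shelve is not touched during training.
        counts.update(tokens)

    # One write per distinct token, after all messages have been seen.
    with shelve.open(db_path) as db:
        for token, n in counts.items():
            db[token] = db.get(token, 0) + n
    return len(counts)
```

The trade-off is memory: every distinct token stays in RAM until the final
write, and nothing is saved if the run dies before it gets there.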

Skip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: storage.diff
Type: application/octet-stream
Size: 1428 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030903/ff5397fa/storage.obj

