[spambayes-dev] Speedup for full retrain when using DB dict
Skip Montanaro
skip at pobox.com
Wed Sep 3 17:21:47 EDT 2003
My earlier message (which, because of the mail load on mail.python.org, you
will probably get after this one) indicated that I had a patch which might
speed up full retrains when using a shelve database. I'm happy to say it
works well for me. The test I ran essentially executed

    rm hammie.db
    hammie.py -d -p hammie.db -g newham.clean -s newspam.clean

between calls to the Unix date(1) program. The two input files (newham.clean
and newspam.clean) contained a total of 15720 messages. The full retrain
time dropped from about 33 minutes to about 20 minutes. The speedup comes
from not writing to the shelve until the training is completed. The context
diff is attached.
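
For readers without the attachment, the general idea of deferring shelve writes can be sketched roughly as follows. This is only an illustration under assumptions, not the contents of storage.diff: the function names (train_write_through, train_deferred, read_counts) are hypothetical, the values are plain token counts rather than the records SpamBayes actually stores, and it is written for current Python 3 rather than the Python of 2003.

```python
import os
import shelve
import tempfile


def train_write_through(db_path, tokens):
    """Baseline: read and rewrite the shelve entry for every token seen."""
    with shelve.open(db_path) as db:
        for tok in tokens:
            db[tok] = db.get(tok, 0) + 1


def train_deferred(db_path, tokens):
    """Accumulate counts in an ordinary dict, then write each key once."""
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    # The shelve is only touched here, after all training is done.
    with shelve.open(db_path) as db:
        db.update(counts)


def read_counts(db_path):
    """Load the whole shelve back into a plain dict for inspection."""
    with shelve.open(db_path) as db:
        return dict(db)


if __name__ == "__main__":
    # Hypothetical toy input standing in for tokenized messages.
    tokens = ["free", "money", "free", "meeting", "free", "money"]
    workdir = tempfile.mkdtemp()
    slow_path = os.path.join(workdir, "write_through")
    fast_path = os.path.join(workdir, "deferred")
    train_write_through(slow_path, tokens)
    train_deferred(fast_path, tokens)
    # Both strategies should leave the same data on disk.
    print(read_counts(slow_path) == read_counts(fast_path))
```

Both functions leave the same data in the shelve; the deferred version simply replaces one database write per token occurrence with one write per distinct token. The trade-off is that all counts have to fit in memory, and nothing is saved if the run dies before the final write.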
Skip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: storage.diff
Type: application/octet-stream
Size: 1428 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030903/ff5397fa/storage.obj