[Spambayes] using binary pickles makes for much smaller databases
Skip Montanaro
skip at pobox.com
Sun Dec 8 03:00:01 EST 2002
I was messing around with various things today. One thing I tried is to
modify Python's shelve.py and Spambayes' storage.py to allow and use binary
pickles. Before:
-rw-rw-r-- 1 skip staff 20914176 Dec 7 18:20 hammie.db
After:
-rw-rw-r-- 1 skip staff 10874880 Dec 7 18:32 hammie.db
In both cases I trained 13144 hams and 6662 spams starting with no hammie.db
file. The databases each wound up with 324310 keys.
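To see where the savings come from, here is a minimal sketch that compares the size of one pickled record under text protocol 0 (the text format shelve used by default at the time) and binary protocol 2. The `WordInfo` class is a hypothetical stand-in with only two counters. Spambayes' real record type may carry more fields, so the exact byte counts will not match what hammie.db stores.

```python
import pickle


class WordInfo:
    """Hypothetical stand-in for a per-token record (not Spambayes' real class)."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount


def pickled_sizes(obj):
    """Return (text_size, binary_size) for pickle protocols 0 and 2."""
    text = pickle.dumps(obj, protocol=0)    # ASCII-oriented text pickle
    binary = pickle.dumps(obj, protocol=2)  # compact binary pickle
    return len(text), len(binary)


info = WordInfo(spamcount=3, hamcount=41)
text_len, bin_len = pickled_sizes(info)
print(f"protocol 0: {text_len} bytes, protocol 2: {bin_len} bytes")
```

Multiplied over 324310 keys, a per-record difference like this accounts for the kind of size drop shown in the listings above.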
The times were about the same: 324.66user+62.30sys for the ASCII version
and 322.89user+60.61sys for the binary version. The wall-clock times
weren't comparable because I was doing other things while they ran.
Attached are diffs for Python's Lib/shelve.py and Spambayes' storage.py. I
believe both should be backward compatible, though I haven't tested that.
Let me know if you think they are reasonable changes.
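One way to check the backward-compatibility claim is a minimal sketch in current Python 3, whose `shelve.open` accepts a `protocol` argument (this is not Skip's diff). It writes a shelf with text pickles, then reopens it asking for binary ones. Unpickling detects the protocol from the stored data, so old records stay readable while new ones are written in the binary format. The keys and `(spamcount, hamcount)` tuples are hypothetical sample data.

```python
import os
import shelve
import tempfile


def roundtrip_across_protocols(path):
    # Write a shelf the "old" way, with text pickles (protocol 0).
    with shelve.open(path, flag="n", protocol=0) as db:
        db["viagra"] = (3, 41)  # hypothetical (spamcount, hamcount) record
    # Reopen it the "new" way, asking for binary pickles (protocol 2).
    # The protocol only affects writes; pickle detects the format on read.
    with shelve.open(path, flag="w", protocol=2) as db:
        old_value = db["viagra"]
        db["python"] = (0, 97)  # new records are written as binary pickles
    # Read back both records, one text-pickled and one binary-pickled.
    with shelve.open(path, flag="r") as db:
        return old_value, db["python"]


with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip_across_protocols(os.path.join(tmp, "hammie")))
```

Because the protocol is chosen per write, a converted database can hold a mix of text and binary pickles until every record has been rewritten.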
Skip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shelve.diff
Type: application/octet-stream
Size: 1342 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20021207/4d69f2f0/shelve.exe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: storage.diff
Type: application/octet-stream
Size: 1078 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20021207/4d69f2f0/storage.exe