[Spambayes] Eliminating duplicates from mbox file

Skip Montanaro skip at pobox.com
Sat Mar 8 07:32:52 EST 2003


    Tim> Stick some prints in the code.  In the _handle_text() method, see
    Tim> whether this block is getting executed (it should be):

    Tim>         if self._mangle_from_:
    Tim>             payload = fcre.sub('>From ', payload)

Okay, I'll give that a try.  The reason I stuck in the replace() call was
that the message count it reported (len(d), where d is the dict keyed by md5
checksum) differed from what "egrep '^From ' out" reported once the output
file had been generated: there were four more "^From " lines than messages
in the dict.  Once I added the replace() call, the two counts agreed.  Given
that, I think there's a bug even before inserting any prints.  (I had
planned to submit a bug report today.)
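
For anyone who wants to reproduce the check described above, here is a
minimal sketch, not the actual script.  It assumes the messages are already
in memory as strings; the helper names (dedupe, write_mbox,
count_from_lines) and the separator line are made up for illustration.  The
pattern mirrors the '^From ' test used in the egrep command.

```python
import hashlib
import re

# A line that begins with "From " (same test as: egrep '^From ' out).
FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def dedupe(messages):
    """Return a dict mapping md5 hex digest -> message text.

    Only the first copy of each distinct message is kept.
    """
    d = {}
    for text in messages:
        key = hashlib.md5(text.encode("utf-8")).hexdigest()
        d.setdefault(key, text)
    return d


def write_mbox(d, separator="From nobody Sat Mar  8 07:32:52 2003\n"):
    """Build mbox text, escaping "From " lines inside each message.

    The separator value is a hypothetical placeholder.
    """
    parts = []
    for text in d.values():
        # Without this substitution, a message line starting with "From "
        # would look like the start of a new message.
        escaped = FROM_LINE.sub(">From ", text)
        parts.append(separator + escaped.rstrip("\n") + "\n\n")
    return "".join(parts)


def count_from_lines(mbox_text):
    """Count lines starting with "From ", as egrep '^From ' would."""
    return len(FROM_LINE.findall(mbox_text))
```

If the escaping works, count_from_lines(write_mbox(d)) should equal len(d).
A count that comes out higher than len(d) points at unescaped "From " lines
in message bodies.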

Skip


