[Spambayes] Training corrupts mbox files

Neale Pickett neale at woozle.org
Tue Apr 29 19:59:56 EDT 2003


David McLaughlin <david at dsmcl.net> writes:

> Thanks for taking a look at it!
>
> I have put a sample before and after mbox at the following location:
>
> ftp://ftp.dsmcl.net/spambayes_samplembox.tgz
>
> It looks like it may be duplicating some lines in the header, and
> adding an extra line break, which generates "extra" bogus mail
> messages.

Yeah, sure enough.  You're using mutt?

The mbox "standard" is that any line beginning with "From " denotes a
new message.  So a diff of those two mailboxes shows things like this:

 From removed at example.com Mon Apr 28 16:37:18 2003
 Return-Path: <removed>
 Delivered-To: <removed>
+X-Spambayes-Trained: spam
+
 From removed at example.com Mon Apr 28 16:37:18 2003
 Return-Path: <removed>
 Delivered-To: <removed>
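To make the "From " rule concrete, here is a minimal Python sketch. The `split_mbox` function and the sample text are hypothetical, modeled on the diff above with the addresses replaced by placeholders. The sample is the "before" state: a header block written out twice, with no blank line and no body in between. Splitting purely on lines that begin with "From " already yields two messages, the first of them headers-only, and that is the message training then adds `X-Spambayes-Trained` and a blank line to. Real mbox readers also have to cope with ">From " quoting of body lines and other format variants, which this sketch ignores.

```python
def split_mbox(text):
    """Split mbox text into messages using only the From_ line rule:
    every line that begins with "From " starts a new message.
    Hypothetical helper for illustration, not SpamBayes code."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            current = [line]          # a new message begins here
            messages.append(current)
        elif current is not None:
            current.append(line)      # continuation of the current message
    return ["".join(m) for m in messages]


# Hypothetical "before" mailbox: the header block appears twice, and the
# first copy is followed directly by the next From_ line.
sample = (
    "From sender@example.com Mon Apr 28 16:37:18 2003\n"
    "Return-Path: <removed>\n"
    "Delivered-To: <removed>\n"
    "From sender@example.com Mon Apr 28 16:37:18 2003\n"
    "Return-Path: <removed>\n"
    "Delivered-To: <removed>\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)

parts = split_mbox(sample)
print(len(parts))  # 2: the duplicated header block counts as its own message
```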

I think spambayes is actually doing the right thing here--it's taking a
weird mbox and un-weirding it.  I think Tim Stone might be working on a
generic message store thingy: Tim, would that eliminate the need to
rewrite mailboxes altogether?

But David, if I were you I'd start trying to hunt down what's creating
those duplicate headers.  It might be some sort of wonky procmail recipe
that just writes out headers and then drops through, but that's just a
shot in the dark guess.  Heh, maybe it's hammiefilter <0.7 wink>
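One way to start that hunt is to list the header-only messages in the mailbox and look at when they were delivered. The sketch below is a hypothetical helper, `find_header_only_messages`, built on Python's standard `mailbox` module, which likewise starts a new message at every line beginning with "From ". It assumes an empty body is the symptom worth flagging; it does not tell you which program wrote the duplicate headers.

```python
import mailbox


def find_header_only_messages(path):
    """Return (index, From_ line) for every non-multipart message whose
    body is empty or whitespace-only.

    Hypothetical diagnostic: a header block with no body is one symptom
    of a delivery step that writes headers and then falls through."""
    suspects = []
    box = mailbox.mbox(path, create=False)  # open an existing mbox only
    try:
        for index, msg in enumerate(box):
            if msg.is_multipart():
                continue  # multipart messages have structured bodies
            if not msg.get_payload().strip():
                # get_from() is the From_ line minus the leading "From "
                suspects.append((index, msg.get_from()))
    finally:
        box.close()
    return suspects
```

The From_ lines it reports carry the delivery timestamps, which can be matched against whatever delivery logs are available.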

Neale
