Mark Sapiro wrote:
Ivan Van Laningham wrote:
I ran cleanarch, yes, but all it did was to escape every single "From " line, which would make arch think there was only one message.
Then either the From line doesn't match the pattern mailbox.UnixMailbox._fromlinepattern or it is not followed immediately (with no intervening lines or maybe even '\r') by a line that looks like a message header.
If there is intervening whitespace between the "From " line and the message headers, that may cause the spurious archived empty messages.
Ah. Now we're getting somewhere. Here are some sample "From " lines:
- From the current list.mbox (leading '> ' not part of actual line):
From Lizzelvin@aol.com Sun Mar 18 18:17:56 2007
- From the old mbox which I want to incorporate (leading '> ' inserted):
From "robyn m. fritz" email@example.com
From Mochie@webtv.net (C Ryplansky)
And here is the _fromlinepattern:
_fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
Now, I don't understand much of this pattern, but it looks to me as if a) there's no provision for matching " or < or > characters; and b) some sort of date/time mark is required.
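For what it's worth, the quoted pattern can be tried directly against the sample lines. This is only a rough check: QUOTED_FROMLINE is the pattern exactly as pasted above (the one in mailbox.py may be longer than what is shown here), looks_like_separator is a made-up helper name, and re.match only tests the start of each line.

```python
import re

# Pattern exactly as quoted in this message; the full pattern in
# mailbox.py may continue beyond this portion (assumption).
QUOTED_FROMLINE = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"

# Sample "From " lines taken from this message.
samples = [
    'From Lizzelvin@aol.com Sun Mar 18 18:17:56 2007',
    'From "robyn m. fritz" email@example.com',
    'From Mochie@webtv.net (C Ryplansky)',
]

def looks_like_separator(line):
    """True if the line starts with something the quoted pattern accepts."""
    return re.match(QUOTED_FROMLINE, line) is not None

for line in samples:
    print("%-5s %s" % (looks_like_separator(line), line))
```

Only the first sample passes. The other two fail because the pattern wants a single whitespace-free token after "From ", followed by a three-letter weekday, a three-letter month and a day number.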
All the "From " lines are terminated with a \n, and all are followed immediately by what look like valid message header lines, so I don't think those are problems. There do appear to be 1006 unescaped "From " lines in the old mbox (46295 "From " lines minus 45289 "From: " headers):
$ grep '^From ' guppies-out.mbox | wc
46295 163728 1800087
$ grep '^From: ' guppies-out.mbox | wc
45289 159710 1803623
So, if I process the old mbox and rewrite the "From " lines that lack dates so that they contain no " or <> characters and do carry a date/time stamp, and THEN run cleanarch, cleanarch should escape only the 1006 non-matching "From " lines, and I should end up with an mbox I can combine with March, April and May of 2007 from the current list. Is that a correct assessment?
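A minimal sketch of the conversion I have in mind, under several assumptions: rewrite_from_line and HEADER_LINE are names invented here, the address is taken to be the first token containing "@", a "From " line counts as a separator only if the next line looks like a header, and the timestamp is a placeholder (a real run would presumably take it from each message's Date: header).

```python
import re
import time

# Pattern as quoted earlier in this message (possibly only part of the real one).
QUOTED_FROMLINE = re.compile(r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+")
# Rough test for "looks like a message header line" (assumption).
HEADER_LINE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:")

def rewrite_from_line(line, next_line, stamp):
    """Return a dated 'From ' separator, or the line unchanged."""
    if not line.startswith("From ") or QUOTED_FROMLINE.match(line):
        return line  # not a "From " line, or already in the expected form
    if not HEADER_LINE.match(next_line):
        return line  # probably body text; leave it for cleanarch to escape
    # Drop quotes, angle brackets and parentheses, then pick the address.
    tokens = [t.strip('<>"(),') for t in line[5:].split()]
    addrs = [t for t in tokens if "@" in t]
    if not addrs:
        return line  # no address found; do not guess
    return "From %s %s\n" % (addrs[0], stamp)

# Placeholder timestamp in asctime format: 'Thu Jan  1 00:00:00 1970'.
stamp = time.asctime(time.gmtime(0))

print(rewrite_from_line('From "robyn m. fritz" email@example.com\n',
                        'Received: from somewhere\n', stamp), end="")
print(rewrite_from_line('From Mochie@webtv.net (C Ryplansky)\n',
                        'Return-Path: <x>\n', stamp), end="")
```

The header check on the following line is there so that body text beginning with "From " is left alone and still gets escaped by cleanarch.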