[Mailman-Users] "No subject" messages in archives
Ivan Van Laningham
ivanlan at pauahtun.org
Mon May 21 15:12:13 CEST 2007
Hi All--
Mark Sapiro wrote:
> Ivan Van Laningham wrote:
>> I ran cleanarch, yes, but all it did was to escape every single "From "
>> line, which would make arch think there was only one message.
>
> Then either the From line doesn't match the pattern
> mailbox.UnixMailbox._fromlinepattern or it is not followed immediately
> (with no intervening lines or maybe even '\r') by a line that looks
> like a message header.
>
> If there is intervening whitespace between the "From " line and the
> message headers, that may cause the spurious archived empty messages.
>
Ah. Now we're getting somewhere. Here are some sample "From " lines:
1) From the current list.mbox (leading '> ' not part of actual line):
> From Lizzelvin at aol.com Sun Mar 18 18:17:56 2007
2) From the old mbox which I want to incorporate (leading '> ' inserted):
> From "robyn m. fritz" <rfritz at nwlink.com>
or
> From Mochie at webtv.net (C Ryplansky)
And here is the _fromlinepattern:
    _fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
                       r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
Now, I don't understand much of this pattern, but it looks to me as if
a) there's no provision for matching " or < or > characters; and
b) some sort of date/time mark is required.
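A quick way to check both guesses is to run the pattern over the three sample lines. This is only a sketch: the regex is copied from above, the helper name is_separator is made up for illustration, and it assumes the " at " in the samples is the archive's masking of "@" in the real mbox.

```python
import re

# Pattern copied verbatim from mailbox.UnixMailbox._fromlinepattern as quoted above.
FROM_LINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
)


def is_separator(line):
    """Return True if the line matches the mbox "From " separator pattern."""
    return FROM_LINE.match(line) is not None


# Assumption: "@" restored where the archive shows " at ".
samples = [
    "From Lizzelvin@aol.com Sun Mar 18 18:17:56 2007",
    'From "robyn m. fritz" <rfritz@nwlink.com>',
    "From Mochie@webtv.net (C Ryplansky)",
]

for line in samples:
    print(is_separator(line), line)
```

Only the first line matches. The other two fail because the pattern expects a single whitespace-free token after "From ", followed by a weekday, month, day, time and year.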
All the "From " lines are terminated with a \n, and all are followed
immediately by what look like valid message header lines, so I don't
think those are problems. There do appear to be 1006 unescaped "From "
lines in the old mbox:
$ grep '^From ' guppies-out.mbox | wc
46295 163728 1800087
$ grep '^From: ' guppies-out.mbox | wc
45289 159710 1803623
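The 1006 figure is just the difference between those two counts (46295 - 45289). The same tally can be sketched in Python; count_from_lines is a hypothetical helper, and the sample lines are invented stand-ins for the real mbox.

```python
def count_from_lines(lines):
    """Count lines starting with "From " and lines starting with "From: "."""
    separators = headers = 0
    for line in lines:
        if line.startswith("From "):
            separators += 1
        elif line.startswith("From: "):
            headers += 1
    return separators, headers


# Tiny made-up sample: one real separator, one header, one body line.
sample = [
    "From someone@example.com Sun Mar 18 18:17:56 2007\n",
    "From: someone@example.com\n",
    "\n",
    "From what I can tell, this body line is unescaped.\n",
]

separators, headers = count_from_lines(sample)
print(separators, headers, separators - headers)
```

On the real file this would be run over open("guppies-out.mbox", errors="replace"), and the difference is only an estimate of the unescaped body lines, since it assumes every message has exactly one "From: " header at the start of a line.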
So, if I process the old mbox and rewrite each dateless "From " line as
a bare address (no " or <> characters) followed by a date/time stamp,
and THEN run cleanarch, cleanarch should escape only the 1006
non-matching "From " lines, and I should end up with an mbox I can
combine with March, April and May of 2007 from the current list. Is
that a correct assessment?
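The rewrite step might look roughly like the sketch below. It is not a tested conversion script: normalize_from_line is a made-up name, the timestamp is passed in by the caller (presumably it would come from each message's Date: header, which is an assumption), and lines with no recognizable address are left alone for cleanarch to escape.

```python
import calendar
import re
import time
from email.utils import parseaddr

# Pattern copied from mailbox.UnixMailbox._fromlinepattern as quoted earlier.
FROM_LINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
)


def normalize_from_line(line, timestamp):
    """Rewrite a dateless "From " line as "From addr <asctime date>".

    Lines that already match the pattern, or that carry no address
    containing "@", are returned unchanged.
    """
    stripped = line.rstrip("\n")
    if not stripped.startswith("From ") or FROM_LINE.match(stripped):
        return line
    # parseaddr drops the quoted or parenthesized real name, keeping the address.
    _name, addr = parseaddr(stripped[len("From "):])
    if "@" not in addr:
        return line
    return "From %s %s\n" % (addr, time.asctime(time.gmtime(timestamp)))


# Hypothetical timestamp, used only to exercise the function.
when = calendar.timegm((2007, 3, 18, 18, 17, 56, 0, 0, 0))
print(normalize_from_line('From "robyn m. fritz" <rfritz@nwlink.com>\n', when), end="")
print(normalize_from_line("From Mochie@webtv.net (C Ryplansky)\n", when), end="")
```

A body line that happens to begin with "From " followed by an address would also be rewritten by this sketch, so a real script would still need to check that the next line looks like a message header before touching it.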
Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours