[Mailman-Users] "No subject" messages in archives

Ivan Van Laningham ivanlan at pauahtun.org
Mon May 21 15:12:13 CEST 2007


Hi All--

Mark Sapiro wrote:
> Ivan Van Laningham wrote:
>> I ran cleanarch, yes, but all it did was to escape every single "From " 
>> line, which would make arch think there was only one message.
> 
> 
> 
> Then either the From line doesn't match the pattern
> mailbox.UnixMailbox._fromlinepattern or it is not followed immediately
> (with no intervening lines or maybe even '\r') by a line that looks
> like a message header.
> 
> If there is intervening whitespace between the "From " line and the
> message headers, that may cause the spurious archived empty messages.
> 
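Mark's two conditions can be checked before running cleanarch. The Python sketch below is only an approximation of what cleanarch does. It reuses the _fromlinepattern quoted further down. Its header test (a run of printable, non-colon characters followed by ':') is an assumption about what "looks like a message header" means. The function name suspect_from_lines is hypothetical.

```python
import re

# _fromlinepattern as quoted later in this thread (Python 2 mailbox.UnixMailbox).
FROMLINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")
# Loose "Field-Name:" test (an assumption, not necessarily cleanarch's exact check).
HEADER = re.compile(r"[!-9;-~]+:")


def suspect_from_lines(lines):
    """Yield (line_number, reason, line) for each "From " line that would
    probably be escaped rather than treated as a message separator."""
    for i, line in enumerate(lines):
        if not line.startswith("From "):
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not FROMLINE.match(line):
            yield i + 1, "bad pattern", line
        elif not HEADER.match(nxt):
            # e.g. a blank or whitespace-only line between "From " and headers
            yield i + 1, "no header follows", line
```

Feeding it the mbox as a list of lines (for example from readlines()) reports both kinds of problem with 1-based line numbers.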

Ah.  Now we're getting somewhere.  Here are some sample "From " lines:

1)  From the current list.mbox (leading '> ' not part of actual line):
 > From Lizzelvin at aol.com Sun Mar 18 18:17:56 2007
2)  From the old mbox which I want to incorporate (leading '> ' inserted):
 > From "robyn m. fritz" <rfritz at nwlink.com>
or
 > From Mochie at webtv.net (C Ryplansky)

And here is the _fromlinepattern:

_fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
                    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
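A quick Python check shows how the pattern treats the three kinds of sample lines. Python 3's mailbox module no longer has UnixMailbox, so the pattern above is compiled directly. The addresses are hypothetical stand-ins with a real '@' in place of the archive's " at " obfuscation.

```python
import re

# _fromlinepattern as quoted above (Python 2 mailbox.UnixMailbox).
FROMLINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")

# Hypothetical lines shaped like the three samples.
samples = {
    "current list.mbox": "From user@example.com Sun Mar 18 18:17:56 2007",
    "old mbox, quoted name": 'From "Some Name" <user@example.com>',
    "old mbox, parenthesised name": "From user@example.com (Some Name)",
}
results = {label: FROMLINE.match(line) is not None for label, line in samples.items()}
for label, ok in results.items():
    print(f"{label:30} {'matches' if ok else 'does not match'}")
```

Only the line from the current list.mbox matches. Both shapes from the old mbox fail.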

Now, I don't understand much of this pattern, but it looks to me as if
a) the sender part ([^\s]+) has to be a single token with no spaces in
it, so a quoted real name like "robyn m. fritz" can't match; and
b) some sort of date/time stamp is required.
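A few probe lines help separate the two guesses. Under the same pattern, the characters ", < and > are not the problem by themselves. [^\s]+ accepts any run of non-whitespace, so a bracketed address written as one token matches. What fails is a sender containing a space, such as a quoted real name, or a line with no date. The probe lines are hypothetical.

```python
import re

# _fromlinepattern as quoted above (Python 2 mailbox.UnixMailbox).
FROMLINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")

probes = {
    # angle brackets inside one whitespace-free token, date present
    "brackets, with date": "From <user@example.com> Sun Mar 18 18:17:56 2007",
    # quoted real name containing a space, date present
    "quoted name, with date": 'From "Some Name" <user@example.com> Sun Mar 18 18:17:56 2007',
    # bare address, no date
    "bare address, no date": "From user@example.com",
}
verdict = {label: FROMLINE.match(line) is not None for label, line in probes.items()}
print(verdict)
```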

All the "From " separator lines are terminated with a \n, and all are
followed immediately by what look like valid message header lines, so I
don't think those are problems. There do appear to be 1006 unescaped
"From " lines in the old mbox (46295 "From " lines against 45289
"From: " headers):

$ grep '^From ' guppies-out.mbox | wc
    46295  163728 1800087
$ grep '^From: ' guppies-out.mbox | wc
    45289  159710 1803623
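The 1006 comes from subtracting the first column of each wc output, which is the line count: 46295 - 45289 = 1006. The Python sketch below takes the same two counts in a single pass. count_from_lines is a hypothetical helper.

```python
def count_from_lines(lines):
    """Count lines starting with "From " and with "From: ", as the two
    grep | wc pipelines do, and return both counts plus their difference."""
    separators = headers = 0
    for line in lines:
        if line.startswith("From: "):
            headers += 1
        elif line.startswith("From "):
            separators += 1
    return separators, headers, separators - headers
```

For example, count_from_lines(open("guppies-out.mbox", encoding="latin-1")) would return both counts and their difference. The Latin-1 encoding is an assumption about the old mbox.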

So, if I first process the old mbox, rewriting each "From " line that
lacks a date so that it carries only the bare address (no quotes or <>)
followed by a date/time stamp, and THEN run cleanarch, cleanarch should
escape only the 1006 non-matching "From " lines. I should then end up
with an mbox I can combine with March, April and May 2007 from the
current list. Is that a correct assessment?
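A minimal Python sketch of that preprocessing step follows. It rests on several assumptions. A "From " line is treated as a separator only if it follows a blank line (or starts the file) and is immediately followed by a header-like line. The date is taken from the message's Date: header, with a hypothetical placeholder date used when none parses. Body lines are left alone for cleanarch to escape. The helper names are illustrative, and a body "From " line that happens to precede a "Word:" line would be misidentified.

```python
import re
from email.utils import parsedate_to_datetime

# _fromlinepattern as quoted in this post (Python 2 mailbox.UnixMailbox).
FROMLINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")
# Loose "Field-Name:" test for a header line (an assumption).
HEADER = re.compile(r"[!-9;-~]+:")
# Hypothetical placeholder used when no parseable Date: header exists.
FALLBACK_DATE = "Thu Jan  1 00:00:00 1970"


def bare_address(line):
    """Return the address from 'From "Name" <addr>' or 'From addr (Name)'."""
    rest = line[len("From "):].strip()
    bracketed = re.search(r"<([^<>\s]+)>", rest)
    if bracketed:
        return bracketed.group(1)
    return rest.split()[0] if rest else "unknown"  # "unknown" is a placeholder


def ctime_from_headers(lines, start):
    """Format the first Date: header in the block starting at lines[start]."""
    for j in range(start, len(lines)):
        line = lines[j]
        if not line.strip():
            break  # a blank line ends the header block
        if line.lower().startswith("date:"):
            try:
                when = parsedate_to_datetime(line[len("date:"):].strip())
            except (TypeError, ValueError):
                break
            # %a and %b assume the default C locale.
            return when.strftime("%a %b %d %H:%M:%S %Y")
    return FALLBACK_DATE


def fix_separators(lines):
    """Rewrite non-matching separator lines as 'From addr date'."""
    out = []
    for i, line in enumerate(lines):
        looks_like_separator = (
            line.startswith("From ")
            and (i == 0 or not lines[i - 1].strip())
            and i + 1 < len(lines)
            and HEADER.match(lines[i + 1]) is not None
        )
        if looks_like_separator and not FROMLINE.match(line):
            line = "From %s %s\n" % (bare_address(line),
                                     ctime_from_headers(lines, i + 1))
        out.append(line)
    return out
```

The function expects a list of lines (for example from readlines()). It returns a new list that can be written back out before running cleanarch.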

Metta,
Ivan
-- 
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps:  Cu Chi, Class of '70
Author:  Teach Yourself Python in 24 Hours

