no subjects in pipermail archive listing
Recently several of the mailing lists we run have developed a strange problem when generating the pipermail web archive pages. For a period of about ten days or so, all mail messages posted to the mailing list appear in the web archive with a subject of "No subject".
When I view the actual messages through the web, they're not complete - in most cases the first few lines of the message are missing, and in some cases the entire message is.
If I copy the relevant mbox for the list and open it using my mail client (mutt), it opens fine and all the mail messages appear to be complete. mutt indexes it as I remember the actual postings being, complete with subjects. The same is true of all lists that have this problem.
I suspected that the indexes were corrupt, and so I ran arch --wipe on the list to rebuild the archive in the hopes that it would fix the problem. Unfortunately it didn't.
We're running mailman 2.1.3 with the July 2003 RSS feed patches from http://sourceforge.net/tracker/index.php?func=detail&aid=657951&group_id=103&atid=300103
Any help or suggestions people might give that will help me regenerate the archive indexes properly would be appreciated.
guy
Systems Manager, IT Division, Rhodes University, Grahamstown, South Africa Email: G.Halse@ru.ac.za Web: http://mombe.org/ IRC: rm-rf@irc.zanet.net *** ANSI Standard Disclaimer *** J.A.P.H
It seems that the problem is more interesting than I originally thought. What's happening is that arch isn't properly parsing the mbox, and is taking paragraphs that begin with a capitalised "From" to be new mail messages (i.e., it sees the "From" as an mbox From_ line).
The no subject header comes because the messages aren't complete messages, they're just paragraphs that happen to start with the word "From", for example "From the Cisco Report ( 20h00 -> 22h00 )[2]". It only happens if the "F" is capitalised and "From" is the first word after a blank line.
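To make the failure mode concrete, here is a minimal sketch of a splitter that starts a new message at every line beginning with "From ". It is not Mailman's code; the sample mbox, the address guy@example.org and the helper name naive_split are made up for illustration. The fragment after the false separator has no headers, so an archiver would find no Subject for it.

```python
import email

# Hypothetical mbox content: one real message whose body contains a
# paragraph starting with "From " after a blank line.
SAMPLE_MBOX = (
    "From guy@example.org Wed Dec 24 12:29:00 2003\n"
    "From: guy@example.org\n"
    "Subject: Daily report\n"
    "\n"
    "Here is the summary.\n"
    "\n"
    "From the Cisco Report ( 20h00 -> 22h00 )[2]\n"
    "everything looks normal.\n"
)


def naive_split(mbox_text):
    """Start a new message at every line that begins with 'From '."""
    messages = []
    current = None
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            # Treat the line as a separator and drop it from the message.
            if current is not None:
                messages.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


# The second "message" is only the tail of the first one's body.
subjects = [email.message_from_string(m)["Subject"]
            for m in naive_split(SAMPLE_MBOX)]
print(subjects)
```

The first entry carries the real subject, while the second has none, and the first message has also lost the end of its body.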
mutt somehow correctly interprets the "From" as part of the text, not an mbox From_ line. I'm presuming that it uses regex to match the envelope.
I'm not sure how lines beginning with "From" that aren't From_ lines are supposed to be quoted, and whether it is mailman or exim (our MTA) that is supposed to be doing the quoting.
It seems that this only started happening when we upgraded to mailman 2.1.3. Did anything change in the way it handles mboxes?
Guy
24-Dec-03 at 12:29, Guy Antony Halse (g.halse@ru.ac.za) wrote :
They are supposed to be quoted as
>From
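For illustration, the common mbox convention of prefixing body lines that start with "From " with a ">" can be sketched as below. This is an assumption about how a writer might do it, not the code of mailman or exim; the function name quote_from_lines is hypothetical.

```python
def quote_from_lines(body):
    """Prefix body lines that start with 'From ' with '>'.

    Only an exact, capitalised 'From ' at the start of a line is quoted.
    """
    quoted = []
    for line in body.splitlines(keepends=True):
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "".join(quoted)


body = "Here is the summary.\n\nFrom the Cisco Report ( 20h00 -> 22h00 )[2]\n"
print(quote_from_lines(body))
```

A reader that splits on "From " at the start of a line then no longer sees the quoted paragraph as a separator.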
However there should be a way to parse the lines a bit better. The From lines usually contain a date (should be an RFC formatted date string, don't remember which RFC) and so on.
Quick workaround: use an external archiver like MHonArc, which works wonderfully for me, and handles strange character quoting and HTML messages better than pipermail...
-- Simon White. Internet Consultant, Linux/Windows Server Administration. email, dns and web servers; php javascript perl asp; MySQL MSSQL Access Bridging the gap between management, HR and the tech team.
On Wed 2003-12-24 (11:33), Simon White wrote:
I found a reasonably simple solution to this, although it is probably somewhat of a hack. I've replaced references to PortableUnixMailbox in Mailman/Mailbox.py with UnixMailbox.
Looking at the source code for the python mailbox modules, PortableUnixMailbox simply checks that the first five characters of the line are 'From ', while UnixMailbox does a long complicated regex.
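The difference between the two checks can be sketched as follows. The regex is only an approximation of a strict From_ line pattern (sender, weekday, month, day, time, optional timezone, year) and is not guaranteed to match the one in the Python library; the function names and the sample address are made up for illustration.

```python
import re

# Approximate strict pattern for an mbox From_ line, e.g.
# "From guy@example.org Wed Dec 24 12:29:00 2003"
STRICT_FROM = re.compile(
    r"From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+\S+)?\s+\d\d\d\d\s*$"
)


def is_from_line_loose(line):
    """Loose check: only the first five characters are compared."""
    return line.startswith("From ")


def is_from_line_strict(line):
    """Strict check: the line must also carry a sender and a date."""
    return STRICT_FROM.match(line) is not None


envelope = "From guy@example.org Wed Dec 24 12:29:00 2003\n"
body_line = "From the Cisco Report ( 20h00 -> 22h00 )[2]\n"

for line in (envelope, body_line):
    print(is_from_line_loose(line), is_from_line_strict(line))
```

The loose check accepts both lines, while the strict one accepts only the envelope line, which is why switching classes stops body paragraphs from being split off as new messages.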
This solution might not be portable (as its name suggests), but it seems to work for our MTA (exim). At least I don't have (no subject) in my archives any more.
Guy
participants (2)
- Guy Antony Halse
- Simon White