
After moving the list to its new home and running the script to update the archive, I ended up with a raft of messages in the January 2007 archive that are probably ancient. They show no subject, and all of them are dated this afternoon, probably at the time that I ran the script. Is there any safe way to clear those out?
Van
--
Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD
For photography, web design, hosting, and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/

At 12:37 AM -0800 1/30/07, G. Armour Van Horn wrote:
> After moving the list to its new home and running the script to update the archive, I ended up with a raft of messages in the January 2007 archive that are probably ancient. They show no subject, and all of them are dated this afternoon, probably at the time that I ran the script. Is there any safe way to clear those out?
That's an artifact of the date-correction routines in the pipermail archiving system that was integrated into Mailman. I've got an item on the request list to move the date correction routines into Mailman itself, right after message reception. That way the messages get corrected once on input and then never need to get corrected again, and you don't wind up with the situation that you have now.
I don't know that there's any way to fix this problem with the current code, short of going in and manually editing the raw mailbox to correct the date header (and any other headers that need fixing), and then completely re-generating your archives.
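A small script can at least find the messages whose Date header needs that kind of manual fixing before the archives are regenerated. This is an illustrative sketch, not part of Mailman: it uses Python's standard `mailbox` and `email.utils` modules, and the function names and the 1990 cutoff year are assumptions chosen for the example.

```python
import mailbox
import sys
from email.utils import parsedate_to_datetime


def date_problem(raw, min_year=1990):
    """Describe what is wrong with a Date header value, or return None.

    min_year is an arbitrary sanity cutoff (an assumption): anything
    earlier, such as the Y2K-style "1 Jan 100", is reported.
    """
    if raw is None:
        return "missing Date header"
    try:
        when = parsedate_to_datetime(raw)
    except (TypeError, ValueError):  # older Pythons raise TypeError here
        return "unparseable Date header"
    if when.year < min_year:
        return f"implausible year {when.year}"
    return None


def report_bad_dates(path):
    """Print every message in the mbox at path whose Date header needs work."""
    box = mailbox.mbox(path, create=False)
    try:
        for index, msg in enumerate(box):
            problem = date_problem(msg.get("Date"))
            if problem:
                print(f"message {index}: {problem}; Subject={msg.get('Subject')!r}")
    finally:
        box.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    report_bad_dates(sys.argv[1])
```

It only reports; it never modifies the mbox, so it is safe to run against a copy of the archive before and after hand-editing.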
--
Brad Knowles <brad@shub-internet.org>, Consultant & Author
Co-author of SAGE Booklet #15 "Internet Postmaster: Duties and Responsibilities"
Founding Member and Platinum Individual Sponsor of LOPSA: <http://www.lopsa.org>
Papers: <http://tinyurl.com/tj6q4>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>

Quoting G. Armour Van Horn (vanhorn@whidbey.com):
> After moving the list to its new home and running the script to update the archive, I ended up with a raft of messages in the January 2007 archive that are probably ancient. They show no subject, and all of them are dated this afternoon, probably at the time that I ran the script. Is there any safe way to clear those out?
That happened to me when I moved my archives because I had old messages that had an "unescaped" "From " line in the body. I guess there was a time when pipermail didn't put a ">" in front of the word "From " in the body of a message, and so when I ran "arch" on that mbox I got a lot of gibberish messages dated today. The user contributed program "cleanarch" can help fix up some (but not all) of those and I had to use sed to fix the rest. Another problem I ran into was some messages that came around 1 Jan 2000 that had a date of 1 Jan 100. I also discovered some very old messages that had a header line of Content-Type: TEXT/PLAIN; charset=".chrsc" which confused arch as well. It wasn't until I fixed all of these problems that I was able to finally run arch in a way that built good archives.
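To see why an unescaped "From " line yields a message with no subject and no date, the sketch below parses a tiny made-up mbox with Python's standard `mailbox.mbox`, which starts a new message at every line beginning with "From ". The sample data and the `count_messages` helper are hypothetical; pipermail's own reader is assumed to fail the same way, as described above.

```python
import mailbox
import os
import tempfile

# Hypothetical two-message archive; the second body contains a line that
# begins with "From " and was never escaped with ">".
RAW = (
    "From alice@example.com Tue Jan 30 09:00:00 2007\n"
    "Subject: first\n"
    "Date: Tue, 30 Jan 2007 09:00:00 -0800\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.com Tue Jan 30 10:00:00 2007\n"
    "Subject: second\n"
    "Date: Tue, 30 Jan 2007 10:00:00 -0800\n"
    "\n"
    "Quoting the release:\n"
    "From the wire service copy: prices rose.\n"
)


def count_messages(text):
    """Parse text as an mbox; return (message count, list of Subjects)."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        box = mailbox.mbox(path)
        try:
            subjects = [msg.get("Subject") for msg in box]
        finally:
            box.close()
        return len(subjects), subjects
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(count_messages(RAW))  # -> (3, ['first', 'second', None])
```

The third "message" is the body fragment split off at the unescaped line: it has no headers at all, so it has no Subject and no Date. Prefixing that line with ">" brings the count back to two.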
--
Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/
Don't you just hate them? Don't you just wanna break their ribs, cut their backs open and pull their lungs out from behind?
-- Ina Faye-Lund, on script kiddies

Paul Tomblin wrote:
> Quoting G. Armour Van Horn (vanhorn@whidbey.com):
>> After moving the list to its new home and running the script to update the archive, I ended up with a raft of messages in the January 2007 archive that are probably ancient. They show no subject, and all of them are dated this afternoon, probably at the time that I ran the script. Is there any safe way to clear those out?
> That happened to me when I moved my archives because I had old messages that had an "unescaped" "From " line in the body. I guess there was a time when pipermail didn't put a ">" in front of the word "From " in the body of a message, and so when I ran "arch" on that mbox I got a lot of gibberish messages dated today. The user contributed program "cleanarch" can help fix up some (but not all) of those and I had to use sed to fix the rest. Another problem I ran into was some messages that came around 1 Jan 2000 that had a date of 1 Jan 100. I also discovered some very old messages that had a header line of Content-Type: TEXT/PLAIN; charset=".chrsc" which confused arch as well. It wasn't until I fixed all of these problems that I was able to finally run arch in a way that built good archives.
That sounds ugly, given that the mbox file is over 20 megs. I guess I'll try cleanarch and ignore the rest of it, as I lack the skill (or patience) to find and repair the errors in the mbox. But thanks for the lead.
Van

At 3:33 PM -0800 1/30/07, G. Armour Van Horn wrote:
> That sounds ugly, given that the mbox file is over 20 megs. I guess I'll try cleanarch and ignore the rest of it, as I lack the skill (or patience) to find and repair the errors in the mbox. But thanks for the lead.
Try doing this kind of thing on a mailbox that is almost 2GB in size, where part of your problem is that the machine it's on doesn't support large files and has only 1GB of RAM.
Large mailbox files are a serious pain.
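Much of that pain comes from tools that try to hold the whole file in memory. A hypothetical helper like the one below walks an mbox one line at a time, counts lines that look like message separators, and records the line numbers of any other "From " lines for inspection. The separator regex is a deliberately loose assumption (an address, then something ending in a four-digit year), not the pattern Mailman uses, and it can be fooled by body text that happens to end in a year.

```python
import re
import sys

# Loose guess at an mbox separator line: "From <token> ... <4-digit year>".
SEPARATOR = re.compile(rb"^From \S+ .*\d{4}\s*$")


def scan_lines(lines):
    """Return (separator_count, suspicious_line_numbers) for byte lines.

    Only a counter and a list of line numbers are kept in memory.
    """
    separators = 0
    suspicious = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(b"From "):
            continue
        if SEPARATOR.match(line):
            separators += 1
        else:
            suspicious.append(lineno)
    return separators, suspicious


def scan_mbox(path):
    # Iterating over a binary file object reads one line at a time,
    # so a multi-gigabyte mbox never has to fit in RAM.
    with open(path, "rb") as f:
        return scan_lines(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    count, bad = scan_mbox(sys.argv[1])
    print(f"{count} separator lines; {len(bad)} suspicious 'From ' lines")
    for lineno in bad[:50]:
        print(f"  line {lineno}")
```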
-- Brad Knowles <brad@shub-internet.org>

Paul Tomblin wrote:
> Quoting G. Armour Van Horn (vanhorn@whidbey.com):
>> After moving the list to its new home and running the script to update the archive, I ended up with a raft of messages in the January 2007 archive that are probably ancient. They show no subject, and all of them are dated this afternoon, probably at the time that I ran the script. Is there any safe way to clear those out?
> That happened to me when I moved my archives because I had old messages that had an "unescaped" "From " line in the body. I guess there was a time when pipermail didn't put a ">" in front of the word "From " in the body of a message, and so when I ran "arch" on that mbox I got a lot of gibberish messages dated today. The user contributed program "cleanarch" can help fix up some (but not all) of those and I had to use sed to fix the rest. Another problem I ran into was some messages that came around 1 Jan 2000 that had a date of 1 Jan 100. I also discovered some very old messages that had a header line of Content-Type: TEXT/PLAIN; charset=".chrsc" which confused arch as well. It wasn't until I fixed all of these problems that I was able to finally run arch in a way that built good archives.
I spoke too soon. I got a lot of this:
    #Unix-From line changed: 175609
    From the wire service copy:
    #######Unix-From line changed: 176324
    From the MM press release:
    ##########################Unix-From line changed: 178901
    From a designers view I think FW is the most powerful tool. I designed
    ######Unix-From line changed: 179571
    From my web site:
    Unix-From line changed: 179573
    From my experience, there is no specific palette grouping that causes Pal to
(I had used the "-s 100" option to output a # every hundred lines.) Every case cleanarch came upon was a valid bit of text inside a message. Then I went and looked at the actual output, and saw that cleanarch had prepended a ">" to the lines that were part of running text, so I renamed files so the output from cleanarch was the live file and ran arch again.
I think it may have made things worse: it looks like the same messages that were there before still ended up in the January archive. They still have date tags based on the time of running arch for the first time on the new machine yesterday afternoon. These dates are not found in the mbox file.
Looking at the messages in the January archive, it looks like there are only about 25 messages, not really a huge task to go back and repair manually. The question then becomes, what do I need to do to the mbox file so that arch will know where to actually break things, and do I need to do anything special to make sure that the messed up archive elements are no longer present?
Van

G. Armour Van Horn wrote:
> I spoke too soon. I got a lot of this:
> #Unix-From line changed: 175609
> From the wire service copy:
> #######Unix-From line changed: 176324
> From the MM press release:
> ##########################Unix-From line changed: 178901
> From a designers view I think FW is the most powerful tool. I designed
> ######Unix-From line changed: 179571
> From my web site:
> Unix-From line changed: 179573
> From my experience, there is no specific palette grouping that causes Pal to
> (I had used the "-s 100" option to output a # every hundred lines.)
This is normal output from cleanarch doing what it is supposed to do, namely prepending '>' to lines that begin with 'From ' but don't look like Unix mbox message separators.
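The transformation described above can be sketched as a streaming filter. This is a simplified illustration, not cleanarch's actual code: it assumes a real separator is a "From " line ending in a four-digit year and immediately followed by a header line such as `Subject:`, and it escapes every other line that starts with "From ". The regexes and function names are hypothetical.

```python
import re
import sys

# Assumed shape of a real separator: "From <addr> ... <4-digit year>".
UNIX_FROM = re.compile(rb"^From \S+ .*\d{4}\s*$")
# RFC 2822 field name: printable ASCII except ":", followed by ":".
HEADER = re.compile(rb"^[\x21-\x39\x3b-\x7e]+:")


def _escape_if_body(line, next_line):
    """Prefix '>' unless line looks like a separator followed by a header."""
    if line.startswith(b"From ") and not (
        UNIX_FROM.match(line) and HEADER.match(next_line)
    ):
        return b">" + line
    return line


def escape_body_froms(lines):
    """Yield byte lines with body 'From ' lines escaped.

    Uses one line of lookahead, so the input is streamed, not loaded.
    """
    prev = None
    for line in lines:
        if prev is not None:
            yield _escape_if_body(prev, line)
        prev = line
    if prev is not None:
        yield _escape_if_body(prev, b"")


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        dst.writelines(escape_body_froms(src))
```

Requiring a header on the following line is what lets running text such as "From my web site:" be told apart from a separator, even when the text happens to look address-like.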
> Every case cleanarch came upon was a valid bit of text inside a message. Then I went and looked at the actual output, and saw that cleanarch had prepended a ">" to the lines that were part of running text, so I renamed files so the output from cleanarch was the live file and ran arch again.
So far so good.
> I think it may have made things worse: it looks like the same messages that were there before still ended up in the January archive. They still have date tags based on the time of running arch for the first time on the new machine yesterday afternoon. These dates are not found in the mbox file.
Did you remember the --wipe option when you reran bin/arch?
> Looking at the messages in the January archive, it looks like there are only about 25 messages, not really a huge task to go back and repair manually. The question then becomes, what do I need to do to the mbox file so that arch will know where to actually break things, and do I need to do anything special to make sure that the messed up archive elements are no longer present?
First make sure you've run 'bin/arch --wipe' with the cleanarch'd .mbox.
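The sequence discussed in this thread (run cleanarch, make its output the live mbox, then rebuild with `bin/arch --wipe`) is sketched below as a Python wrapper. The install prefix, the `archives/private/<listname>.mbox/<listname>.mbox` location, and the `.orig`/`.cleaned` file names are assumptions about a typical Mailman 2.1 setup; check them against your own installation and run it as the Mailman user so file ownership stays correct. By default it only returns the commands it would run.

```python
import shutil
import subprocess
from pathlib import Path


def rebuild_archive(mailman_home, listname, dry_run=True):
    """Clean the list's mbox with cleanarch, then rebuild with arch --wipe.

    Paths below are assumed, not read from Mailman's configuration.
    """
    home = Path(mailman_home)
    mbox = home / "archives" / "private" / f"{listname}.mbox" / f"{listname}.mbox"
    cleaned = mbox.with_name(mbox.name + ".cleaned")  # hypothetical name
    backup = mbox.with_name(mbox.name + ".orig")      # hypothetical name
    cleanarch = [str(home / "bin" / "cleanarch"), "-s", "100"]
    arch = [str(home / "bin" / "arch"), "--wipe", listname, str(mbox)]
    if dry_run:
        return [cleanarch, arch]
    # 1. cleanarch filters stdin to stdout, escaping body "From " lines.
    with open(mbox, "rb") as src, open(cleaned, "wb") as dst:
        subprocess.run(cleanarch, stdin=src, stdout=dst, check=True)
    # 2. Keep a copy of the original, then make the cleaned file live.
    shutil.copy2(mbox, backup)
    cleaned.replace(mbox)
    # 3. --wipe discards the old HTML archive before regenerating it,
    #    so stale pages from the earlier run do not survive.
    subprocess.run(arch, check=True)
    return [cleanarch, arch]
```

Call it with `dry_run=False` only after the dry-run output matches the paths on your server.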
--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Participants (4): Brad Knowles, G. Armour Van Horn, Mark Sapiro, Paul Tomblin