[Mailman-Users] Importing archives again

Mark Sapiro msapiro at value.net
Fri Apr 20 21:28:32 CEST 2007


Ivan Van Laningham wrote:

>Hi All--
>This is very helpful.  What I have are basically three sets of archives.
>
>1)  Archives from the current list, fairly small and created about two
>months ago after a disastrous ISP debacle (the yo-yos got themselves
>_evicted_, for heaven's sake);
>
>2)  Archives from the previous host and list incarnation and a much
>earlier version--but still > 2.0--of Mailman;
>
>3)  Archives from the previous host, same list, but a version of
>Mailman that might have started with the digit one. ;-)  The person
>who upgraded Mailman in Feb 2002 didn't bother to import the existing
>archives, so now is the first time I've tried to import such old
>archives.
>
>I have successfully dealt with 1 and 2.  Appending the two mboxes
>works well, probably because there is a two-week gap between the two
>latest incarnations of the list.
>
>However, 3 is a problem.  I don't have an mbox for the earliest
>archives; instead, I have the text files--2002-February.txt,
>etc.--which appear to me to be in mbox format.


The .txt files are similar to .mbox files, but there are various
differences. Many headers have been removed and, most importantly,
email addresses may have been obscured by changing user@example.com to
user at example.com.


>If I run cleanarch on these text files before running arch on them,
>they do not appear in the archives.


Probably because cleanarch escapes all the "From " separators because
the email address has " at " instead of "@".
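
As a rough illustration of why an obscured separator line stops being
recognized, here is a minimal sketch. The pattern is an assumption modeled
on the traditional Unix mbox "From " line check; it is not copied from
cleanarch, and is_separator is a hypothetical helper name.

```python
import re

# Assumed pattern in the style of the classic Unix mbox "From " line check:
# "From ", one whitespace-free address token, then a ctime-style date.
FROM_LINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
)

def is_separator(line):
    """Return True if the line looks like a message separator (hypothetical helper)."""
    return FROM_LINE.match(line) is not None

good = "From user@example.com  Fri Apr 20 21:28:32 2007"
obscured = "From user at example.com  Fri Apr 20 21:28:32 2007"

# With " at " in place of "@", the address is no longer a single token, so
# the date is not found where the pattern expects it and the match fails.
print(is_separator(good), is_separator(obscured))
```

A line that fails a check of this kind would be treated as body text
rather than as the start of a new message.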


>If I skip cleanarch, then I get
>bad addresses in the posts in the archives (and yes, I did use the
>--wipe option).  The bad addresses look like the following in the
>index page:
>
>[Sangha] Anger and its expression   Ryunyokingryunyo at earthlink.net
>
>The address is supposed to be "ryunyo at earthlink.net".


No, it is supposed to be "ryunyo@earthlink.net".


>How can I preprocess the text files to fix the problem addresses?  I
>assume it's because the old text files have something like From:
>Ryunyo King<"ryunyo at earthlink.com"> in the from line.  Is there a
>secret option to cleanarch I didn't see?


cleanarch won't do this. You need to process the .txt files yourself
with your own script or by hand to replace " at " with "@" in email
addresses before using cleanarch.

Obviously, you can't just globally replace " at " with "@" as there
will be many occurrences of " at " outside email addresses.

You might limit yourself to "From " lines and From: headers. That will
probably work. You could also try to use some regexp that only matches
" at " if it looks like it's in an email address.
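
A minimal sketch of that approach, combining both ideas: only lines
starting with "From " or "From:" are touched, and " at " is replaced only
where it sits between something that looks like a local part and a dotted
domain. The regexp and the function names unobscure_line and
unobscure_text are assumptions made for illustration, not part of Mailman,
and body lines that happen to begin with "From " could still be altered,
so check the result before running cleanarch.

```python
import re

# Assumed shape of an obscured address: local part, " at ", dotted domain.
OBSCURED = re.compile(r"([\w.+-]+) at ([\w-]+(?:\.[\w-]+)+)")

def unobscure_line(line):
    """Restore "@" in the first obscured address of a "From " or From: line."""
    if line.startswith("From ") or line.lower().startswith("from:"):
        # count=1: change only the first address-like match on the line.
        return OBSCURED.sub(r"\1@\2", line, count=1)
    return line

def unobscure_text(text):
    """Apply unobscure_line to every line, keeping line endings intact."""
    return "".join(unobscure_line(l) for l in text.splitlines(keepends=True))

sample = (
    "From ryunyo at earthlink.net  Fri Feb  1 10:00:00 2002\n"
    "From: ryunyo at earthlink.net (Ryunyo King)\n"
    "Subject: [Sangha] Anger and its expression\n"
    "\n"
    "We met at noon.\n"
)
print(unobscure_text(sample))
```

To use it on a file such as 2002-February.txt, read the file's contents,
pass them through unobscure_text, and write the result to a new file
rather than overwriting the original.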


>(I also ended up with a slew of duplicates when the upgrade happened
>in Feb 2002; half the messages are right, the other half of the
>duplicate messages have addresses similar to the above.  But I'm
>pretty sure I can deal with those.)
>
>Thanks for all the help.
>
>Metta,
>Ivan

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


