[Mailman-Users] Migrating from YahooGroups to Mailman

Greg Ward gward at mems-exchange.org
Tue Jul 31 22:27:21 CEST 2001


On 31 July 2001, Sarah K. Miller said:
> We're migrating some lists from YahooGroups to Mailman. Does anyone
> know of a way to automatically "slurp" all the messages off Yahoo and
> plop them into Mailman? The only way I've found to do it and retain
> the original information is to cut and paste each one
> individually. That's a little overwhelming when you're looking at
> 1500+ messages. Yahoo was no help at all. If anybody here knows of a
> utility of some sort that would do it, please share!

I had a similar problem getting a list off of ListBot recently.  It only
had 19 messages in the archive (I was just saving them for posterity --
the list wasn't exactly a big hit), so it wasn't too bad.  Out of
principle, though, I automated the procedure a little bit.  The only
reason it was possible is that ListBot had a link to get the full
headers as plain text (wrapped in <PRE> in the web page, of course) for
each message.  After I figured out the pattern for that URL, I did
something like this:

for i in $(seq 1 19) ; do        # seq saves typing out 1 2 3 ... 19
  GET "http://www.listbot.com/(hairy url with $i in it somewhere)" | fix_msg > msg$i.txt
done

GET is the alias for lwp-request installed by lwp (libwww-perl).  It
just does an HTTP request from the shell.  Handy, but it would be easy
to whip up something similar in Python (which I've been meaning to do
for a while now...).
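Since Python came up, here is a rough sketch of a GET-style fetcher using only the standard library's urllib.request. It is a sketch under assumptions, not a tested replacement for lwp-request: the url_template argument (a URL with "{i}" where the message number goes) is a hypothetical stand-in for whatever URL pattern Yahoo uses, and falling back to latin-1 when the server names no charset is also an assumption.

```python
import urllib.request
from pathlib import Path


def fetch(url: str, timeout: float = 30.0) -> str:
    """Rough Python equivalent of `GET url`: return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header if one was sent;
        # latin-1 is an assumed fallback (it can decode any byte sequence).
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset, errors="replace")


def fetch_all(url_template: str, count: int, out_dir: str = ".") -> list:
    """Fetch messages 1..count into msgN.txt files, like the shell loop.

    url_template is hypothetical: any URL containing "{i}" for the number.
    """
    written = []
    for i in range(1, count + 1):
        path = Path(out_dir) / f"msg{i}.txt"
        path.write_text(fetch(url_template.format(i=i)), encoding="utf-8")
        written.append(path)
    return written
```

Something like fetch_all("http://example.invalid/msg?num={i}", 1500) would then mirror the loop; the raw pages still need the entity cleanup described below.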

fix_msg was a little Perl script I wrote to undo HTML encodings in the
not-quite-plain-text file downloaded from listbot.com.  I don't seem to
have it anymore; it went something like this:

  #!/usr/bin/perl -p

  # Undo HTML entities; do &amp; last so "&amp;lt;" isn't decoded twice.
  s/&lt;/</g;
  s/&gt;/>/g;
  s/&quot;/"/g;
  # ...etc...
  s/&amp;/&/g;

Again, you could do this pretty easily in Python, but why bother?  Perl
is perfect for this sort of hackery.  Exactly which substitutions you
need depends on the text Yahoo presents you with, just as mine depended
on what ListBot served up.
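For comparison, a minimal Python version of fix_msg. It leans on the standard library's html.unescape, which decodes all named and numeric entities in a single pass and so sidesteps the substitution-order trap. The assumption that the message sits inside the page's first <PRE> block is mine, based on how ListBot presented things; Yahoo's pages may differ.

```python
import html
import re

# Assumption: the raw message is wrapped in the page's first <PRE>...</PRE>.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.IGNORECASE | re.DOTALL)


def fix_msg_text(page: str) -> str:
    """Pull the message out of its <PRE> wrapper and undo HTML entities."""
    match = PRE_BLOCK.search(page)
    text = match.group(1) if match else page  # no <PRE>? use the whole page
    # One pass over all entities: "&amp;lt;" becomes "&lt;", not "<".
    return html.unescape(text)
```

To use it as a filter in the shell loop, wrap it in a script that does sys.stdout.write(fix_msg_text(sys.stdin.read())).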

To glue the messages together into a legitimate mbox (which you can
then build into an archive with Mailman's bin/arch tool), you need to
make sure each msg*.txt file starts with a "From " line and ends with a
blank line.  formail, a tool supplied with procmail, will take care of
the former.  Or you could DIY in that mythical fix_msg script (which
would also be a good place to ensure a trailing blank line, although I
think you don't want one on the last message...)
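A rough Python take on that DIY step, assuming each msgN.txt holds exactly one message with its headers intact. The envelope sender MAILER-DAEMON and the epoch date are hypothetical placeholders; pull real values from the From: and Date: headers if you care. Body lines starting with "From " get a ">" prefix so they aren't mistaken for message boundaries. This sketch puts a blank line after every message, the last one included, which mbox readers generally tolerate.

```python
import time

# Hypothetical placeholder; mbox readers mainly need "From <sender> <date>".
ENVELOPE_SENDER = "MAILER-DAEMON"


def to_mbox_entry(text: str, when: float = 0.0) -> str:
    """Ensure a leading "From " line, escape body "From " lines, end with a blank line."""
    lines = text.splitlines()
    if lines and lines[0].startswith("From "):
        envelope, rest = lines[0], lines[1:]  # keep an existing envelope line
    else:
        # asctime-style date, e.g. "Thu Jan  1 00:00:00 1970" for when=0
        envelope = f"From {ENVELOPE_SENDER} {time.asctime(time.gmtime(when))}"
        rest = lines
    # A later line starting with "From " would look like a new message.
    rest = [">" + ln if ln.startswith("From ") else ln for ln in rest]
    # Drop any trailing blank lines, then end with exactly one.
    while rest and not rest[-1].strip():
        rest.pop()
    return "\n".join([envelope] + rest) + "\n\n"
```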

You should also systematically rename the files from e.g. msg1.txt to
msg0001.txt so the next command works: the shell sorts msg*.txt
alphabetically, so msg10.txt would otherwise come before msg2.txt.
Again, child's play if you know your way around Unix and Perl...
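A small Python sketch of the renaming, assuming the files follow the msgN.txt pattern from the download loop and that four digits are enough (1500+ messages fits). The function name zero_pad_names is just an illustrative choice.

```python
import re
from pathlib import Path

# Matches the msgN.txt names produced by the download loop.
MSG_NAME = re.compile(r"msg(\d+)\.txt")


def zero_pad_names(directory: str = ".", width: int = 4) -> list:
    """Rename msg1.txt -> msg0001.txt so a shell glob lists them in numeric order."""
    renamed = []
    # sorted() builds the full list before any renaming starts.
    for path in sorted(Path(directory).iterdir()):
        match = MSG_NAME.fullmatch(path.name)
        if not match:
            continue  # leave unrelated files alone
        target = path.with_name(f"msg{int(match.group(1)):0{width}d}.txt")
        if target != path:
            path.rename(target)
        renamed.append(target)
    return sorted(renamed)
```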

Finally, "cat msg*.txt > mylist.mbox" (or whatever) and run Mailman's
archive tool on it.

Make sure you've created a workable mbox file by running your favourite
mail client on it, e.g. "mutt -f mylist.mbox".
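If you'd rather script the check, Python's standard mailbox module parses an mbox much as a mail client would. This sketch lists each message's From: and Subject: headers; if the count and subjects match what you downloaded, the "From " separators are probably in the right places. The function name summarize_mbox is illustrative only.

```python
import mailbox


def summarize_mbox(path: str) -> list:
    """Return (From:, Subject:) pairs for every message in an mbox file."""
    box = mailbox.mbox(path, create=False)  # error if the file is missing
    try:
        # Each item is an mboxMessage; missing headers come back as None.
        return [(msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()
```

For example, print(len(summarize_mbox("mylist.mbox"))) should report the number of messages you glued together.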

Anyways, if you're familiar with Unix and the tools available to you,
this should be doable... as long as Yahoo makes the full original text
of the messages available to you!  If not, all is lost, give up, doom,
failure, etc.  If I were in your shoes (1500 messages to process), I'd
probably take a few hours and write a nice Python script to do it right.
For the 19 messages I had to do, crude shell-and-Perl hackery was just
fine.

        Greg
-- 
Greg Ward - software developer                gward at mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org



