
Hi,
Sometimes before subscribing to a list I like to download the archives and convert them into an mbox file for nice threaded browsing & searching using familiar tools (for me, Mutt and mairix). Not finding an automated way to do this [1], I put together the following shell script. Simple & rough, but seems to do the job:
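
#!/bin/sh
# automated retrieval of pipermail archives & conversion to mbox file
# Last edit: 2012/10/09 Tue 23:16 PDT

# list name = last path component of the URL (which must end in a slash)
listname=$(echo "$1" | sed 's:^\(http.*\)/\([^/]*\)/$:\2:')

cd /tmp || exit 1
# quote the pattern so the shell hands it to wget unexpanded
wget -r -l 1 -nH -A '*.txt.gz' "$1"

# with -nH the URL path is mirrored locally, so the files land in
# /tmp/pipermail/<listname>/ (assumes the URL path is /pipermail/<listname>/)
cd "/tmp/pipermail/$listname" || exit 1
touch "$listname.mbox"
chmod 600 "$listname.mbox"

# loop over the archives only, so the mbox itself is never fed to zcat
for f in *.txt.gz
do
    # decompress, convert to UTF-8, undo the " at " address obfuscation
    zcat "$f" | iconv -f iso8859-15 -t utf-8 | sed 's/^\(From.*\) at /\1@/' >> "$listname.mbox"
done

rm "/tmp/pipermail/$listname"/*.gz
mutt -f "/tmp/pipermail/$listname/$listname.mbox"
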
I call this script piperget, and by doing:
piperget http://example.tld/pipermail/somelistname/
the file /tmp/pipermail/somelistname/somelistname.mbox is created and opened by mutt. If I like what I see, I move the mbox file to an appropriate location in my Mail directory, subscribe to the list, and filter the list traffic into that mbox.
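One rough edge: the sed expression that extracts the list name only matches when the URL ends in a slash; without it, the whole URL comes through unchanged. A minimal sketch of a more forgiving way to do it, using plain POSIX parameter expansion (listname_from_url is a made-up helper name, not something in the script above):

```shell
#!/bin/sh
# Hypothetical helper: derive the list name from an archive URL,
# whether or not the URL ends in a slash.
listname_from_url() {
    url=${1%/}                   # drop one trailing slash, if present
    printf '%s\n' "${url##*/}"   # keep what follows the last remaining slash
}

# Both calls print "somelistname".
listname_from_url http://example.tld/pipermail/somelistname/
listname_from_url http://example.tld/pipermail/somelistname
```

This also saves an echo | sed pipeline, though it assumes the list name really is the last path component of the URL.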
This could be made more robust and tweaked to better suit varying needs. Being able to specify a range of archive dates would be nice. Another thought is to have the option of leaving the last few *.txt.gz files lying around (somewhere other than in /tmp), checking against them so that wget only fetches new archives or an archive with a newer time-stamp, then concatenating the newer messages onto the existing mbox. A sort of pseudo-subscription to a list. Repeatedly re-downloading an entire monthly/quarterly archive as it changes would be rather bandwidth-wasteful though; better to subscribe and update the *.mbox via SMTP. Not sure if there's some rsync way to incrementally download only the parts of an archive that've changed...
Anyhow, mostly I just use this to catch up on a list at the moment of deciding whether or not to subscribe to it. Any thoughts or suggestions are welcome.
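On the date-range idea, here is a rough sketch of how the archive names could be filtered before they get concatenated. It assumes monthly archives named like 2012-October.txt.gz (other naming schemes are simply skipped), and archive_key and select_range are made-up helper names. As a side effect it puts the files in chronological order, where the script's loop goes through them alphabetically (April before January).

```shell
#!/bin/sh
# Sketch only: assumes monthly pipermail archives named like
# 2012-October.txt.gz. archive_key and select_range are hypothetical helpers.

# Print a sortable YYYY-MM key for one archive file name;
# return non-zero if the name does not look like YYYY-Month.txt.gz.
archive_key() {
    base=${1%.txt.gz}
    year=${base%%-*}
    case ${base#*-} in
        January) m=01 ;; February) m=02 ;; March) m=03 ;;
        April) m=04 ;; May) m=05 ;; June) m=06 ;;
        July) m=07 ;; August) m=08 ;; September) m=09 ;;
        October) m=10 ;; November) m=11 ;; December) m=12 ;;
        *) return 1 ;;
    esac
    printf '%s-%s\n' "$year" "$m"
}

# Read archive names on stdin and print, oldest first, those whose key
# falls within the inclusive range $1..$2 (both given as YYYY-MM).
select_range() {
    while IFS= read -r name; do
        key=$(archive_key "$name") || continue
        printf '%s %s\n' "$key" "$name"
    done | sort | awk -v from="$1" -v to="$2" '$1 >= from && $1 <= to { print $2 }'
}

# Example: keep only the first half of 2012.
printf '%s\n' 2012-October.txt.gz 2012-April.txt.gz \
    2011-December.txt.gz 2012-January.txt.gz |
    select_range 2012-01 2012-06
```

In the script, the loop could then read its file names from something like "ls | select_range 2012-01 2012-06" instead of taking every archive. For the only-fetch-what-changed idea, wget's -N (time-stamping) option may be enough, provided the downloaded *.txt.gz files are kept instead of removed and the server reports modification times.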
[1] After writing this script I did find https://github.com/wesleyd/pipermail-archive-to-maildir which could be another option for those interested in the maildir format. I prefer mbox for mailing lists.
John
-- 
John Magolske
http://B79.net/contact