
Hi,
I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/
My rationale was: why the heck is my archive missing so many posts in Google, and why is it generally "behind" all other archives?
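For readers who have not opened the link: the following is only a rough sketch of how such a sitemap could be emitted, not the script from the blog post. The function name emit_sitemap, the base URL and the archive root are hypothetical; the input format ("TIMESTAMP PATH", one per line) is assumed from the awk output that appears later in this thread, and the monthly change frequency is taken from the script's own comment.

```shell
#!/usr/bin/env bash
set -u

# emit_sitemap BASE_URL ARCHIVE_ROOT   (hypothetical helper)
# Reads "TIMESTAMP PATH" lines on stdin and prints a sitemap on stdout.
# ARCHIVE_ROOT is stripped from the front of each PATH and BASE_URL is
# put in its place. Paths are assumed to need no XML escaping.
emit_sitemap() {
    local base=$1 root=$2 stamp path
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    while read -r stamp path; do
        # Skip blank or malformed lines.
        [ -n "$path" ] || continue
        printf '  <url><loc>%s%s</loc><lastmod>%s</lastmod><changefreq>monthly</changefreq></url>\n' \
            "$base" "${path#"$root"}" "$stamp"
    done
    printf '%s\n' '</urlset>'
}
```

It would be fed the list of dated paths on stdin, e.g. `printf '%s\n' "$URLS" | emit_sitemap http://lists.example.org/pipermail /path/to/archives > sitemap.xml`, where both arguments are placeholders.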

Tomasz Chmielewski wrote:
> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that:
--- sitemap.old 2012-11-22 10:02:51.000000000 -0800
+++ sitemap.new 2012-11-22 10:27:37.000000000 -0800
@@ -18,8 +18,10 @@
 set -u
 
 # find html files with their dates
+URLS=""
 for LIST in $LISTS; do
-    URLS=$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)
+    URLS="$URLS
+$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"
 done
 
 # if the article is crawled once a month, it should be enough
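The effect of the fix can be seen in isolation with a small sketch. It is not part of the patch: the function name collect_html is made up, and the ls/awk date formatting is left out so that only the accumulation pattern remains. Assigning inside the loop would keep only the last list's files; appending keeps them all.

```shell
#!/usr/bin/env bash
set -u

# collect_html ARCHIVE_ROOT LIST...   (hypothetical helper)
# Prints one path per line for every *html file under each list's
# directory, skipping anything whose path contains "attachments".
collect_html() {
    local root=$1 urls="" found list
    shift
    for list in "$@"; do
        found=$(find "$root/$list/" -type f -name '*html' | grep -v attachments)
        # Nothing found for this list: leave the accumulator untouched.
        [ -n "$found" ] || continue
        # Append instead of assigning, so earlier lists are not overwritten.
        if [ -n "$urls" ]; then
            urls="$urls
$found"
        else
            urls=$found
        fi
    done
    printf '%s\n' "$urls"
}
```

Called as `collect_html "$MAILMANPATH" $LISTS`, it prints the files of every list rather than just those of the last one.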

On 11/22/2012 08:36 PM, Mark Sapiro wrote:
> There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that
Indeed, thanks for pointing that out!

Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> Indeed, thanks for pointing that out!
Did you update your patch already? If so, I'd apply it to our very own mailman installation here at python.org :)

On 11/22/2012 10:16 PM, Ralf Hildebrandt wrote:
> Did you update your patch already? If so, I'd apply it to our very own mailman installation here at python.org :)
Yes, I did (with a small change).

Tomasz Chmielewski wrote:
> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
I was inspired by your script to add a -p / --public-archive option to Mailman's bin/list_lists, so that if one wants to process just those lists with public archives, one can put
    LISTS=`/path/to/mailman/bin/list_lists --bare --public-archive`
in the appropriate place to do that. Of course, one could have done
    LISTS=`ls $MAILMANPATH/../public|grep -v "\\.mbox$"`
to accomplish essentially the same thing, but since list_lists already had a -a / --advertised option, this seemed a good addition.
See https://launchpad.net/bugs/1082711 and http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1369 for a bug report and patch.
Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?
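For installations where the new list_lists option is not available yet, the ls | grep variant above can also be written without parsing ls output. This is only a sketch under that assumption; the function name public_lists is made up, and it expects to be given the public archive directory.

```shell
#!/usr/bin/env bash
set -u

# public_lists PUBLIC_DIR   (hypothetical helper)
# Prints the name of every entry in PUBLIC_DIR that does not end in
# ".mbox", one per line, i.e. the lists that have a public archive.
public_lists() {
    local dir=$1 entry
    for entry in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$entry" ] || continue
        case $entry in
            *.mbox) continue ;;
        esac
        # Strip the directory part, keeping just the list name.
        printf '%s\n' "${entry##*/}"
    done
}
```

It would be used as `LISTS=$(public_lists "$MAILMANPATH/../public")` in the place where the script sets LISTS.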

On 11/25/2012 01:25 AM, Mark Sapiro wrote:
> Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?
Sure, it would be great.

> Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?
Personally, I'd say yes. It adds functionality for the benefit of all without changing anything that already exists. So why not add it?