Mailman 3 sitemap.xml for Mailman - Mailman-Users

sitemap.xml for Mailman

Tomasz Chmielewski

Nov. 21, 2012

5:01 p.m.

Hi,

I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):

http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/

My rationale was: why the heck is my archive missing so many posts in Google, and why is it generally "behind" all the other archives?

-- Tomasz Chmielewski
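The script itself is at the URL above and is not reproduced here. As a rough illustration of the idea only, the bash sketch below turns lines of the form "LASTMOD PATH" (one per archived HTML page) into a sitemap in the sitemaps.org 0.9 format. The function name make_sitemap, the BASEURL and ARCHIVEROOT variables and their defaults, and the monthly changefreq are assumptions for this example, not taken from Tomasz's script.

```shell
#!/bin/bash
# Hypothetical sketch: turn "LASTMOD PATH" lines into a sitemaps.org <urlset>.
# BASEURL and ARCHIVEROOT are assumptions; point them at your own public
# archive URL and the directory that URL is served from.
set -eu

BASEURL="${BASEURL:-http://lists.example.org/pipermail}"
ARCHIVEROOT="${ARCHIVEROOT:-/var/lib/mailman/archives/public}"

make_sitemap() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    while read -r lastmod path; do
        [ -n "$path" ] || continue               # skip blank input lines
        loc="$BASEURL${path#"$ARCHIVEROOT"}"     # file path -> public URL
        # <loc> must be XML-escaped; & < > are the characters that matter here
        loc=$(printf '%s' "$loc" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
        # changefreq "monthly" is an assumption, not a requirement
        printf '  <url>\n    <loc>%s</loc>\n    <lastmod>%s</lastmod>\n    <changefreq>monthly</changefreq>\n  </url>\n' \
            "$loc" "$lastmod"
    done
    echo '</urlset>'
}
```

Any command that prints such lines can be piped into it, e.g. `... | make_sitemap > sitemap.xml`. The lastmod value is passed through unchanged, so it must already be in W3C datetime form (e.g. 2012-11-21T17:01:00+00:00).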


Mark Sapiro

November 2012

1:36 p.m.

Tomasz Chmielewski wrote:

...

> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>
> http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/

There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that:

--- sitemap.old 2012-11-22 10:02:51.000000000 -0800
+++ sitemap.new 2012-11-22 10:27:37.000000000 -0800
@@ -18,8 +18,10 @@
 set -u
 
 # find html files with their dates
+URLS=""
 for LIST in $LISTS; do
-    URLS=$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)
+    URLS="$URLS
+$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"
 done
 
 # if the article is crawled once a month, it should be enough

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
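A minimal illustration of the bug and of the fix in Mark's patch: with URLS=$(...) inside the loop, every pass replaces the previous value, so only the last list in $LISTS survives; the patch initialises URLS once and appends each list's output after an embedded newline. The list names and the list_urls function are hypothetical stand-ins for the real find | xargs ls | awk | grep pipeline.

```shell
#!/bin/bash
# Contrast of the original loop and the patched loop.
# list_urls is a hypothetical stand-in for the real pipeline; it prints
# one "LASTMOD PATH" line per list.
set -u

LISTS="alpha beta"   # hypothetical list names

list_urls() {
    echo "2012-11-22T10:00:00+00:00 /archives/$1/index.html"
}

# Original loop: each iteration overwrites URLS, so only the last list is kept.
collect_overwrite() {
    local URLS=""
    for LIST in $LISTS; do
        URLS=$(list_urls "$LIST")
    done
    printf '%s\n' "$URLS"
}

# Patched loop: initialise once, then append each list's lines after a newline.
collect_append() {
    local URLS=""
    for LIST in $LISTS; do
        URLS="$URLS
$(list_urls "$LIST")"
    done
    printf '%s\n' "$URLS"
}

collect_overwrite | grep -c .   # prints 1 (only beta)
collect_append | grep -c .      # prints 2 (alpha and beta)
```

Note that the patched form leaves an empty first line in URLS (it starts as "" and every append begins with a newline), so whatever consumes URLS afterwards should tolerate or skip blank lines.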

Tomasz Chmielewski

2:37 p.m.

On 11/22/2012 08:36 PM, Mark Sapiro wrote:

...

> Tomasz Chmielewski wrote:
>
> ...
>> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>>
>> http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/
>
> There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that
>
> ...
>
> +    URLS="$URLS
> +$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"

Indeed - thanks for pointing out!

-- 
Tomasz Chmielewski
http://blog.wpkg.org
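One detail of the find | xargs ls | awk pipeline quoted above: ls --time-style=long-iso prints modification times in the server's local time zone, while the awk step always appends "+00:00", so the lastmod values are only exact on a server running in UTC. Passing file names through xargs and taking $8 also assumes names without spaces, which normally holds for pipermail's own files. A sketch of an alternative, assuming GNU findutils (its -printf action is not POSIX); the list_urls name and the MAILMANPATH default are hypothetical:

```shell
#!/bin/bash
# Sketch: emit the same "LASTMOD PATH" lines using only GNU find.
# The MAILMANPATH default is a hypothetical location.
set -eu

MAILMANPATH="${MAILMANPATH:-/var/lib/mailman/archives/private}"

list_urls() {
    # $1 is a list name. TZ=UTC makes find format the %T fields in UTC,
    # so the literal "+00:00" suffix is accurate on any server.
    # -not -path replaces the separate "grep -v attachments" step.
    TZ=UTC find "$MAILMANPATH/$1/" -type f -name '*html' \
        -not -path '*attachments*' \
        -printf '%TY-%Tm-%TdT%TH:%TM:00+00:00 %p\n'
}
```

Inside the script's loop, $(list_urls "$LIST") could then take the place of the whole pipeline. Seconds are still written as ":00", as in the original awk expression.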

Ralf Hildebrandt

3:16 p.m.

Tomasz Chmielewski <mangoo@wpkg.org>:

...

> On 11/22/2012 08:36 PM, Mark Sapiro wrote:
>
> ...
>> Tomasz Chmielewski wrote:
>>
>> ...
>>> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>>>
>>> http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/
>>
>> There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that
>>
>> ...
>>
>> +    URLS="$URLS
>> +$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"
>
> Indeed - thanks for pointing out!

Did you update your patch already? If so, I'd apply it to our very own mailman installation here at python.org :)

-- 
Ralf Hildebrandt
Charite Universitätsmedizin Berlin
ralf.hildebrandt@charite.de
Campus Benjamin Franklin
http://www.charite.de
Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk
fon: +49-30-450.570.155

Tomasz Chmielewski

3:21 p.m.

On 11/22/2012 10:16 PM, Ralf Hildebrandt wrote:

...

...
>> Indeed - thanks for pointing out!
>
> Did you update your patch already? If so, I'd apply it to our very own mailman installation here at python.org :)

Yes I did (with a small change).

-- 
Tomasz Chmielewski
http://blog.wpkg.org

Mark Sapiro

6:25 p.m.

Tomasz Chmielewski wrote:

...

> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>
> http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/

I was inspired by your script to add a -p / --public-archive option to Mailman's bin/list_lists so that if one wants to process only those lists with public archives, one can put

LISTS=`/path/to/mailman/bin/list_lists --bare --public-archive`

in the appropriate place to do that. Of course, one could have done

LISTS=`ls $MAILMANPATH/../public|grep -v "\\.mbox$"`

to accomplish essentially the same thing, but since list_lists already had a -a / --advertised option, this seemed a good addition.

See <https://launchpad.net/bugs/1082711> and <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1369> for a bug report and patch.

Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
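For installations without the new list_lists option, the "ls ... | grep -v" alternative Mark mentions can be wrapped in a small function. This is a sketch, assuming the Mailman 2.1 layout in which archives/public holds one entry per list with a public archive plus a matching <list>.mbox entry; the public_lists name is hypothetical.

```shell
#!/bin/bash
# Sketch: print the names of lists with public archives by listing
# archives/public and dropping the <list>.mbox entries.
set -eu

public_lists() {
    # $1: the archives/public directory, e.g. "$MAILMANPATH/../public"
    # "|| true": grep exits 1 when nothing is left, which is not an error here.
    ls "$1" | grep -v '\.mbox$' || true
}
```

In the script this would read LISTS=$(public_lists "$MAILMANPATH/../public"); where Mark's patch is installed, the list_lists --bare --public-archive line achieves essentially the same thing without depending on the directory layout.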

Tomasz Chmielewski

7:16 p.m.

On 11/25/2012 01:25 AM, Mark Sapiro wrote:

...

> Tomasz Chmielewski wrote:
>
> ...
>> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>>
>> http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/
>
> I was inspired by your script to add a -p / --public-archive option to Mailman's bin/list_lists so that if one wants to process only those lists with public archives, one can put
>
> LISTS=`/path/to/mailman/bin/list_lists --bare --public-archive`
>
> in the appropriate place to do that. Of course, one could have done
>
> LISTS=`ls $MAILMANPATH/../public|grep -v "\\.mbox$"`
>
> to accomplish essentially the same thing, but since list_lists already had a -a / --advertised option, this seemed a good addition.
>
> See <https://launchpad.net/bugs/1082711> and <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1369> for a bug report and patch.
>
> Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?

Sure, it would be great.

-- 
Tomasz Chmielewski
http://blog.wpkg.org

Ralf Hildebrandt

4:15 a.m.

...

> Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?

Personally, I'd say yes. It adds functionality for the benefit of all without changing any existing functionality. So why not add it?

Ralf Hildebrandt
Charite Universitätsmedizin Berlin
ralf.hildebrandt@charite.de
Campus Benjamin Franklin
http://www.charite.de
Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk
fon: +49-30-450.570.155

7 comments

3 participants

Mark Sapiro
Ralf Hildebrandt
Tomasz Chmielewski
