Hi,
I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
http://blog.wpkg.org/2012/11/21/sitemap-xml-for-mailman/
My rationale was: why the heck is my archive missing so many posts in Google, and why is it generally "behind" all other archives?
-- Tomasz Chmielewski
Tomasz Chmielewski wrote:
> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that:

--- sitemap.old 2012-11-22 10:02:51.000000000 -0800
+++ sitemap.new 2012-11-22 10:27:37.000000000 -0800
@@ -18,8 +18,10 @@
 set -u
 
 # find html files with their dates
+URLS=""
 for LIST in $LISTS; do
-    URLS=$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)
+    URLS="$URLS
+$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"
 done
 
 # if the article is crawled once a month, it should be enough

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
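A minimal sketch of the difference between the two loops, for anyone who wants to see it in isolation. The directory layout, the list names (list-a, list-b) and the two *_COUNT variables are made up for the demo; the ls/awk date formatting from the real script is left out so that only the overwrite-versus-append behaviour is shown.

```shell
#!/bin/bash
set -u

# Hypothetical demo layout: two lists, each with one archived HTML page.
MAILMANPATH=$(mktemp -d)
LISTS="list-a list-b"
for LIST in $LISTS; do
    mkdir -p "$MAILMANPATH/$LIST/2012-November"
    touch "$MAILMANPATH/$LIST/2012-November/000001.html"
done

# Overwriting: after the loop, URLS holds only the last list's files.
for LIST in $LISTS; do
    URLS=$(find "$MAILMANPATH/$LIST/" -type f -name '*html')
done
OVERWRITTEN_COUNT=$(printf '%s\n' "$URLS" | grep -c html)

# Accumulating: start empty and append each list's files on a new line.
URLS=""
for LIST in $LISTS; do
    URLS="$URLS
$(find "$MAILMANPATH/$LIST/" -type f -name '*html')"
done
ACCUMULATED_COUNT=$(printf '%s\n' "$URLS" | grep -c html)

echo "overwritten: $OVERWRITTEN_COUNT, accumulated: $ACCUMULATED_COUNT"
rm -rf "$MAILMANPATH"
```

The accumulated value starts with an empty line, so whatever consumes $URLS afterwards should tolerate or skip blank lines.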
On 11/22/2012 08:36 PM, Mark Sapiro wrote:
> Tomasz Chmielewski wrote:
>> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>
> There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that
>
> +    URLS="$URLS
> +$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"
Indeed - thanks for pointing out!
-- Tomasz Chmielewski http://blog.wpkg.org
* Tomasz Chmielewski <mangoo@wpkg.org>:
> On 11/22/2012 08:36 PM, Mark Sapiro wrote:
>> Tomasz Chmielewski wrote:
>>> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>>
>> There is an issue with the above script. Namely, the XML generated only contains data for the last list in $LISTS. This will fix that
>>
>> +    URLS="$URLS
>> +$(find $MAILMANPATH/$LIST/ -type f -name \*html | xargs ls --time-style=long-iso -l | awk '{print $6"T"$7":00+00:00 "$8}' | grep -v attachments)"
>
> Indeed - thanks for pointing out!
Did you update your patch already? If so, I'd apply it to our very own mailman installation here at python.org :)
-- 
Ralf Hildebrandt                   Charite Universitätsmedizin Berlin
ralf.hildebrandt@charite.de        Campus Benjamin Franklin
http://www.charite.de              Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk fon: +49-30-450.570.155
On 11/22/2012 10:16 PM, Ralf Hildebrandt wrote:
>> Indeed - thanks for pointing out!
>
> Did you update your patch already? If so, I'd apply it to our very own mailman installation here at python.org :)
Yes I did (with a small change).
-- Tomasz Chmielewski http://blog.wpkg.org
Tomasz Chmielewski wrote:
> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
I was inspired by your script to add a -p / --public-archive option to Mailman's bin/list_lists, so that if one wants to process just those lists with public archives, one can put

LISTS=`/path/to/mailman/bin/list_lists --bare --public-archive`

in the appropriate place to do that. Of course, one could have done

LISTS=`ls $MAILMANPATH/../public|grep -v "\\.mbox$"`

to accomplish essentially the same thing, but since list_lists already had a -a / --advertised option, this seemed a good addition.
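As a rough illustration of the ls/grep variant, here is a sketch that runs the same filter against a throwaway directory. The PUBLIC directory and the list names (announce, users) are invented stand-ins for the real public archive directory, and $(...) is used in place of backticks, which is why the pattern needs only one backslash here.

```shell
#!/bin/bash
set -u

# Hypothetical stand-in for Mailman's public archive directory.
PUBLIC=$(mktemp -d)
mkdir "$PUBLIC/announce" "$PUBLIC/announce.mbox" "$PUBLIC/users" "$PUBLIC/users.mbox"

# Keep the list names, drop the matching *.mbox entries.
LISTS=$(ls "$PUBLIC" | grep -v '\.mbox$')

# Unquoted on purpose: word splitting turns the newline-separated names
# into the space-separated form that "for LIST in $LISTS" iterates over.
echo $LISTS
rm -rf "$PUBLIC"
```

Parsing ls output like this is only reasonable because list names cannot contain whitespace; the list_lists route avoids the question entirely.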
See <https://launchpad.net/bugs/1082711> and <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1369> for a bug report and patch.
Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On 11/25/2012 01:25 AM, Mark Sapiro wrote:
> Tomasz Chmielewski wrote:
>> I wrote a simple bash script which generates a sitemap.xml file (i.e. to be submitted to Google):
>
> I was inspired by your script to add a -p / --public-archive option to Mailman's bin/list_lists so that if one wanted to process just those lists with public archives, one can put
>
> LISTS=`/path/to/mailman/bin/list_lists --bare --public-archive`
>
> in the appropriate place to do that. Of course, one could have done
>
> LISTS=`ls $MAILMANPATH/../public|grep -v "\\.mbox$"`
>
> to accomplish essentially the same thing, but since list_lists already had a -a / --advertised option, this seemed a good addition.
>
> See <https://launchpad.net/bugs/1082711> and <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1369> for a bug report and patch.
>
> Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?
Sure, it would be great.
-- Tomasz Chmielewski http://blog.wpkg.org
> Also, I think the bash script would make a good addition to the contrib directory for Mailman 2.1.16. May I add it there?
Personally, I'd say yes. It adds and doesn't change any existing functionality for the benefit of all. So why not add it?
-- 
Ralf Hildebrandt                   Charite Universitätsmedizin Berlin
ralf.hildebrandt@charite.de        Campus Benjamin Franklin
http://www.charite.de              Hindenburgdamm 30, 12203 Berlin
Geschäftsbereich IT, Abt. Netzwerk fon: +49-30-450.570.155
participants (3)
- Mark Sapiro
- Ralf Hildebrandt
- Tomasz Chmielewski