
Hi Everyone,
I hope you had a good Christmas with friends and family.
We developed custom subscription pages for the mailing lists supported by our website. These pages show up in our SERP listings, but so do the info pages provided by Mailman. Please try the following search query on Google and you will see what I mean:
site:raystedman.org email
This is confusing for our users so I would like to hide all the pages containing "mailman" and "pipermail" from the SERP listings.
I read the archives of this list and found a couple of relevant entries. The answer seems to be adding "Disallow" rules to our robots.txt file. I did this, as you can see towards the bottom of the file here:
http://www.raystedman.org/robots.txt
This has been in place for four weeks, so I do not believe it is working. I believe the problem is that each subdomain needs its own robots.txt file. In this case, Mailman lives in the lists.raystedman.org subdomain. Is there a way to have a robots.txt file for the lists.raystedman.org subdomain?
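For reference, the kind of Disallow rules I mean are along these lines (illustrative only; the live robots.txt at the URL above is what actually counts):

User-agent: *
Disallow: /mailman/
Disallow: /pipermail/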
Thanks in advance for your assistance! Greg

On 12/26/2014 09:21 AM, Greg Sims wrote:
Absolutely. Just put it in the root directory that contains the page which is served when you go to <http://lists.raystedman.org/>. I.e., put it where it will be served if you go to <http://lists.raystedman.org/robots.txt>
However, note that it can take a loooong time for pages to age out of search results.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan

Mark Sapiro writes:
On 12/26/2014 09:21 AM, Greg Sims wrote:
In my case, I'm not even sure where that is (I think it's /var/www/way/over/there, see below, but there's some funky CGI script aliasing going on), and anyway it's different from where my fingers expect to find robots.txt. So I use
<VirtualHost *:80>
ServerName tracker.xemacs.org
DocumentRoot /var/www/way/over/there
Alias /robots.txt /var/www/robots-tracker.xemacs.org
ScriptAlias ...
</VirtualHost>
where /var/www is the root of the "file system" I allow Apache to see. The order of Alias and ScriptAlias matters, IIRC.
Season to taste if your webserver isn't Apache.
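(To spell out the ordering point: within one configuration context, mod_alias applies Alias and ScriptAlias directives in the order they appear and the first match wins, so it only bites when the ScriptAlias prefix would also cover /robots.txt -- e.g. a tracker mounted at the root. A sketch, with a made-up CGI path:

Alias /robots.txt /var/www/robots-tracker.xemacs.org
ScriptAlias / /var/www/cgi-bin/roundup.cgi/

In a stock Mailman setup the /mailman/ and /pipermail/ prefixes don't overlap /robots.txt, so the ordering matters less there.)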

Thanks Mark and Stephen,
This is not straightforward in my case either. I am running Mailman under Plesk. There is an existing file:
/etc/httpd/conf.d/mailman.conf
This file sets up a ScriptAlias for /mailman/ and an Alias for /pipermail/. I added the following to this file:
Alias /robots.txt /var/www/vhosts/raystedman.org/not-drupal/lists-robots.txt
This directory is a convenient location within the webroot of RayStedman.org. I created lists-robots.txt in that directory and restarted Apache. The result is that the following now exists:
http://lists.raystedman.org/robots.txt
which hides all the files in the lists.raystedman.org subdomain from the search engines. I will now sit back and see how long it takes for the URLs in this subdomain to be delisted. I will give them a push via Google Webmaster Tools, which offers an option to request removal of a URL.
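For anyone following along, the relevant part of mailman.conf now looks roughly like this (the /mailman/ and /pipermail/ targets shown are the stock paths from the Mailman RPM packaging and may differ under Plesk), and lists-robots.txt simply blocks everything:

# /etc/httpd/conf.d/mailman.conf (fragment)
Alias /robots.txt /var/www/vhosts/raystedman.org/not-drupal/lists-robots.txt
ScriptAlias /mailman/ /usr/lib/mailman/cgi-bin/
Alias /pipermail/ /var/lib/mailman/archives/public/

# /var/www/vhosts/raystedman.org/not-drupal/lists-robots.txt
User-agent: *
Disallow: /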
Thanks again for your help guys!
Happy New Year, Greg

Participants (3):
- Greg Sims
- Mark Sapiro
- Stephen J. Turnbull