[Python-Dev] version numbers mismatched in google search results.

Vincent Davis vincent at vincentdavis.net
Sun Jan 26 04:27:22 CET 2014


I think subdomains need their own robots.txt, which neither docs.python.org
nor docs.python.org/(2 or 3)/ has.
Also, http://python.org/robots.txt (below) seems a little sparse.
For sure /dev/ is not blocked.


# Directions for robots.  See this URL:
# http://www.robotstxt.org/wc/norobots.html
# for a description of the file format.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

# The Krugle web crawler (though based on Nutch) is OK.
User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

# No one should be crawling us with Nutch.
User-agent: Nutch
Disallow: /

# Hide old versions of the documentation and various large sets of files.
User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/
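
As a quick sanity check of the rules above, here is a minimal sketch using
the standard library's urllib.robotparser. The rules are copied from the file
quoted above (comments omitted); the "Googlebot" agent name, the is_allowed()
helper and the sample URLs are my own assumptions for illustration, and a real
crawler may interpret the rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the python.org robots.txt quoted above (comments omitted).
ROBOTS_TXT = """\
User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

User-agent: Nutch
Disallow: /

User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/
"""


def is_allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Hypothetical helper: does robots_txt let `agent` fetch `url`?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Sample agent names and URLs chosen only for illustration.
    checks = [
        ("Googlebot", "http://python.org/dev/library/"),
        ("Googlebot", "http://python.org/dev/buildbot/"),
        ("Googlebot", "http://python.org/pypi"),
        ("Nutch", "http://python.org/"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(agent, url))
```

With this parser, only /dev/buildbot/ under /dev/ comes back as disallowed
for a generic agent; /dev/library/ is allowed. Note that this only models the
python.org file: crawlers look for robots.txt at the root of each host, so it
says nothing about what docs.python.org serves.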


Vincent Davis
720-301-3003


On Sat, Jan 25, 2014 at 9:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 26 January 2014 05:05, Benjamin Peterson <benjamin at python.org> wrote:
> >
> >
> > On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote:
> >> On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson
> >> <benjamin at python.org> wrote:
> >>
> >> > Internal links with no version redirect to the Python 2 version for
> >> > backwards compatibility reasons.
> >> >
> >>
> >> On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> >>
> >> > Yep, and the URLs without version never served Python 3 docs as far as I
> >> > can remember, so I don't know where Google has these <title>s from.
> >>
> >> That is not consistent with
> >> http://docs.python.org (no version number) redirecting to
> >> http://docs.python.org/3/
> >
> > This is recent. It used to go to Python 2 docs.
>
> http://www.python.org/dev/peps/pep-0430/ covers the rationale for the
> current arrangement.
>
> The main issue is the extensive use of existing deep links into the
> Python 2 documentation from Python 2 specific tutorials and other
> references. Those third party references not only include vast numbers
> of online resources that we don't control, but also books that can't
> be updated at all.
>
> So, the canonical URLs on docs.python.org now always include the major
> version number in the path so they're unambiguous, the Python 3 docs
> are displayed by default, and unqualified deep links redirect to
> Python 2 for backwards compatibility.
>
> The robots.txt on python.org is *supposed* to keep the web crawlers
> away from the "/dev/" subtree (since most people searching for Python
> info aren't going to want the docs for an unreleased version), but I
> don't know if that's documented anywhere, or even if it's currently
> still configured that way.
>
> >> Maybe this is related to google search results.
> >> Seems wrong to me to point to 2.7 rather than 3.3 but I am sure there was
> >> discussion about that.
> >
> > The internal links all used to go to Python 2.
>
> There's also a lot of weight given in Google to the extensive array of
> existing unqualified deep links, which relate to Python 2.
>
> >> I looked (googled) for an example of a google link to current version of
> >> python 3.3 documentation.  My approach was to google "python" and something
> >> listed in http://docs.python.org/3/whatsnew/3.3.html
> >> These results all seem to point to http://docs.python.org/dev/library
> >> i.e. 3.4.0b2
>
> Which suggests that the Google web crawler *is* spidering the dev
> docs, which we generally don't want :P
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>