[Python-Dev] version numbers mismatched in google search results.
vincent at vincentdavis.net
Sun Jan 26 04:27:22 CET 2014
I think subdomains need their own robots.txt, which neither docs.python.org nor
docs.python.org/(2 or 3)/ has.
Also, http://python.org/robots.txt (below) seems a little sparse.
For sure /dev/ is not blocked.
# Directions for robots. See this URL:
# for a description of the file format.
# The Krugle web crawler (though based on Nutch) is OK.
# No one should be crawling us with Nutch.
# Hide old versions of the documentation and various large sets of files.
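
A rough way to check whether a given robots.txt keeps crawlers out of /dev/ is
the standard library's urllib.robotparser. This is only a sketch: the rules in
SAMPLE_ROBOTS_TXT are hypothetical (they are not the real python.org file), and
robots_url() and is_allowed() are helper names made up for illustration. The
first helper reflects the point that robots.txt is looked up per host, so
docs.python.org would need its own file.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real python.org robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /dev/
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that applies to page_url's host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Parse robots_txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://docs.python.org/dev/library/",
        "http://docs.python.org/3/library/",
    ):
        print(robots_url(url), url, is_allowed(SAMPLE_ROBOTS_TXT, url))
```

With an empty robots.txt the parser allows everything, which matches the
behaviour you would expect from a host that serves no rules at all.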
On Sat, Jan 25, 2014 at 9:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 26 January 2014 05:05, Benjamin Peterson <benjamin at python.org> wrote:
> > On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote:
> >> On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson
> >> <benjamin at python.org> wrote:
> >> > Internal links with no version redirect to the Python 2 version for
> >> > backwards compatibility reasons.
> >> >
> >> On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> >> > Yep, and the URLs without version never served Python 3 docs as far as I
> >> > can remember, so I don't know where Google has these <title>s from.
> >> >
> >> That is not consistent with
> >> http://docs.python.org (no version number) redirects to
> >> http://docs.python.org/3/
> > This is recent. It used to go to Python 2 docs.
> http://www.python.org/dev/peps/pep-0430/ covers the rationale for the
> current arrangement.
> The main issue is the extensive use of existing deep links into the
> Python 2 documentation from Python 2 specific tutorials and other
> references. Those third party references not only include vast numbers
> of online resources that we don't control, but also books that can't
> be updated at all.
> So, the canonical URLs on docs.python.org now always include the major
> version number in the path so they're unambiguous, the Python 3 docs
> are displayed by default, and unqualified deep links redirect to
> Python 2 for backwards compatibility.
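
To make the arrangement described above concrete, here is a minimal sketch of
the redirect rule as I read it from this thread. It is not the actual server
configuration: resolve() is a made-up helper, and the set of versioned
prefixes is an assumption for illustration.

```python
# Assumed set of first path segments that already name a version.
# Hypothetical; the real site may recognise more (or different) prefixes.
VERSIONED_PREFIXES = {"2", "3", "dev"}


def resolve(path: str) -> str:
    """Map a requested docs path to the path that would be served."""
    if path in ("", "/"):
        # Bare root: Python 3 docs are displayed by default.
        return "/3/"
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in VERSIONED_PREFIXES:
        # Already unambiguous, so serve it as requested.
        return path
    # Unqualified deep link: send to Python 2 for backwards compatibility.
    return "/2/" + path.lstrip("/")


if __name__ == "__main__":
    for p in ("/", "/library/os.html", "/3/library/os.html"):
        print(p, "->", resolve(p))
```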
> The robots.txt on python.org is *supposed* to keep the web crawlers
> away from the "/dev/" subtree (since most people searching for Python
> info aren't going to want the docs for an unreleased version), but I
> don't know if that's documented anywhere, or even if it's currently
> still configured that way.
> >> Maybe this is related to google search results.
> >> Seems wrong to me to point to 2.7 rather than 3.3, but I am sure there was
> >> discussion about that.
> > The internal links all used to go to Python 2.
> There's also a lot of weight given in Google to the extensive array of
> existing unqualified deep links, which relate to Python 2.
> >> I looked (googled) for an example of a google link to the current version
> >> of the python 3.3 documentation. My approach was to google "python" and
> >> something listed in
> >> http://docs.python.org/3/whatsnew/3.3.html
> >> These results all seem to point to http://docs.python.org/dev/library
> >> i.e. 3.4.0b2
> Which suggests that the Google web crawler *is* spidering the dev
> docs, which we generally don't want :P
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia