I think subdomains need their own robots.txt, which neither docs.python.org nor docs.python.org/(2 or 3)/ has.
Also, http://python.org/robots.txt (below) seems a little sparse.
For sure, /dev/ is not blocked.


# Directions for robots.  See this URL:
# http://www.robotstxt.org/wc/norobots.html
# for a description of the file format.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

# The Krugle web crawler (though based on Nutch) is OK.
User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

# No one should be crawling us with Nutch.
User-agent: Nutch
Disallow: /

# Hide old versions of the documentation and various large sets of files.
User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/
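
As a rough check of the "/dev/ is not blocked" claim, here is a minimal sketch that feeds the rules above to Python's standard urllib.robotparser. The "Googlebot" agent name and the is_allowed helper are my own assumptions for illustration. Note that the parser only looks at the path, so this says nothing about docs.python.org, which would need its own file.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the python.org robots.txt quoted above (comments dropped).
ROBOTS_TXT = """\
User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

User-agent: Nutch
Disallow: /

User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/
"""


def is_allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the rules above.

    `agent` defaults to "Googlebot", which is an assumption made for
    illustration; any crawler not named in the file falls under "*".
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://python.org" + path)


if __name__ == "__main__":
    for path in ("/dev/", "/dev/peps/", "/dev/buildbot/", "/pypi"):
        print(path, is_allowed(path))
```

Only /dev/buildbot/ is listed under /dev/, so a generic crawler is free to fetch everything else in that subtree.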

Vincent Davis
720-301-3003


On Sat, Jan 25, 2014 at 9:04 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 26 January 2014 05:05, Benjamin Peterson <benjamin@python.org> wrote:
>
>
> On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote:
>> On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson <benjamin@python.org> wrote:
>>
>> > Internal links with no version redirect to the Python 2 version for
>> > backwards compatibility reasons.
>> >
>>
>> On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl@gmx.net> wrote:
>>
>> > Yep, and the URLs without version never served Python 3 docs as far as I
>> > can remember, so I don't know where Google has these <title>s from.
>>
>> That is not consistent with the fact that
>> http://docs.python.org (no version number) redirects to
>> http://docs.python.org/3/
>
> This is recent. It used to go to Python 2 docs.

http://www.python.org/dev/peps/pep-0430/ covers the rationale for the
current arrangement.

The main issue is the extensive use of existing deep links into the
Python 2 documentation from Python 2 specific tutorials and other
references. Those third party references not only include vast numbers
of online resources that we don't control, but also books that can't
be updated at all.

So, the canonical URLs on docs.python.org now always include the major
version number in the path so they're unambiguous, the Python 3 docs
are displayed by default, and unqualified deep links redirect to
Python 2 for backwards compatibility.
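
To make that scheme concrete, here is a minimal sketch of the mapping as a small resolver. It is a hypothetical illustration of the behaviour described above, not the actual server configuration: the resolve name and the set of prefixes treated as "qualified" (a numeric version such as 2, 3 or 3.3, or dev) are my assumptions.

```python
import re

# Assumption: a path counts as version-qualified if its first segment is a
# numeric version ("2", "3", "3.3", ...) or "dev".
_QUALIFIED = re.compile(r"^/(?:\d+(?:\.\d+)*|dev)(?:/|$)")


def resolve(path):
    """Map a requested docs path to the path that would be served."""
    if not path.startswith("/"):
        path = "/" + path
    if path == "/":
        # Bare site root: Python 3 docs are displayed by default.
        return "/3/"
    if _QUALIFIED.match(path):
        # Already unambiguous, serve as-is.
        return path
    # Unqualified deep link: send to Python 2 for backwards compatibility.
    return "/2" + path


if __name__ == "__main__":
    for p in ("/", "/library/os.html", "/3/library/os.html", "/dev/library/"):
        print(p, "->", resolve(p))
```

So an old link such as /library/os.html keeps landing on the Python 2 page, while anything with an explicit version is left alone.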

The robots.txt on python.org is *supposed* to keep the web crawlers
away from the "/dev/" subtree (since most people searching for Python
info aren't going to want the docs for an unreleased version), but I
don't know if that's documented anywhere, or even if it's currently
still configured that way.

>> Maybe this is related to Google search results.
>> Seems wrong to me to point to 2.7 rather than 3.3, but I am sure there was
>> discussion about that.
>
> The internal links all used to go to Python 2.

There's also a lot of weight given in Google to the extensive array of
existing unqualified deep links, which relate to Python 2.

>> I looked (googled) for an example of a Google link to the current version of
>> the Python 3.3 documentation. My approach was to google "python" and something
>> listed in
>> http://docs.python.org/3/whatsnew/3.3.html
>> These results all seem to point to http://docs.python.org/dev/library
>> i.e. 3.4.0b2

Which suggests that the Google web crawler *is* spidering the dev
docs, which we generally don't want :P

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia