Re: [Python-Dev] version numbers mismatched in google search results.

On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote:
On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson <benjamin@python.org> wrote:
Internal links with no version redirect to the Python 2 version for backwards compatibility reasons.
On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl@gmx.net> wrote:
Yep, and the URLs without version never served Python 3 docs as far as I can remember, so I don't know where Google has these <title>s from.
That is not consistent with http://docs.python.org (no version number) redirecting to http://docs.python.org/3/.
This is recent. It used to go to Python 2 docs.
Maybe this is related to the Google search results. It seems wrong to me to point to 2.7 rather than 3.3, but I am sure there was discussion about that.
The internal links all used to go to Python 2.
I looked (googled) for an example of a Google link to the current version of the Python 3.3 documentation. My approach was to google "python" plus something listed in http://docs.python.org/3/whatsnew/3.3.html. These results all seem to point to http://docs.python.org/dev/library, i.e. 3.4.0b2.
Vincent Davis

On 26 January 2014 05:05, Benjamin Peterson <benjamin@python.org> wrote:
On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote:
On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson <benjamin@python.org> wrote:
Internal links with no version redirect to the Python 2 version for backwards compatibility reasons.
On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl@gmx.net> wrote:
Yep, and the URLs without version never served Python 3 docs as far as I can
remember, so I don't know where Google has these <title>s from.
That is not consistent with http://docs.python.org (no version number) redirecting to http://docs.python.org/3/.
This is recent. It used to go to Python 2 docs.
http://www.python.org/dev/peps/pep-0430/ covers the rationale for the current arrangement.
The main issue is the extensive use of existing deep links into the Python 2 documentation from Python 2 specific tutorials and other references. Those third party references not only include vast numbers of online resources that we don't control, but also books that can't be updated at all.
So, the canonical URLs on docs.python.org now always include the major version number in the path so they're unambiguous, the Python 3 docs are displayed by default, and unqualified deep links redirect to Python 2 for backwards compatibility.
The robots.txt on python.org is *supposed* to keep the web crawlers away from the "/dev/" subtree (since most people searching for Python info aren't going to want the docs for an unreleased version), but I don't know if that's documented anywhere, or even if it's currently still configured that way.
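The PEP 430 arrangement Nick describes boils down to a small routing rule. A minimal sketch of it as a Python function follows; the function name `redirect_target` and the exact set of version-qualified prefixes (`/2/`, `/3/`, minor-version trees such as `/3.3/`, `/dev/`, `/release/`) are assumptions for illustration, since the actual server configuration isn't quoted in this thread.

```python
import re

# Hypothetical: path prefixes that already name a version and are served as-is.
# Covers the major-version trees (/2/, /3/), minor-version trees such as /3.3/,
# and the in-development (/dev/) and archived (/release/) docs.
_QUALIFIED = re.compile(r"^/(?:2|3|\d+\.\d+|dev|release)(?:/|$)")


def redirect_target(path: str) -> str:
    """Return the path a docs request should end up at (sketch, not the real config)."""
    if path in ("", "/"):
        # The bare site root shows the Python 3 docs by default.
        return "/3/"
    if _QUALIFIED.match(path):
        # Canonical, version-qualified URLs are unambiguous: no redirect.
        return path
    # Unqualified deep links (old tutorials, printed books) keep pointing at
    # the Python 2 docs for backwards compatibility.
    return "/2" + path


if __name__ == "__main__":
    for p in ["/", "/library/os.html", "/3/library/os.html", "/3.3/whatsnew/3.3.html"]:
        print(p, "->", redirect_target(p))
```

Checking the version-qualified forms first keeps canonical URLs stable; anything else falls through to the Python 2 tree, which is the backwards-compatibility behaviour described above.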
Maybe this is related to the Google search results. It seems wrong to me to point to 2.7 rather than 3.3, but I am sure there was discussion about that.
The internal links all used to go to Python 2.
There's also a lot of weight given in Google to the extensive array of existing unqualified deep links, which relate to Python 2.
I looked (googled) for an example of a Google link to the current version of the Python 3.3 documentation. My approach was to google "python" plus something listed in http://docs.python.org/3/whatsnew/3.3.html. These results all seem to point to http://docs.python.org/dev/library, i.e. 3.4.0b2.
Which suggests that the Google web crawler *is* spidering the dev docs, which we generally don't want :P
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I think subdomains need their own robots.txt, which neither docs.python.org nor docs.python.org/(2 or 3)/ has. And http://python.org/robots.txt (below) seems a little sparse. For sure, /dev/ is not blocked:

    # Directions for robots. See this URL:
    # http://www.robotstxt.org/wc/norobots.html
    # for a description of the file format.

    User-agent: HTTrack
    User-agent: puf
    User-agent: MSIECrawler
    Disallow: /

    # The Krugle web crawler (though based on Nutch) is OK.
    User-agent: Krugle
    Allow: /
    Disallow: /moin
    Disallow: /pypi
    Disallow: /~guido/orlijn/
    Disallow: /wwwstats/
    Disallow: /ftpstats/

    # No one should be crawling us with Nutch.
    User-agent: Nutch
    Disallow: /

    # Hide old versions of the documentation and various large sets of files.
    User-agent: *
    Disallow: /~guido/orlijn/
    Disallow: /wwwstats/
    Disallow: /webstats/
    Disallow: /ftpstats/
    Disallow: /moin
    Disallow: /pypi
    Disallow: /dev/buildbot/

Vincent Davis
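Vincent's reading of the python.org rules can be checked with the standard library's `urllib.robotparser`. The sketch below feeds it the robots.txt quoted above and asks whether a generic crawler may fetch a few paths; the user-agent string "Googlebot" is only an example. Two caveats: crawlers read `/robots.txt` only from the root of the host they are crawling, so this file says nothing about docs.python.org; and `urllib.robotparser` applies the first matching rule in file order, which is simpler than some production crawlers, so treat its verdicts as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The www.python.org robots.txt as quoted in the message above.
PYTHON_ORG_ROBOTS = """\
# Directions for robots. See this URL:
# http://www.robotstxt.org/wc/norobots.html
# for a description of the file format.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

# The Krugle web crawler (though based on Nutch) is OK.
User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

# No one should be crawling us with Nutch.
User-agent: Nutch
Disallow: /

# Hide old versions of the documentation and various large sets of files.
User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/
"""

python_org = RobotFileParser()
python_org.parse(PYTHON_ORG_ROBOTS.splitlines())

# "Googlebot" matches none of the named groups, so the catch-all
# "User-agent: *" rules are the ones applied to it.
for path in ["/dev/peps/pep-0430/", "/dev/buildbot/", "/pypi"]:
    url = "http://www.python.org" + path
    print(path, "allowed" if python_org.can_fetch("Googlebot", url) else "blocked")
```

Only `/dev/buildbot/` is listed, so any other path under `/dev/` stays crawlable under these rules.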

I also get search results with Python 1.5.0p2 showing up.
Search for PyArg_ParseTuple. The first result is a URL with /2/ in it whose search result title says "3.3.3", but opening it shows the correct 2.x documentation. The second result is the ancient Python 1.5.0 docs. ;)
Should the ancient /release/ docs have redirects set up or be somehow marked as no-crawl? http://docs.python.org/release/1.5.2p2/ext/parseTuple.html is the humorous result in this case. "I want to know how the API I'm using behaved 15 years ago!", said no one ever.
-gps
On Sat, Jan 25, 2014 at 9:34 PM, Benjamin Peterson <benjamin@python.org> wrote:
On Sat, Jan 25, 2014, at 07:04 PM, Nick Coghlan wrote:
Which suggests that the Google web crawler *is* spidering the dev docs, which we generally don't want :P
I've now added a robots.txt to disallow crawling /dev.

On Thu, Jan 30, 2014, at 10:11 AM, Gregory P. Smith wrote:
I also get search results with Python 1.5.0p2 showing up.
Search for PyArg_ParseTuple. The first result is a URL with /2/ in it whose search result title says "3.3.3", but opening it shows the correct 2.x documentation. The second result is the ancient Python 1.5.0 docs. ;)
Should the ancient /release/ docs have redirects set up or be somehow marked as no-crawl? http://docs.python.org/release/1.5.2p2/ext/parseTuple.html is the humorous result in this case.
I've now added /release to robots.txt.
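The exact contents of the docs.python.org robots.txt aren't quoted in the thread, so the rules below are a hypothetical reconstruction of Benjamin's two changes (blocking `/dev` and `/release`). The sketch uses `urllib.robotparser` to confirm that such rules hide the unreleased and ancient docs while leaving the `/2/` and `/3/` trees crawlable; the sample paths and the "Googlebot" user agent are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical docs.python.org/robots.txt reflecting the changes described
# in this thread; the real file may differ.
DOCS_ROBOTS = """\
User-agent: *
Disallow: /dev/
Disallow: /release/
"""

docs_rules = RobotFileParser()
docs_rules.parse(DOCS_ROBOTS.splitlines())

paths = [
    "/dev/library/",                         # unreleased 3.4 docs
    "/release/1.5.2p2/ext/parseTuple.html",  # ancient 1.5.2p2 docs
    "/3/library/",                           # current Python 3 docs
    "/2/c-api/arg.html",                     # current Python 2 docs
]
for path in paths:
    url = "http://docs.python.org" + path
    verdict = "allowed" if docs_rules.can_fetch("Googlebot", url) else "blocked"
    print(f"{path:45} {verdict}")
```

The trailing slashes are a choice made for this sketch: robots.txt rules are prefix matches, so a bare `Disallow: /dev` would also block any other path that merely begins with `/dev`.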
participants (4)
- Benjamin Peterson
- Gregory P. Smith
- Nick Coghlan
- Vincent Davis