[Python-checkins] r80092 - python/branches/py3k/Doc/library/urllib.request.rst

R. David Murray rdmurray at bitdance.com
Sat Apr 17 18:05:00 CEST 2010

On Thu, 15 Apr 2010 19:18:22 +0200, senthil.kumaran <python-checkins at python.org> wrote:
> ==============================================================================
> --- python/branches/py3k/Doc/library/urllib.request.rst	(original)
> +++ python/branches/py3k/Doc/library/urllib.request.rst	Thu Apr 15 19:18:22 2010
> @@ -1073,23 +1073,39 @@
>  --------
>  This example gets the python.org main page and displays the first 100 bytes of
> -it::
> +it.::
>     >>> import urllib.request
>     >>> f = urllib.request.urlopen('http://www.python.org/')
>     >>> print(f.read(100))
> +   b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> +   <?xml-stylesheet href="./css/ht2html'
> +
> +Note that in Python 3, urlopen returns a bytes object by default. In many
> +circumstances, you might expect the output of urlopen to be a string. This
> +might be a carried over expectation from Python 2, where urlopen returned
> +string or it might even the common usecase. In those cases, you should
> +explicitly decode the bytes to string.

Senthil, I think that we are in general considering Python 3 a "clean
start", and avoiding mentioning how things were done in Python 2 except
where it is important for compatibility (e.g., pickle).  I think the
mention of how Python 2 did it actually muddies the explanation of how
one should do it.  I would either drop the mention of Python 2, or
move it to a footnote (I favor just dropping it).

How about this:

Note that urlopen returns a bytes object.  This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server.  In general, a program will decode
the returned bytes object to a string once it determines or guesses
the appropriate encoding.
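
To make "determines or guesses" concrete, here is a minimal sketch of one
way a program might do it. It is not proposed doc text. The helper names
decode_body and fetch_text and the UTF-8 fallback are my own assumptions;
the only library calls used are urlopen, read, and the headers object's
get_content_charset method.

```python
import urllib.request
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset from Content-Type, if declared.

    decode_body and the UTF-8 fallback are illustrative assumptions,
    not part of urllib.
    """
    # Returns the declared charset (lowercased), or `fallback` when the
    # header or its charset parameter is missing.
    charset = headers.get_content_charset(fallback)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or the server mislabeled the data: guess the
        # fallback and replace any bytes that still cannot be decoded.
        return raw.decode(fallback, errors="replace")


def fetch_text(url):
    """Fetch a URL and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read(), response.headers)
```

This only trusts the Content-Type header; it does not look inside the
document for a meta tag or an XML declaration, which is where the real
difficulty starts.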

Aside: I was curious how one went about determining the encoding, and
found this fascinating document that seems to show just how non-trivial
doing so is:


And I thought email was a pain to parse.  Little did I know.

R. David Murray                                      www.bitdance.com
