[Python-checkins] r80092 - python/branches/py3k/Doc/library/urllib.request.rst
R. David Murray
rdmurray at bitdance.com
Sat Apr 17 18:05:00 CEST 2010
On Thu, 15 Apr 2010 19:18:22 +0200, senthil.kumaran <python-checkins at python.org> wrote:
> --- python/branches/py3k/Doc/library/urllib.request.rst (original)
> +++ python/branches/py3k/Doc/library/urllib.request.rst Thu Apr 15 19:18:22 2010
> @@ -1073,23 +1073,39 @@
> This example gets the python.org main page and displays the first 100 bytes of
> >>> import urllib.request
> >>> f = urllib.request.urlopen('http://www.python.org/')
> >>> print(f.read(100))
> + b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> + <?xml-stylesheet href="./css/ht2html'
> +Note that in Python 3, urlopen returns a bytes object by default. In many
> +circumstances, you might expect the output of urlopen to be a string. This
> +might be a carried over expectation from Python 2, where urlopen returned
> +string or it might even the common usecase. In those cases, you should
> +explicitly decode the bytes to string.
Senthil, I think that we are in general considering Python 3 a "clean
start", and avoiding mentioning how things were done in Python 2 except
where it is important for compatibility (e.g., pickle). I think the
mention of how Python 2 did it actually muddies the explanation of how
one should do it. I would either drop the mention of Python 2, or
move it to a footnote (I favor just dropping it).
How about this:
Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode
the returned bytes object to a string once it determines or guesses
the appropriate encoding.
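
To make the "determines or guesses" step concrete, here is a minimal
sketch of one common approach: use the charset named in the response's
Content-Type header when there is one, and otherwise fall back to a guess.
The helper name decode_body and the UTF-8 fallback are assumptions made
for illustration; they are not part of urllib or of the proposed doc text.

```python
from email.message import Message


def decode_body(raw, content_type=None, fallback="utf-8"):
    """Decode bytes using the charset from a Content-Type header value.

    Hypothetical helper: falls back to `fallback` (an assumed default)
    when the header is missing, names no charset, or names a codec
    that Python does not know.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter exists.
    charset = msg.get_content_charset() or fallback
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server named an encoding Python has no codec for.
        return raw.decode(fallback, errors="replace")


# Example with literal data, so no network access is needed:
text = decode_body("café".encode("latin-1"), "text/html; charset=ISO-8859-1")
```

With a real response f from urllib.request.urlopen, the same idea would be
decode_body(f.read(), f.headers.get("Content-Type")). Note that this only
trusts the header; a page may declare a different encoding inside its own
markup, which is part of why guessing is hard.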
Aside: I was curious how one went about determining the encoding, and
found this fascinating document that seems to show just how non-trivial
doing so is:
And I thought email was a pain to parse. Little did I know.
R. David Murray www.bitdance.com