[Python-checkins] r80092 - python/branches/py3k/Doc/library/urllib.request.rst
orsenthil at gmail.com
Mon Apr 19 10:22:23 CEST 2010
On Mon, Apr 19, 2010 at 10:49:38AM +0300, Ezio Melotti wrote:
> >+ b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> >+<?xml-stylesheet href="./css/ht2html'
> python.org doesn't use this doctype anymore (luckily), so this
> example (and the other later ones) should be updated.
Okay. I had not updated the existing example; I only changed it to show
that it returns a bytes object (b''). I see it's time to update the
output. Shall do.
> >+Note that in Python 3, urlopen returns a bytes object by default. In many
> In real-world situations it is not possible to just pick an encoding
> and use it to decode the result. The example should show how to read
> the encoding from the HTTP headers and possibly warn that the
> encoding might be missing or incorrect. The encoding can also be
> specified in other places, such as the XML declaration (for XHTML
> pages only) and in the <meta> tag (the headers have higher priority
> over XML declarations and meta tags).
> Since the next step after decoding is often parsing, it could also
> be mentioned that libraries to parse HTML are usually already able
> to decode the source automatically, so there's no need to search for
> the encoding and decode manually.
RDM suggested a different wording to replace the current note. I think
we can go ahead with that. The specifics of how to get the correct
encoding from the HTTP headers, or from other places like the XML
declaration, can be added as a note.
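
As a rough illustration of the header-based approach described above (a
sketch only, not the wording proposed for the docs), the charset can be read
from the Content-Type header and used to decode the body. The helper names
decode_body and fetch_text and the 'utf-8' fallback are assumptions made for
this example; it also does not look at the XML declaration or <meta> tags.

```python
import urllib.request
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in the Content-Type header.

    decode_body is a hypothetical helper; `headers` is any
    email.message.Message-like object, such as the one exposed by the
    response returned from urlopen().
    """
    # get_content_charset() returns None when no charset parameter is present.
    charset = headers.get_content_charset() or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # The declared charset is unknown or does not match the data,
        # so decode leniently with the assumed fallback instead.
        return raw.decode(fallback, errors="replace")


def fetch_text(url):
    """Fetch a URL and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as f:
        return decode_body(f.read(), f.headers)


# Offline demonstration with a hand-built header object (no network needed).
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=iso-8859-1"
demo_text = decode_body(b"caf\xe9", demo_headers)
```

The fallback path matters because, as noted above, the declared encoding
might be missing or incorrect; an HTML parsing library will often handle
this detection itself.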
> >+>>> import urllib.request
> >+>>> f = urllib.request.urlopen('http://www.python.org/')
> >+>>> print(f.read(100).decode('utf-8')
> A ')' is missing here.
Oops, a typo. Thanks for finding it.
> Why do some examples have print() and others don't?
Perhaps it is not adding any value? But yes, consistency would be
good. I shall look at the instances to achieve that.