[Python-3000-checkins] r64604 - in python/branches/py3k-urllib/Doc: howto/urllib2.rst library/contextlib.rst library/ftplib.rst library/http.client.rst library/internet.rst

senthil.kumaran python-3000-checkins at python.org
Tue Jul 1 05:40:57 CEST 2008


Author: senthil.kumaran
Date: Tue Jul  1 05:40:56 2008
New Revision: 64604

Log:
Update the py3k-urllib branch documentation to use the new urllib package names

Modified:
   python/branches/py3k-urllib/Doc/howto/urllib2.rst
   python/branches/py3k-urllib/Doc/library/contextlib.rst
   python/branches/py3k-urllib/Doc/library/ftplib.rst
   python/branches/py3k-urllib/Doc/library/http.client.rst
   python/branches/py3k-urllib/Doc/library/internet.rst

Modified: python/branches/py3k-urllib/Doc/howto/urllib2.rst
==============================================================================
--- python/branches/py3k-urllib/Doc/howto/urllib2.rst	(original)
+++ python/branches/py3k-urllib/Doc/howto/urllib2.rst	Tue Jul  1 05:40:56 2008
@@ -1,6 +1,6 @@
-************************************************
-  HOWTO Fetch Internet Resources Using urllib2
-************************************************
+*****************************************************
+  HOWTO Fetch Internet Resources Using urllib package
+*****************************************************
 
 :Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
 
@@ -24,14 +24,14 @@
     
         A tutorial on *Basic Authentication*, with examples in Python.
 
-**urllib2** is a `Python <http://www.python.org>`_ module for fetching URLs
+**urllib.request** is a `Python <http://www.python.org>`_ module for fetching URLs
 (Uniform Resource Locators). It offers a very simple interface, in the form of
 the *urlopen* function. This is capable of fetching URLs using a variety of
 different protocols. It also offers a slightly more complex interface for
 handling common situations - like basic authentication, cookies, proxies and so
 on. These are provided by objects called handlers and openers.
 
-urllib2 supports fetching URLs for many "URL schemes" (identified by the string
+urllib.request supports fetching URLs for many "URL schemes" (identified by the string
 before the ":" in URL - for example "ftp" is the URL scheme of
 "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
 This tutorial focuses on the most common case, HTTP.
@@ -40,43 +40,43 @@
 encounter errors or non-trivial cases when opening HTTP URLs, you will need some
 understanding of the HyperText Transfer Protocol. The most comprehensive and
 authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
-not intended to be easy to read. This HOWTO aims to illustrate using *urllib2*,
+not intended to be easy to read. This HOWTO aims to illustrate using *urllib*,
 with enough detail about HTTP to help you through. It is not intended to replace
-the :mod:`urllib2` docs, but is supplementary to them.
+the :mod:`urllib.request` docs, but is supplementary to them.
 
 
 Fetching URLs
 =============
 
-The simplest way to use urllib2 is as follows::
+The simplest way to use urllib.request is as follows::
 
-    import urllib2
-    response = urllib2.urlopen('http://python.org/')
+    import urllib.request
+    response = urllib.request.urlopen('http://python.org/')
     html = response.read()
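
 In Python 3 ``read()`` returns bytes rather than text. A minimal sketch of
 decoding the result (assuming, for illustration, that the page is
 UTF-8-encoded)::

     import urllib.request

     response = urllib.request.urlopen('http://python.org/')
     raw = response.read()          # bytes in Python 3
     html = raw.decode('utf-8')     # the UTF-8 charset is an assumption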
 
-Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
+Many uses of urllib will be that simple (note that instead of an 'http:' URL we
 could have used a URL starting with 'ftp:', 'file:', etc.).  However, it's the
 purpose of this tutorial to explain the more complicated cases, concentrating on
 HTTP.
 
 HTTP is based on requests and responses - the client makes requests and servers
-send responses. urllib2 mirrors this with a ``Request`` object which represents
+send responses. urllib.request mirrors this with a ``Request`` object which represents
 the HTTP request you are making. In its simplest form you create a Request
 object that specifies the URL you want to fetch. Calling ``urlopen`` with this
 Request object returns a response object for the URL requested. This response is
 a file-like object, which means you can for example call ``.read()`` on the
 response::
 
-    import urllib2
+    import urllib.request
 
-    req = urllib2.Request('http://www.voidspace.org.uk')
-    response = urllib2.urlopen(req)
+    req = urllib.request.Request('http://www.voidspace.org.uk')
+    response = urllib.request.urlopen(req)
     the_page = response.read()
 
-Note that urllib2 makes use of the same Request interface to handle all URL
+Note that urllib.request makes use of the same Request interface to handle all URL
 schemes.  For example, you can make an FTP request like so::
 
-    req = urllib2.Request('ftp://example.com/')
+    req = urllib.request.Request('ftp://example.com/')
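
 For instance, a sketch of completing such an FTP fetch, mirroring the HTTP
 pattern above (``ftp://example.com/`` is a placeholder host)::

     import urllib.request

     req = urllib.request.Request('ftp://example.com/')
     response = urllib.request.urlopen(req)
     listing = response.read()    # for a directory URL, an FTP listing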
 
 In the case of HTTP, there are two extra things that Request objects allow you
 to do: First, you can pass data to be sent to the server.  Second, you can pass
@@ -94,20 +94,20 @@
 all POSTs have to come from forms: you can use a POST to transmit arbitrary data
 to your own application. In the common case of HTML forms, the data needs to be
 encoded in a standard way, and then passed to the Request object as the ``data``
-argument. The encoding is done using a function from the ``urllib`` library
-*not* from ``urllib2``. ::
+argument. The encoding is done using a function from the ``urllib.parse`` library
+*not* from ``urllib.request``. ::
 
-    import urllib
-    import urllib2  
+    import urllib.parse
+    import urllib.request 
 
     url = 'http://www.someserver.com/cgi-bin/register.cgi'
     values = {'name' : 'Michael Foord',
               'location' : 'Northampton',
               'language' : 'Python' }
 
-    data = urllib.urlencode(values)
-    req = urllib2.Request(url, data)
-    response = urllib2.urlopen(req)
+    data = urllib.parse.urlencode(values)
+    req = urllib.request.Request(url, data)
+    response = urllib.request.urlopen(req)
     the_page = response.read()
 
 Note that other encodings are sometimes required (e.g. for file upload from HTML
@@ -115,7 +115,7 @@
 <http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
 details).
 
-If you do not pass the ``data`` argument, urllib2 uses a **GET** request. One
+If you do not pass the ``data`` argument, urllib.request uses a **GET** request. One
 way in which GET and POST requests differ is that POST requests often have
 "side-effects": they change the state of the system in some way (for example by
 placing an order with the website for a hundredweight of tinned spam to be
@@ -127,18 +127,18 @@
 
 This is done as follows::
 
-    >>> import urllib2
-    >>> import urllib
+    >>> import urllib.request
+    >>> import urllib.parse
     >>> data = {}
     >>> data['name'] = 'Somebody Here'
     >>> data['location'] = 'Northampton'
     >>> data['language'] = 'Python'
-    >>> url_values = urllib.urlencode(data)
+    >>> url_values = urllib.parse.urlencode(data)
     >>> print(url_values)
     name=Somebody+Here&language=Python&location=Northampton
     >>> url = 'http://www.example.com/example.cgi'
     >>> full_url = url + '?' + url_values
-    >>> data = urllib2.open(full_url)
+    >>> data = urllib.request.urlopen(full_url)
 
 Notice that the full URL is created by adding a ``?`` to the URL, followed by
 the encoded values.
@@ -150,7 +150,7 @@
 to your HTTP request.
 
 Some websites [#]_ dislike being browsed by programs, or send different versions
-to different browsers [#]_ . By default urllib2 identifies itself as
+to different browsers [#]_ . By default urllib identifies itself as
 ``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
 numbers of the Python release,
 e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
@@ -160,8 +160,8 @@
 request as above, but identifies itself as a version of Internet
 Explorer [#]_. ::
 
-    import urllib
-    import urllib2  
+    import urllib.parse
+    import urllib.request 
     
     url = 'http://www.someserver.com/cgi-bin/register.cgi'
     user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
@@ -170,9 +170,9 @@
               'language' : 'Python' }
     headers = { 'User-Agent' : user_agent }
     
-    data = urllib.urlencode(values)
-    req = urllib2.Request(url, data, headers)
-    response = urllib2.urlopen(req)
+    data = urllib.parse.urlencode(values)
+    req = urllib.request.Request(url, data, headers)
+    response = urllib.request.urlopen(req)
     the_page = response.read()
 
 The response also has two useful methods. See the section on `info and geturl`_
@@ -182,7 +182,7 @@
 Handling Exceptions
 ===================
 
-*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
+*urlopen* raises ``urllib.error.URLError`` when it cannot handle a response (though as usual
 with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
 be raised).
 
@@ -199,9 +199,9 @@
 
 e.g. ::
 
-    >>> req = urllib2.Request('http://www.pretend_server.org')
-    >>> try: urllib2.urlopen(req)
-    >>> except URLError, e:
+    >>> req = urllib.request.Request('http://www.pretend_server.org')
+    >>> try: urllib.request.urlopen(req)
+    >>> except urllib.error.URLError as e:
     >>>    print(e.reason)
     >>>
     (4, 'getaddrinfo failed')
@@ -214,7 +214,7 @@
 the status code indicates that the server is unable to fulfil the request. The
 default handlers will handle some of these responses for you (for example, if
 the response is a "redirection" that requests the client fetch the document from
-a different URL, urllib2 will handle that for you). For those it can't handle,
+a different URL, urllib.request will handle that for you). For those it can't handle,
 urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
 found), '403' (request forbidden), and '401' (authentication required).
 
@@ -305,12 +305,12 @@
 When an error is raised the server responds by returning an HTTP error code
 *and* an error page. You can use the ``HTTPError`` instance as a response on the
 page returned. This means that as well as the code attribute, it also has read,
-geturl, and info, methods. ::
+geturl, and info methods, as provided by the ``urllib.response`` module::
 
-    >>> req = urllib2.Request('http://www.python.org/fish.html')
+    >>> req = urllib.request.Request('http://www.python.org/fish.html')
     >>> try: 
-    >>>     urllib2.urlopen(req)
-    >>> except URLError, e:
+    >>>     urllib.request.urlopen(req)
+    >>> except urllib.error.URLError as e:
     >>>     print(e.code)
     >>>     print(e.read())
     >>> 
@@ -334,7 +334,8 @@
 ::
 
 
-    from urllib2 import Request, urlopen, URLError, HTTPError
+    from urllib.request import Request, urlopen
+    from urllib.error import URLError, HTTPError
     req = Request(someurl)
     try:
         response = urlopen(req)
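     # a sketch of how this first wrapper typically continues: handle the
     # more specific HTTPError before falling back to URLError
     except HTTPError as e:
         print('The server could not fulfill the request.')
         print('Error code:', e.code)
     except URLError as e:
         print('We failed to reach the server.')
         print('Reason:', e.reason)
     else:
         the_page = response.read()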
@@ -358,7 +359,8 @@
 
 ::
 
-    from urllib2 import Request, urlopen, URLError
+    from urllib.request import Request, urlopen
+    from urllib.error import URLError
     req = Request(someurl)
     try:
         response = urlopen(req)
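     # a sketch of the second approach: catch URLError only (HTTPError is a
     # subclass of it) and inspect whichever attributes are present
     except URLError as e:
         if hasattr(e, 'reason'):
             print('We failed to reach a server.')
             print('Reason:', e.reason)
         elif hasattr(e, 'code'):
             print('The server could not fulfill the request.')
             print('Error code:', e.code)
     else:
         the_page = response.read()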
@@ -377,7 +379,8 @@
 ===============
 
 The response returned by urlopen (or the ``HTTPError`` instance) has two useful
-methods ``info`` and ``geturl``.
+methods, ``info`` and ``geturl``; the response object itself is defined
+in the ``urllib.response`` module.
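
 A quick sketch of both methods (the URL is a placeholder)::

     import urllib.request

     response = urllib.request.urlopen('http://www.example.com/')
     print(response.geturl())    # the URL actually fetched, after any redirects
     print(response.info())      # the response headers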
 
 **geturl** - this returns the real URL of the page fetched. This is useful
 because ``urlopen`` (or the opener object used) may have followed a
@@ -397,7 +400,7 @@
 ====================
 
 When you fetch a URL you use an opener (an instance of the perhaps
-confusingly-named :class:`urllib2.OpenerDirector`). Normally we have been using
+confusingly-named :class:`urllib.request.OpenerDirector`). Normally we have been using
 the default opener - via ``urlopen`` - but you can create custom
 openers. Openers use handlers. All the "heavy lifting" is done by the
 handlers. Each handler knows how to open URLs for a particular URL scheme (http,
@@ -466,24 +469,24 @@
 than the URL you pass to .add_password() will also match. ::
 
     # create a password manager
-    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()                        
+    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()                        
 
     # Add the username and password.
     # If we knew the realm, we could use it instead of ``None``.
     top_level_url = "http://example.com/foo/"
     password_mgr.add_password(None, top_level_url, username, password)
 
-    handler = urllib2.HTTPBasicAuthHandler(password_mgr)                            
+    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)                            
 
     # create "opener" (OpenerDirector instance)
-    opener = urllib2.build_opener(handler)                       
+    opener = urllib.request.build_opener(handler)                       
 
     # use the opener to fetch a URL
     opener.open(a_url)      
 
     # Install the opener.
-    # Now all calls to urllib2.urlopen use our opener.
-    urllib2.install_opener(opener)                               
+    # Now all calls to urllib.request.urlopen use our opener.
+    urllib.request.install_opener(opener)                               
 
 .. note::
 
@@ -505,46 +508,46 @@
 Proxies
 =======
 
-**urllib2** will auto-detect your proxy settings and use those. This is through
+**urllib.request** will auto-detect your proxy settings and use those. This is through
 the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
 a good thing, but there are occasions when it may not be helpful [#]_. One way
 to do this is to set up our own ``ProxyHandler``, with no proxies defined. This
 is done using steps similar to those for setting up a `Basic Authentication`_ handler::
 
-    >>> proxy_support = urllib2.ProxyHandler({})
-    >>> opener = urllib2.build_opener(proxy_support)
-    >>> urllib2.install_opener(opener)
+    >>> proxy_support = urllib.request.ProxyHandler({})
+    >>> opener = urllib.request.build_opener(proxy_support)
+    >>> urllib.request.install_opener(opener)
 
 .. note::
 
-    Currently ``urllib2`` *does not* support fetching of ``https`` locations
-    through a proxy.  However, this can be enabled by extending urllib2 as
+    Currently ``urllib.request`` *does not* support fetching of ``https`` locations
+    through a proxy.  However, this can be enabled by extending urllib.request as
     shown in the recipe [#]_.
 
 
 Sockets and Layers
 ==================
 
-The Python support for fetching resources from the web is layered. urllib2 uses
-the http.client library, which in turn uses the socket library.
+The Python support for fetching resources from the web is layered.
+urllib.request uses the http.client library, which in turn uses the socket library.
 
 As of Python 2.3 you can specify how long a socket should wait for a response
 before timing out. This can be useful in applications which have to fetch web
 pages. By default the socket module has *no timeout* and can hang. Currently,
-the socket timeout is not exposed at the http.client or urllib2 levels.
+the socket timeout is not exposed at the http.client or urllib.request levels.
 However, you can set the default timeout globally for all sockets using ::
 
     import socket
-    import urllib2
+    import urllib.request
 
     # timeout in seconds
     timeout = 10
     socket.setdefaulttimeout(timeout) 
 
-    # this call to urllib2.urlopen now uses the default timeout
+    # this call to urllib.request.urlopen now uses the default timeout
     # we have set in the socket module
-    req = urllib2.Request('http://www.voidspace.org.uk')
-    response = urllib2.urlopen(req)
+    req = urllib.request.Request('http://www.voidspace.org.uk')
+    response = urllib.request.urlopen(req)
 
 
 -------

Modified: python/branches/py3k-urllib/Doc/library/contextlib.rst
==============================================================================
--- python/branches/py3k-urllib/Doc/library/contextlib.rst	(original)
+++ python/branches/py3k-urllib/Doc/library/contextlib.rst	Tue Jul  1 05:40:56 2008
@@ -98,9 +98,9 @@
    And lets you write code like this::
 
       from contextlib import closing
-      import urllib
+      import urllib.request
 
-      with closing(urllib.urlopen('http://www.python.org')) as page:
+      with closing(urllib.request.urlopen('http://www.python.org')) as page:
           for line in page:
               print(line)
 

Modified: python/branches/py3k-urllib/Doc/library/ftplib.rst
==============================================================================
--- python/branches/py3k-urllib/Doc/library/ftplib.rst	(original)
+++ python/branches/py3k-urllib/Doc/library/ftplib.rst	Tue Jul  1 05:40:56 2008
@@ -13,9 +13,9 @@
 This module defines the class :class:`FTP` and a few related items. The
 :class:`FTP` class implements the client side of the FTP protocol.  You can use
 this to write Python programs that perform a variety of automated FTP jobs, such
-as mirroring other ftp servers.  It is also used by the module :mod:`urllib` to
-handle URLs that use FTP.  For more information on FTP (File Transfer Protocol),
-see Internet :rfc:`959`.
+as mirroring other ftp servers.  It is also used by the module
+:mod:`urllib.request` to handle URLs that use FTP.  For more information on FTP
+(File Transfer Protocol), see Internet :rfc:`959`.
 
 Here's a sample session using the :mod:`ftplib` module::
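
    >>> # a sketch only -- the host is a placeholder and anonymous
    >>> # login is assumed
    >>> from ftplib import FTP
    >>> ftp = FTP('ftp.example.org')   # connect to a hypothetical host
    >>> ftp.login()                    # log in anonymously
    >>> ftp.retrlines('LIST')          # list the current directory
    >>> ftp.quit()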
 

Modified: python/branches/py3k-urllib/Doc/library/http.client.rst
==============================================================================
--- python/branches/py3k-urllib/Doc/library/http.client.rst	(original)
+++ python/branches/py3k-urllib/Doc/library/http.client.rst	Tue Jul  1 05:40:56 2008
@@ -9,10 +9,11 @@
    pair: HTTP; protocol
    single: HTTP; http.client (standard module)
 
-.. index:: module: urllib
+.. index:: module: urllib.request
 
 This module defines classes which implement the client side of the HTTP and
-HTTPS protocols.  It is normally not used directly --- the module :mod:`urllib`
+HTTPS protocols.  It is normally not used directly --- the module
+:mod:`urllib.request`
 uses it to handle URLs that use HTTP and HTTPS.
 
 .. note::
@@ -484,8 +485,8 @@
 
 Here is an example session that shows how to ``POST`` requests::
 
-   >>> import http.client, urllib
-   >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
+   >>> import http.client, urllib.parse
+   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> headers = {"Content-type": "application/x-www-form-urlencoded",
    ...            "Accept": "text/plain"}
    >>> conn = http.client.HTTPConnection("musi-cal.mojam.com:80")
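    >>> # a sketch of how the session typically continues; the output
    >>> # shown is illustrative, not a recorded response
    >>> conn.request("POST", "/cgi-bin/query", params, headers)
    >>> response = conn.getresponse()
    >>> print(response.status, response.reason)
    200 OK
    >>> data = response.read()
    >>> conn.close()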

Modified: python/branches/py3k-urllib/Doc/library/internet.rst
==============================================================================
--- python/branches/py3k-urllib/Doc/library/internet.rst	(original)
+++ python/branches/py3k-urllib/Doc/library/internet.rst	Tue Jul  1 05:40:56 2008
@@ -25,9 +25,9 @@
    cgitb.rst
    wsgiref.rst
    urllib.request.rst
-   urllib.robotparser.rst
-   urllib.error.rst
    urllib.parse.rst
+   urllib.error.rst
+   urllib.robotparser.rst
    http.client.rst
    ftplib.rst
    poplib.rst

