[Python-checkins] r56756 - sandbox/trunk/urilib/libcgi.tex sandbox/trunk/urilib/liburllib2.tex sandbox/trunk/urilib/liburlparse.tex sandbox/trunk/urilib/test_urllib2.py sandbox/trunk/urilib/urllib2.py
senthil.kumaran
python-checkins at python.org
Sun Aug 5 23:42:34 CEST 2007
Author: senthil.kumaran
Date: Sun Aug 5 23:42:33 2007
New Revision: 56756
Added:
sandbox/trunk/urilib/libcgi.tex (contents, props changed)
sandbox/trunk/urilib/liburllib2.tex (contents, props changed)
sandbox/trunk/urilib/liburlparse.tex (contents, props changed)
sandbox/trunk/urilib/test_urllib2.py (contents, props changed)
sandbox/trunk/urilib/urllib2.py (contents, props changed)
Log:
SoC Tasks update. Added Docs, urllib2 cache redirection
Added: sandbox/trunk/urilib/libcgi.tex
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/libcgi.tex Sun Aug 5 23:42:33 2007
@@ -0,0 +1,609 @@
+\section{\module{cgi} ---
+ Common Gateway Interface support.}
+\declaremodule{standard}{cgi}
+
+\modulesynopsis{Common Gateway Interface support, used to interpret
+forms in server-side scripts.}
+
+\indexii{WWW}{server}
+\indexii{CGI}{protocol}
+\indexii{HTTP}{protocol}
+\indexii{MIME}{headers}
+\index{URL}
+
+
+Support module for Common Gateway Interface (CGI) scripts.%
+\index{Common Gateway Interface}
+
+This module defines a number of utilities for use by CGI scripts
+written in Python.
+
+\subsection{Introduction}
+\nodename{cgi-intro}
+
+A CGI script is invoked by an HTTP server, usually to process user
+input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.
+
+Most often, CGI scripts live in the server's special \file{cgi-bin}
+directory. The HTTP server places all sorts of information about the
+request (such as the client's hostname, the requested URL, the query
+string, and lots of other goodies) in the script's shell environment,
+executes the script, and sends the script's output back to the client.
+
+The script's input is connected to the client too, and sometimes the
+form data is read this way; at other times the form data is passed via
+the ``query string'' part of the URL. This module is intended
+to take care of the different cases and provide a simpler interface to
+the Python script. It also provides a number of utilities that help
+in debugging scripts, and the latest addition is support for file
+uploads from a form (if your browser supports it).
+
+The output of a CGI script should consist of two sections, separated
+by a blank line. The first section contains a number of headers,
+telling the client what kind of data is following. Python code to
+generate a minimal header section looks like this:
+
+\begin{verbatim}
+print "Content-Type: text/html" # HTML is following
+print # blank line, end of headers
+\end{verbatim}
+
+The second section is usually HTML, which allows the client software
+to display nicely formatted text with headings, in-line images, etc.
+Here's Python code that prints a simple piece of HTML:
+
+\begin{verbatim}
+print "<TITLE>CGI script output</TITLE>"
+print "<H1>This is my first CGI script</H1>"
+print "Hello, world!"
+\end{verbatim}
+
+\subsection{Using the cgi module}
+\nodename{Using the cgi module}
+
+Begin by writing \samp{import cgi}. Do not use \samp{from cgi import
+*} --- the module defines all sorts of names for its own use or for
+backward compatibility that you don't want in your namespace.
+
+When you write a new script, consider adding the line:
+
+\begin{verbatim}
+import cgitb; cgitb.enable()
+\end{verbatim}
+
+This activates a special exception handler that will display detailed
+reports in the Web browser if any errors occur. If you'd rather not
+show the guts of your program to users of your script, you can have
+the reports saved to files instead, with a line like this:
+
+\begin{verbatim}
+import cgitb; cgitb.enable(display=0, logdir="/tmp")
+\end{verbatim}
+
+It's very helpful to use this feature during script development.
+The reports produced by \refmodule{cgitb} provide information that
+can save you a lot of time in tracking down bugs. You can always
+remove the \code{cgitb} line later when you have tested your script
+and are confident that it works correctly.
+
+To get at submitted form data,
+it's best to use the \class{FieldStorage} class. The other classes
+defined in this module are provided mostly for backward compatibility.
+Instantiate it exactly once, without arguments. This reads the form
+contents from standard input or the environment (depending on the
+value of various environment variables set according to the CGI
+standard). Since it may consume standard input, it should be
+instantiated only once.
+
+The \class{FieldStorage} instance can be indexed like a Python
+dictionary, and also supports the standard dictionary methods
+\method{has_key()} and \method{keys()}. The built-in \function{len()}
+is also supported. Form fields containing empty strings are ignored
+and do not appear in the dictionary; to keep such values, provide
+a true value for the optional \var{keep_blank_values} keyword
+parameter when creating the \class{FieldStorage} instance.
+
+For instance, the following code (which assumes that the
+\mailheader{Content-Type} header and blank line have already been
+printed) checks that the fields \code{name} and \code{addr} are both
+set to a non-empty string:
+
+\begin{verbatim}
+form = cgi.FieldStorage()
+if not (form.has_key("name") and form.has_key("addr")):
+ print "<H1>Error</H1>"
+ print "Please fill in the name and addr fields."
+ return
+print "<p>name:", form["name"].value
+print "<p>addr:", form["addr"].value
+...further form processing here...
+\end{verbatim}
+
+Here the fields, accessed through \samp{form[\var{key}]}, are
+themselves instances of \class{FieldStorage} (or
+\class{MiniFieldStorage}, depending on the form encoding).
+The \member{value} attribute of the instance yields the string value
+of the field. The \method{getvalue()} method returns this string value
+directly; it also accepts an optional second argument as a default to
+return if the requested key is not present.
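+
+For example (the \code{when} field is hypothetical):
+
+\begin{verbatim}
+form = cgi.FieldStorage()
+# Returns the string value, or "unknown" if the field is absent.
+when = form.getvalue("when", "unknown")
+\end{verbatim}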
+
+If the submitted form data contains more than one field with the same
+name, the object retrieved by \samp{form[\var{key}]} is not a
+\class{FieldStorage} or \class{MiniFieldStorage}
+instance but a list of such instances. Similarly, in this situation,
+\samp{form.getvalue(\var{key})} would return a list of strings.
+If you expect this possibility
+(when your HTML form contains multiple fields with the same name), use
+the \method{getlist()} method, which always returns a list of values (so that you
+do not need to special-case the single item case). For example, this
+code concatenates any number of username fields, separated by
+commas:
+
+\begin{verbatim}
+value = form.getlist("username")
+usernames = ",".join(value)
+\end{verbatim}
+
+If a field represents an uploaded file, accessing the value via the
+\member{value} attribute or the \function{getvalue()} method reads the
+entire file in memory as a string. This may not be what you want.
+You can test for an uploaded file by testing either the \member{filename}
+attribute or the \member{file} attribute. You can then read the data at
+leisure from the \member{file} attribute:
+
+\begin{verbatim}
+fileitem = form["userfile"]
+if fileitem.file:
+ # It's an uploaded file; count lines
+ linecount = 0
+ while 1:
+ line = fileitem.file.readline()
+ if not line: break
+ linecount = linecount + 1
+\end{verbatim}
+
+The file upload draft standard entertains the possibility of uploading
+multiple files from one field (using a recursive
+\mimetype{multipart/*} encoding). When this occurs, the item will be
+a dictionary-like \class{FieldStorage} item. This can be determined
+by testing its \member{type} attribute, which should be
+\mimetype{multipart/form-data} (or perhaps another MIME type matching
+\mimetype{multipart/*}). In this case, it can be iterated over
+recursively just like the top-level form object.
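+
+A minimal sketch of such a recursive walk (the \code{process()}
+helper is hypothetical, not part of this module) might look like
+this:
+
+\begin{verbatim}
+def process(item):
+    if item.type and item.type.startswith("multipart/"):
+        # A nested multipart item: recurse into its parts.
+        for part in item.list:
+            process(part)
+    else:
+        # A plain field or a single uploaded file.
+        print item.name, item.filename
+\end{verbatim}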
+
+When a form is submitted in the ``old'' format (as the query string or
+as a single data part of type
+\mimetype{application/x-www-form-urlencoded}), the items will actually
+be instances of the class \class{MiniFieldStorage}. In this case, the
+\member{list}, \member{file}, and \member{filename} attributes are
+always \code{None}.
+
+
+\subsection{Higher Level Interface}
+
+\versionadded{2.2} % XXX: Is this true ?
+
+The previous section explains how to read CGI form data using the
+\class{FieldStorage} class. This section describes a higher level
+interface which was added to this class to allow one to do it in a
+more readable and intuitive way. The interface doesn't make the
+techniques described in previous sections obsolete --- they are still
+useful to process file uploads efficiently, for example.
+
+The interface consists of two simple methods. Using them, you can
+process form data in a generic way, without the need to worry whether
+one or more values were posted under one name.
+
+In the previous section, you learned to write the following code
+anytime you expected a user to post more than one value under one
+name:
+
+\begin{verbatim}
+item = form.getvalue("item")
+if isinstance(item, list):
+    # The user is requesting more than one item.
+    pass
+else:
+    # The user is requesting only one item.
+    pass
+\end{verbatim}
+
+This situation is common, for example, when a form contains a group of
+multiple checkboxes with the same name:
+
+\begin{verbatim}
+<input type="checkbox" name="item" value="1" />
+<input type="checkbox" name="item" value="2" />
+\end{verbatim}
+
+In most situations, however, there's only one form control with a
+particular name in a form and then you expect and need only one value
+associated with this name. So you write a script containing for
+example this code:
+
+\begin{verbatim}
+user = form.getvalue("user").upper()
+\end{verbatim}
+
+The problem with the code is that you should never expect that a
+client will provide valid input to your scripts. For example, if a
+curious user appends another \samp{user=foo} pair to the query string,
+then the script would crash, because in this situation the
+\code{getvalue("user")} method call returns a list instead of a
+string. Calling the \method{upper()} method on a list is not valid
+(since lists do not have a method of this name) and results in an
+\exception{AttributeError} exception.
+
+Therefore, the appropriate way to read form data values has been to
+always use code which checks whether the obtained value is a single
+value or a list of values. That's annoying and leads to less readable
+scripts.
+
+A more convenient approach is to use the methods \method{getfirst()}
+and \method{getlist()} provided by this higher level interface.
+
+\begin{methoddesc}[FieldStorage]{getfirst}{name\optional{, default}}
+ This method always returns only one value associated with form field
+ \var{name}. If more than one value was posted under that name, the
+ method returns only the first. Please note that the order
+ in which the values are received may vary from browser to browser
+ and should not be counted on.\footnote{Note that some recent
+ versions of the HTML specification do state what order the
+ field values should be supplied in, but knowing whether a
+ request was received from a conforming browser, or even from a
+ browser at all, is tedious and error-prone.} If no such form
+ field or value exists then the method returns the value specified by
+ the optional parameter \var{default}. This parameter defaults to
+ \code{None} if not specified.
+\end{methoddesc}
+
+\begin{methoddesc}[FieldStorage]{getlist}{name}
+ This method always returns a list of values associated with form
+ field \var{name}. The method returns an empty list if no such form
+ field or value exists for \var{name}. It returns a list consisting
+ of one item if only one such value exists.
+\end{methoddesc}
+
+Using these methods you can write nice compact code:
+
+\begin{verbatim}
+import cgi
+form = cgi.FieldStorage()
+user = form.getfirst("user", "").upper() # This way it's safe.
+for item in form.getlist("item"):
+ do_something(item)
+\end{verbatim}
+
+
+\subsection{Old classes}
+
+These classes, present in earlier versions of the \module{cgi} module,
+are still supported for backward compatibility. New applications
+should use the \class{FieldStorage} class.
+
+\class{SvFormContentDict} stores single-value form content as a
+dictionary; it assumes each field name occurs in the form only once.
+
+\class{FormContentDict} stores multiple-value form content as a
+dictionary (the form items are lists of values). Useful if your form
+contains multiple fields with the same name.
+
+Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
+present for backwards compatibility with really old applications only.
+If you still use these and would be inconvenienced if they
+disappeared from a future version of this module, drop me a note.
+
+
+\subsection{Functions}
+\nodename{Functions in cgi module}
+
+These are useful if you want more control, or if you want to employ
+some of the algorithms implemented in this module in other
+circumstances.
+
+\begin{funcdesc}{parse}{fp\optional{, keep_blank_values\optional{,
+ strict_parsing}}}
+ Parse a query in the environment or from a file (the file defaults
+ to \code{sys.stdin}). The \var{keep_blank_values} and
+ \var{strict_parsing} parameters are passed to \function{parse_qs()}
+ unchanged.
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
+ strict_parsing}}}
+Parse a query string given as a string argument (data of type
+\mimetype{application/x-www-form-urlencoded}). Data are
+returned as a dictionary. The dictionary keys are the unique query
+variable names and the values are lists of values for each name.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.
+A true value indicates that blanks should be retained as
+blank strings. The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors. If false (the default), errors
+are silently ignored. If true, errors raise a \exception{ValueError}
+exception.
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such dictionaries into query strings.
+
+Internally, \function{\refmodule{urlparse}.parse_qs()} is called to
+parse the query string.
+
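+For instance, a hypothetical query string can be parsed like this
+(the key order of the returned dictionary is arbitrary):
+
+\begin{verbatim}
+>>> d = cgi.parse_qs("name=Joe+Blow&name=Jane&addr=At+Home")
+>>> d["name"]
+['Joe Blow', 'Jane']
+>>> d["addr"]
+['At Home']
+\end{verbatim}
+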
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
+ strict_parsing}}}
+Parse a query string given as a string argument (data of type
+\mimetype{application/x-www-form-urlencoded}). Data are
+returned as a list of name, value pairs.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.
+A true value indicates that blanks should be retained as
+blank strings. The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors. If false (the default), errors
+are silently ignored. If true, errors raise a \exception{ValueError}
+exception.
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such lists of pairs into query strings.
+
+Internally, \function{\refmodule{urlparse}.parse_qsl()} is called to
+parse the query string.
+
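+For instance (same hypothetical query string as above, with pairs
+returned in order of appearance):
+
+\begin{verbatim}
+>>> cgi.parse_qsl("name=Joe+Blow&name=Jane&addr=At+Home")
+[('name', 'Joe Blow'), ('name', 'Jane'), ('addr', 'At Home')]
+\end{verbatim}
+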
+\end{funcdesc}
+
+\begin{funcdesc}{parse_multipart}{fp, pdict}
+Parse input of type \mimetype{multipart/form-data} (for
+file uploads). Arguments are \var{fp} for the input file and
+\var{pdict} for a dictionary containing other parameters in
+the \mailheader{Content-Type} header.
+
+Returns a dictionary just like \function{parse_qs()}: keys are the
+field names, each value is a list of values for that field. This is
+easy to use but not much good if you are expecting megabytes to be
+uploaded --- in that case, use the \class{FieldStorage} class instead
+which is much more flexible.
+
+Note that this does not parse nested multipart parts --- use
+\class{FieldStorage} for that.
+\end{funcdesc}
+
+\begin{funcdesc}{parse_header}{string}
+Parse a MIME header (such as \mailheader{Content-Type}) into a main
+value and a dictionary of parameters.
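+
+For example:
+
+\begin{verbatim}
+>>> cgi.parse_header('text/html; charset=utf-8')
+('text/html', {'charset': 'utf-8'})
+\end{verbatim}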
+\end{funcdesc}
+
+\begin{funcdesc}{test}{}
+Robust test CGI script, usable as main program.
+Writes minimal HTTP headers and formats all information provided to
+the script in HTML form.
+\end{funcdesc}
+
+\begin{funcdesc}{print_environ}{}
+Format the shell environment in HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{print_form}{form}
+Format a form in HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{print_directory}{}
+Format the current directory in HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{print_environ_usage}{}
+Print a list of useful (used by CGI) environment variables in
+HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{escape}{s\optional{, quote}}
+Convert the characters
+\character{\&}, \character{<} and \character{>} in string \var{s} to
+HTML-safe sequences. Use this if you need to display text that might
+contain such characters in HTML. If the optional flag \var{quote} is
+true, the quotation mark character (\character{"}) is also translated;
+this helps for inclusion in an HTML attribute value, as in \code{<A
+HREF="...">}. If the value to be quoted might include single- or
+double-quote characters, or both, consider using the
+\function{quoteattr()} function in the \refmodule{xml.sax.saxutils}
+module instead.
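+
+For example:
+
+\begin{verbatim}
+>>> cgi.escape('<a href="index.html">spam & eggs</a>', quote=True)
+'&lt;a href=&quot;index.html&quot;&gt;spam &amp; eggs&lt;/a&gt;'
+\end{verbatim}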
+\end{funcdesc}
+
+
+\subsection{Caring about security \label{cgi-security}}
+
+\indexii{CGI}{security}
+
+There's one important rule: if you invoke an external program (via the
+\function{os.system()} or \function{os.popen()} functions, or others
+with similar functionality), make very sure you don't pass arbitrary
+strings received from the client to the shell. This is a well-known
+security hole whereby clever hackers anywhere on the Web can exploit a
+gullible CGI script to invoke arbitrary shell commands. Even parts of
+the URL or field names cannot be trusted, since the request doesn't
+have to come from your form!
+
+To be on the safe side, if you must pass a string gotten from a form
+to a shell command, you should make sure the string contains only
+alphanumeric characters, dashes, underscores, and periods.
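+
+A minimal check along these lines (a sketch only; the
+\code{is_shell_safe()} helper is hypothetical, not part of this
+module) can be written with the \refmodule{re} module:
+
+\begin{verbatim}
+import re
+
+def is_shell_safe(s):
+    # Accept only alphanumerics, dashes, underscores, and periods.
+    return re.match(r"^[A-Za-z0-9._-]+$", s) is not None
+\end{verbatim}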
+
+
+\subsection{Installing your CGI script on a \UNIX\ system}
+
+Read the documentation for your HTTP server and check with your local
+system administrator to find the directory where CGI scripts should be
+installed; usually this is in a directory \file{cgi-bin} in the server tree.
+
+Make sure that your script is readable and executable by ``others''; the
+\UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
+\var{filename}}). Make sure that the first line of the script contains
+\code{\#!} starting in column 1 followed by the pathname of the Python
+interpreter, for instance:
+
+\begin{verbatim}
+#!/usr/local/bin/python
+\end{verbatim}
+
+Make sure the Python interpreter exists and is executable by ``others''.
+
+Make sure that any files your script needs to read or write are
+readable or writable, respectively, by ``others'' --- their mode
+should be \code{0644} for readable and \code{0666} for writable. This
+is because, for security reasons, the HTTP server executes your script
+as user ``nobody'', without any special privileges. It can only read
+(write, execute) files that everybody can read (write, execute). The
+current directory at execution time is also different (it is usually
+the server's cgi-bin directory) and the set of environment variables
+is also different from what you get when you log in. In particular, don't
+count on the shell's search path for executables (\envvar{PATH}) or
+the Python module search path (\envvar{PYTHONPATH}) to be set to
+anything interesting.
+
+If you need to load modules from a directory which is not on Python's
+default module search path, you can change the path in your script,
+before importing other modules. For example:
+
+\begin{verbatim}
+import sys
+sys.path.insert(0, "/usr/home/joe/lib/python")
+sys.path.insert(0, "/usr/local/lib/python")
+\end{verbatim}
+
+(This way, the directory inserted last will be searched first!)
+
+Instructions for non-\UNIX{} systems will vary; check your HTTP server's
+documentation (it will usually have a section on CGI scripts).
+
+
+\subsection{Testing your CGI script}
+
+Unfortunately, a CGI script will generally not run when you try it
+from the command line, and a script that works perfectly from the
+command line may fail mysteriously when run from the server. There's
+one reason why you should still test your script from the command
+line: if it contains a syntax error, the Python interpreter won't
+execute it at all, and the HTTP server will most likely send a cryptic
+error to the client.
+
+Assuming your script has no syntax errors, yet it does not work, you
+have no choice but to read the next section.
+
+
+\subsection{Debugging CGI scripts} \indexii{CGI}{debugging}
+
+First of all, check for trivial installation errors --- reading the
+section above on installing your CGI script carefully can save you a
+lot of time. If you wonder whether you have understood the
+installation procedure correctly, try installing a copy of this module
+file (\file{cgi.py}) as a CGI script. When invoked as a script, the file
+will dump its environment and the contents of the form in HTML form.
+Give it the right mode etc., and send it a request. If it's installed
+in the standard \file{cgi-bin} directory, it should be possible to send it a
+request by entering a URL into your browser of the form:
+
+\begin{verbatim}
+http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
+\end{verbatim}
+
+If this gives an error of type 404, the server cannot find the script
+-- perhaps you need to install it in a different directory. If it
+gives another error, there's an installation problem that
+you should fix before trying to go any further. If you get a nicely
+formatted listing of the environment and form content (in this
+example, the fields should be listed as ``addr'' with value ``At Home''
+and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
+installed correctly. If you follow the same procedure for your own
+script, you should now be able to debug it.
+
+The next step could be to call the \module{cgi} module's
+\function{test()} function from your script: replace its main code
+with the single statement
+
+\begin{verbatim}
+cgi.test()
+\end{verbatim}
+
+This should produce the same results as those gotten from installing
+the \file{cgi.py} file itself.
+
+When an ordinary Python script raises an unhandled exception (for
+whatever reason: a typo in a module name, a file that can't be
+opened, etc.), the Python interpreter prints a nice traceback and
+exits. While the Python interpreter will still do this when your CGI
+script raises an exception, most likely the traceback will end up in
+one of the HTTP server's log files, or be discarded altogether.
+
+Fortunately, once you have managed to get your script to execute
+\emph{some} code, you can easily send tracebacks to the Web browser
+using the \refmodule{cgitb} module. If you haven't done so already,
+just add the line:
+
+\begin{verbatim}
+import cgitb; cgitb.enable()
+\end{verbatim}
+
+to the top of your script. Then try running it again; when a
+problem occurs, you should see a detailed report that will
+likely make apparent the cause of the crash.
+
+If you suspect that there may be a problem in importing the
+\refmodule{cgitb} module, you can use an even more robust approach
+(which only uses built-in modules):
+
+\begin{verbatim}
+import sys
+sys.stderr = sys.stdout
+print "Content-Type: text/plain"
+print
+...your code here...
+\end{verbatim}
+
+This relies on the Python interpreter to print the traceback. The
+content type of the output is set to plain text, which disables all
+HTML processing. If your script works, the raw HTML will be displayed
+by your client. If it raises an exception, most likely after the
+first two lines have been printed, a traceback will be displayed.
+Because no HTML interpretation is going on, the traceback will be
+readable.
+
+
+\subsection{Common problems and solutions}
+
+\begin{itemize}
+\item Most HTTP servers buffer the output from CGI scripts until the
+script is completed. This means that it is not possible to display a
+progress report on the client's display while the script is running.
+
+\item Check the installation instructions above.
+
+\item Check the HTTP server's log files. (\samp{tail -f logfile} in a
+separate window may be useful!)
+
+\item Always check a script for syntax errors first, by doing something
+like \samp{python script.py}.
+
+\item If your script does not have any syntax errors, try adding
+\samp{import cgitb; cgitb.enable()} to the top of the script.
+
+\item When invoking external programs, make sure they can be found.
+Usually, this means using absolute path names --- \envvar{PATH} is
+usually not set to a very useful value in a CGI script.
+
+\item When reading or writing external files, make sure they can be read
+or written by the userid under which your CGI script will be running:
+this is typically the userid under which the web server is running, or some
+explicitly specified userid for a web server's \samp{suexec} feature.
+
+\item Don't try to give a CGI script a set-uid mode. This doesn't work on
+most systems, and is a security liability as well.
+\end{itemize}
+
Added: sandbox/trunk/urilib/liburllib2.tex
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/liburllib2.tex Sun Aug 5 23:42:33 2007
@@ -0,0 +1,873 @@
+\section{\module{urllib2} ---
+ extensible library for opening URLs}
+
+\declaremodule{standard}{urllib2}
+\moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net}
+\sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net}
+
+\modulesynopsis{An extensible library for opening URLs using a variety of
+ protocols}
+
+The \module{urllib2} module defines functions and classes which help
+in opening URLs (mostly HTTP) in a complex world --- basic and digest
+authentication, redirections, cookies and more.
+
+The \module{urllib2} module defines the following functions:
+
+\begin{funcdesc}{urlopen}{url\optional{, data}\optional{, timeout}}
+Open the URL \var{url}, which can be either a string or a \class{Request}
+object.
+
+\var{data} may be a string specifying additional data to send to the
+server, or \code{None} if no such data is needed.
+Currently HTTP requests are the only ones that use \var{data};
+the HTTP request will be a POST instead of a GET when the \var{data}
+parameter is provided. \var{data} should be a buffer in the standard
+\mimetype{application/x-www-form-urlencoded} format. The
+\function{urllib.urlencode()} function takes a mapping or sequence of
+2-tuples and returns a string in this format.
+
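+For example (the URL is illustrative), form data for a \code{POST}
+can be prepared with \function{urllib.urlencode()}:
+
+\begin{verbatim}
+import urllib
+import urllib2
+
+data = urllib.urlencode({'spam': 1, 'eggs': 2})
+f = urllib2.urlopen('http://www.example.com/cgi-bin/test.cgi', data)
+\end{verbatim}
+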
+The optional \var{timeout} parameter specifies a timeout in seconds for the
+connection attempt (if not specified, or passed as None, the global default
+timeout setting will be used). This actually only works for HTTP,
+HTTPS, FTP and FTPS connections.
+
+This function returns a file-like object with two additional methods:
+
+\begin{itemize}
+ \item \method{geturl()} --- return the URL of the resource retrieved
+ \item \method{info()} --- return the meta-information of the page, as
+ a dictionary-like object
+\end{itemize}
+
+Raises \exception{URLError} on errors.
+
+Note that \code{None} may be returned if no handler handles the
+request (though the default installed global \class{OpenerDirector}
+uses \class{UnknownHandler} to ensure this never happens).
+
+\versionchanged[\var{timeout} was added]{2.6}
+\end{funcdesc}
+
+\begin{funcdesc}{install_opener}{opener}
+Install an \class{OpenerDirector} instance as the default global
+opener. Installing an opener is only necessary if you want urlopen to
+use that opener; otherwise, simply call \method{OpenerDirector.open()}
+instead of \function{urlopen()}. The code does not check for a real
+\class{OpenerDirector}, and any class with the appropriate interface
+will work.
+\end{funcdesc}
+
+\begin{funcdesc}{build_opener}{\optional{handler, \moreargs}}
+Return an \class{OpenerDirector} instance, which chains the
+handlers in the order given. \var{handler}s can be either instances
+of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in
+which case it must be possible to call the constructor without
+any parameters). Instances of the following classes will be in
+front of the \var{handler}s, unless the \var{handler}s contain
+them, instances of them or subclasses of them:
+\class{ProxyHandler}, \class{UnknownHandler}, \class{HTTPHandler},
+\class{HTTPDefaultErrorHandler}, \class{HTTPRedirectHandler},
+\class{FTPHandler}, \class{FileHandler}, \class{HTTPErrorProcessor}.
+
+If the Python installation has SSL support (\function{socket.ssl()}
+exists), \class{HTTPSHandler} will also be added.
+
+Beginning in Python 2.3, a \class{BaseHandler} subclass may also
+change its \member{handler_order} member variable to modify its
+position in the handlers list.
+\end{funcdesc}
+
+
+The following exceptions are raised as appropriate:
+
+\begin{excdesc}{URLError}
+The handlers raise this exception (or derived exceptions) when they
+run into a problem. It is a subclass of \exception{IOError}.
+\end{excdesc}
+
+\begin{excdesc}{HTTPError}
+A subclass of \exception{URLError}, it can also function as a
+non-exceptional file-like return value (the same thing that
+\function{urlopen()} returns). This is useful when handling exotic
+HTTP errors, such as requests for authentication.
+\end{excdesc}
+
+
+The following classes are provided:
+
+\begin{classdesc}{Request}{url\optional{, data}\optional{, headers}
+ \optional{, origin_req_host}\optional{, unverifiable}}
+This class is an abstraction of a URL request.
+
+\var{url} should be a string containing a valid URL.
+
+\var{data} may be a string specifying additional data to send to the
+server, or \code{None} if no such data is needed.
+Currently HTTP requests are the only ones that use \var{data};
+the HTTP request will be a POST instead of a GET when the \var{data}
+parameter is provided. \var{data} should be a buffer in the standard
+\mimetype{application/x-www-form-urlencoded} format. The
+\function{urllib.urlencode()} function takes a mapping or sequence of
+2-tuples and returns a string in this format.
+
+\var{headers} should be a dictionary, and will be treated as if
+\method{add_header()} was called with each key and value as arguments.
+
+The final two arguments are only of interest for correct handling of
+third-party HTTP cookies:
+
+\var{origin_req_host} should be the request-host of the origin
+transaction, as defined by \rfc{2965}. It defaults to
+\code{cookielib.request_host(self)}. This is the host name or IP
+address of the original request that was initiated by the user. For
+example, if the request is for an image in an HTML document, this
+should be the request-host of the request for the page containing the
+image.
+
+\var{unverifiable} should indicate whether the request is
+unverifiable, as defined by RFC 2965. It defaults to False. An
+unverifiable request is one whose URL the user did not have the option
+to approve. For example, if the request is for an image in an HTML
+document, and the user had no option to approve the automatic fetching
+of the image, this should be true.
+\end{classdesc}
+
+\begin{classdesc}{OpenerDirector}{}
+The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s
+chained together. It manages the chaining of handlers, and recovery
+from errors.
+\end{classdesc}
+
+\begin{classdesc}{BaseHandler}{}
+This is the base class for all registered handlers --- and handles only
+the simple mechanics of registration.
+\end{classdesc}
+
+\begin{classdesc}{HTTPDefaultErrorHandler}{}
+A class which defines a default handler for HTTP error responses; all
+responses are turned into \exception{HTTPError} exceptions.
+\end{classdesc}
+
+\begin{classdesc}{HTTPRedirectHandler}{}
+A class to handle redirections.
+\end{classdesc}
+
+\begin{classdesc}{HTTPCookieProcessor}{\optional{cookiejar}}
+A class to handle HTTP Cookies.
+\end{classdesc}
+
+\begin{classdesc}{ProxyHandler}{\optional{proxies}}
+Cause requests to go through a proxy.
+If \var{proxies} is given, it must be a dictionary mapping
+protocol names to URLs of proxies.
+The default is to read the list of proxies from the environment
+variables \envvar{<protocol>_proxy}.
+\end{classdesc}
+
+\begin{classdesc}{HTTPPasswordMgr}{}
+Keep a database of
+\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})}
+mappings.
+\end{classdesc}
+
+\begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{}
+Keep a database of
+\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings.
+A realm of \code{None} is considered a catch-all realm, which is searched
+if no other realm fits.
+\end{classdesc}
+
+\begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}}
+This is a mixin class that helps with HTTP authentication, both
+to the remote host and to a proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}}
+Handle authentication with the remote host.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}}
+Handle authentication with the proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}}
+This is a mixin class that helps with HTTP authentication, both
+to the remote host and to a proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}}
+Handle authentication with the remote host.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}}
+Handle authentication with the proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{HTTPHandler}{}
+A class to handle opening of HTTP URLs.
+\end{classdesc}
+
+\begin{classdesc}{HTTPSHandler}{}
+A class to handle opening of HTTPS URLs.
+\end{classdesc}
+
+\begin{classdesc}{FileHandler}{}
+Open local files.
+\end{classdesc}
+
+\begin{classdesc}{FTPHandler}{}
+Open FTP URLs.
+\end{classdesc}
+
+\begin{classdesc}{CacheFTPHandler}{}
+Open FTP URLs, keeping a cache of open FTP connections to minimize
+delays.
+\end{classdesc}
+
+\begin{classdesc}{UnknownHandler}{}
+A catch-all class to handle unknown URLs.
+\end{classdesc}
+
+
+\subsection{Request Objects \label{request-objects}}
+
+The following methods describe all of \class{Request}'s public interface,
+and so all must be overridden in subclasses.
+
+\begin{methoddesc}[Request]{add_data}{data}
+Set the \class{Request} data to \var{data}. This is ignored by all
+handlers except HTTP handlers --- and there it should be a byte
+string, and will change the request to be \code{POST} rather than
+\code{GET}.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_method}{}
+Return a string indicating the HTTP request method. This is only
+meaningful for HTTP requests, and currently always returns
+\code{'GET'} or \code{'POST'}.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{has_data}{}
+Return whether the instance has non-\code{None} data.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_data}{}
+Return the instance's data.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{add_header}{key, val}
+Add another header to the request. Headers are currently ignored by
+all handlers except HTTP handlers, where they are added to the list
+of headers sent to the server. Note that there cannot be more than
+one header with the same name, and later calls will overwrite
+previous calls in case the \var{key} collides. Currently, this is
+no loss of HTTP functionality, since all headers which have meaning
+when used more than once have a (header-specific) way of gaining the
+same functionality using only one header.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{add_unredirected_header}{key, header}
+Add a header that will not be added to a redirected request.
+\versionadded{2.4}
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{has_header}{header}
+Return whether the instance has the named header (checks both regular
+and unredirected).
+\versionadded{2.4}
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_full_url}{}
+Return the URL given in the constructor.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_type}{}
+Return the type of the URL --- also known as the scheme.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_host}{}
+Return the host to which a connection will be made.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_selector}{}
+Return the selector --- the part of the URL that is sent to
+the server.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{set_proxy}{host, type}
+Prepare the request by connecting to a proxy server. The \var{host}
+and \var{type} will replace those of the instance, and the instance's
+selector will be the original URL given in the constructor.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_origin_req_host}{}
+Return the request-host of the origin transaction, as defined by
+\rfc{2965}. See the documentation for the \class{Request}
+constructor.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{is_unverifiable}{}
+Return whether the request is unverifiable, as defined by RFC 2965.
+See the documentation for the \class{Request} constructor.
+\end{methoddesc}
+
+
+\subsection{OpenerDirector Objects \label{opener-director-objects}}
+
+\class{OpenerDirector} instances have the following methods:
+
+\begin{methoddesc}[OpenerDirector]{add_handler}{handler}
+\var{handler} should be an instance of \class{BaseHandler}. The
+following methods are searched, and added to the possible chains (note
+that HTTP errors are a special case).
+
+\begin{itemize}
+ \item \method{\var{protocol}_open()} ---
+ signal that the handler knows how to open \var{protocol} URLs.
+ \item \method{http_error_\var{type}()} ---
+ signal that the handler knows how to handle HTTP errors with HTTP
+ error code \var{type}.
+ \item \method{\var{protocol}_error()} ---
+ signal that the handler knows how to handle errors from
+ (non-\code{http}) \var{protocol}.
+ \item \method{\var{protocol}_request()} ---
+ signal that the handler knows how to pre-process \var{protocol}
+ requests.
+ \item \method{\var{protocol}_response()} ---
+ signal that the handler knows how to post-process \var{protocol}
+ responses.
+\end{itemize}
+\end{methoddesc}
+
+\begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}\optional{, timeout}}
+Open the given \var{url} (which can be a request object or a string),
+optionally passing the given \var{data}.
+Arguments, return values and exceptions raised are the same as those
+of \function{urlopen()} (which simply calls the \method{open()} method
+on the currently installed global \class{OpenerDirector}). The optional
+\var{timeout} parameter specifies a timeout in seconds for the connection
+attempt (if not specified, or passed as None, the global default timeout
+setting will be used; this actually only works for HTTP, HTTPS, FTP
+and FTPS connections).
+
+\versionchanged[\var{timeout} was added]{2.6}
+\end{methoddesc}
+
+\begin{methoddesc}[OpenerDirector]{error}{proto\optional{,
+ arg\optional{, \moreargs}}}
+Handle an error of the given protocol. This will call the registered
+error handlers for the given protocol with the given arguments (which
+are protocol specific). The HTTP protocol is a special case which
+uses the HTTP response code to determine the specific error handler;
+refer to the \method{http_error_*()} methods of the handler classes.
+
+Return values and exceptions raised are the same as those
+of \function{urlopen()}.
+\end{methoddesc}
+
+OpenerDirector objects open URLs in three stages:
+
+The order in which these methods are called within each stage is
+determined by sorting the handler instances.
+
+\begin{enumerate}
+ \item Every handler with a method named like
+ \method{\var{protocol}_request()} has that method called to
+ pre-process the request.
+
+ \item Handlers with a method named like
+ \method{\var{protocol}_open()} are called to handle the request.
+ This stage ends when a handler either returns a
+ non-\constant{None} value (i.e. a response), or raises an exception
+ (usually \exception{URLError}). Exceptions are allowed to propagate.
+
+ In fact, the above algorithm is first tried for methods named
+ \method{default_open()}. If all such methods return
+ \constant{None}, the algorithm is repeated for methods named like
+ \method{\var{protocol}_open()}. If all such methods return
+ \constant{None}, the algorithm is repeated for methods named
+ \method{unknown_open()}.
+
+ Note that the implementation of these methods may involve calls of
+ the parent \class{OpenerDirector} instance's \method{.open()} and
+ \method{.error()} methods.
+
+ \item Every handler with a method named like
+ \method{\var{protocol}_response()} has that method called to
+ post-process the response.
+
+\end{enumerate}
+
+\subsection{BaseHandler Objects \label{base-handler-objects}}
+
+\class{BaseHandler} objects provide a couple of methods that are
+directly useful, and others that are meant to be used by derived
+classes. These are intended for direct use:
+
+\begin{methoddesc}[BaseHandler]{add_parent}{director}
+Add a director as parent.
+\end{methoddesc}
+
+\begin{methoddesc}[BaseHandler]{close}{}
+Remove any parents.
+\end{methoddesc}
+
+The following members and methods should only be used by classes
+derived from \class{BaseHandler}. \note{The convention has been
+adopted that subclasses defining \method{\var{protocol}_request()} or
+\method{\var{protocol}_response()} methods are named
+\class{*Processor}; all others are named \class{*Handler}.}
+
+
+\begin{memberdesc}[BaseHandler]{parent}
+A valid \class{OpenerDirector}, which can be used to open using a
+different protocol, or handle errors.
+\end{memberdesc}
+
+\begin{methoddesc}[BaseHandler]{default_open}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to catch all URLs.
+
+This method, if implemented, will be called by the parent
+\class{OpenerDirector}. It should return a file-like object as
+described in the return value of the \method{open()} of
+\class{OpenerDirector}, or \code{None}. It should raise
+\exception{URLError}, unless a truly exceptional thing happens (for
+example, \exception{MemoryError} should not be mapped to
+\exception{URLError}).
+
+This method will be called before any protocol-specific open method.
+\end{methoddesc}
+
+\begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to handle URLs with the given
+protocol.
+
+This method, if defined, will be called by the parent
+\class{OpenerDirector}. Return values should be the same as for
+\method{default_open()}.
+\end{methoddescni}
+
+\begin{methoddesc}[BaseHandler]{unknown_open}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to catch all URLs with no
+specific registered handler to open them.
+
+This method, if implemented, will be called by the \member{parent}
+\class{OpenerDirector}. Return values should be the same as for
+\method{default_open()}.
+\end{methoddesc}
+
+\begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should override it if they intend to provide a catch-all
+for otherwise unhandled HTTP errors. It will be called automatically
+by the \class{OpenerDirector} getting the error, and should not
+normally be called in other circumstances.
+
+\var{req} will be a \class{Request} object, \var{fp} will be a
+file-like object with the HTTP error body, \var{code} will be the
+three-digit code of the error, \var{msg} will be the user-visible
+explanation of the code and \var{hdrs} will be a mapping object with
+the headers of the error.
+
+Return values and exceptions raised should be the same as those
+of \function{urlopen()}.
+\end{methoddesc}
+
+\begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs}
+\var{nnn} should be a three-digit HTTP error code. This method is
+also not defined in \class{BaseHandler}, but will be called, if it
+exists, on an instance of a subclass, when an HTTP error with code
+\var{nnn} occurs.
+
+Subclasses should override this method to handle specific HTTP
+errors.
+
+Arguments, return values and exceptions raised should be the same as
+for \method{http_error_default()}.
+\end{methoddesc}
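+
+As an illustration (the class and its behavior are hypothetical, not
+part of this module), a handler that returns the error page body for
+HTTP 404 instead of raising might look like:
+
+\begin{verbatim}
+import urllib2
+
+class NotFoundHandler(urllib2.BaseHandler):
+    def http_error_404(self, req, fp, code, msg, hdrs):
+        # Returning a non-None value stops further error handling.
+        return fp
+
+opener = urllib2.build_opener(NotFoundHandler())
+\end{verbatim}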
+
+\begin{methoddescni}[BaseHandler]{\var{protocol}_request}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to pre-process requests of
+the given protocol.
+
+This method, if defined, will be called by the parent
+\class{OpenerDirector}. \var{req} will be a \class{Request} object.
+The return value should be a \class{Request} object.
+\end{methoddescni}
+
+\begin{methoddescni}[BaseHandler]{\var{protocol}_response}{req, response}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to post-process responses of
+the given protocol.
+
+This method, if defined, will be called by the parent
+\class{OpenerDirector}. \var{req} will be a \class{Request} object.
+\var{response} will be an object implementing the same interface as
+the return value of \function{urlopen()}. The return value should
+implement the same interface as the return value of
+\function{urlopen()}.
+\end{methoddescni}
+
+\subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}}
+
+\note{Some HTTP redirections require action from this module's client
+ code. If this is the case, \exception{HTTPError} is raised. See
+ \rfc{2616} for details of the precise meanings of the various
+ redirection codes.}
+
+\begin{methoddesc}[HTTPRedirectHandler]{redirect_request}{req,
+ fp, code, msg, hdrs}
+Return a \class{Request} or \code{None} in response to a redirect.
+This is called by the default implementations of the
+\method{http_error_30*()} methods when a redirection is received from
+the server. If a redirection should take place, return a new
+\class{Request} to allow \method{http_error_30*()} to perform the
+redirect. Otherwise, raise \exception{HTTPError} if no other handler
+should try to handle this URL, or return \code{None} if you can't but
+another handler might.
+
+\begin{notice}
+ The default implementation of this method does not strictly
+ follow \rfc{2616}, which says that 301 and 302 responses to \code{POST}
+ requests must not be automatically redirected without confirmation by
+ the user. In reality, browsers do allow automatic redirection of
+ these responses, changing the POST to a \code{GET}, and the default
+ implementation reproduces this behavior.
+\end{notice}
+\end{methoddesc}
+
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req,
+ fp, code, msg, hdrs}
+Redirect to the \code{Location:} URL. This method is called by
+the parent \class{OpenerDirector} when getting an HTTP
+`moved permanently' response. The 301 redirection is cached as per
+\rfc{2616}.
+\end{methoddesc}
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req,
+ fp, code, msg, hdrs}
+The same as \method{http_error_301()}, but called for the
+`found' response.
+\end{methoddesc}
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_303}{req,
+ fp, code, msg, hdrs}
+The same as \method{http_error_301()}, but called for the
+`see other' response.
+\end{methoddesc}
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_307}{req,
+ fp, code, msg, hdrs}
+The same as \method{http_error_301()}, but called for the
+`temporary redirect' response.
+\end{methoddesc}
+
+
+\subsection{HTTPCookieProcessor Objects \label{http-cookie-processor}}
+
+\versionadded{2.4}
+
+\class{HTTPCookieProcessor} instances have one attribute:
+
+\begin{memberdesc}[HTTPCookieProcessor]{cookiejar}
+The \class{cookielib.CookieJar} in which cookies are stored.
+\end{memberdesc}
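+
+A typical construction (the URL is illustrative) chains the processor
+into an opener so that cookies are stored and resent automatically:
+
+\begin{verbatim}
+import cookielib
+import urllib2
+
+cj = cookielib.CookieJar()
+opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
+r = opener.open('http://www.example.com/')
+\end{verbatim}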
+
+
+\subsection{ProxyHandler Objects \label{proxy-handler}}
+
+\begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request}
+The \class{ProxyHandler} will have a method
+\method{\var{protocol}_open()} for every \var{protocol} which has a
+proxy in the \var{proxies} dictionary given in the constructor. The
+method will modify requests to go through the proxy, by calling
+\code{request.set_proxy()}, and call the next handler in the chain to
+actually execute the protocol.
+\end{methoddescni}
+
+
+\subsection{HTTPPasswordMgr Objects \label{http-password-mgr}}
+
+These methods are available on \class{HTTPPasswordMgr} and
+\class{HTTPPasswordMgrWithDefaultRealm} objects.
+
+\begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd}
+\var{uri} can be either a single URI, or a sequence of URIs. \var{realm},
+\var{user} and \var{passwd} must be strings. This causes
+\code{(\var{user}, \var{passwd})} to be used as authentication tokens
+when authentication for \var{realm} and a super-URI of any of the
+given URIs is given.
+\end{methoddesc}
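+
+For instance (host and credentials are illustrative), a password
+registered for a site root also applies to deeper paths on the same
+host:
+
+\begin{verbatim}
+import urllib2
+
+mgr = urllib2.HTTPPasswordMgr()
+mgr.add_password('Some Realm', 'http://www.example.com/', 'user', 'secret')
+# The stored URI is a super-URI of the one below, so this matches:
+print mgr.find_user_password('Some Realm', 'http://www.example.com/dir/page')
+\end{verbatim}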
+
+\begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri}
+Get user/password for given realm and URI, if any. This method will
+return \code{(None, None)} if there is no matching user/password.
+
+For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm
+\code{None} will be searched if the given \var{realm} has no matching
+user/password.
+\end{methoddesc}
+
+
+\subsection{AbstractBasicAuthHandler Objects
+ \label{abstract-basic-auth-handler}}
+
+\begin{methoddesc}[AbstractBasicAuthHandler]{http_error_auth_reqed}
+ {authreq, host, req, headers}
+Handle an authentication request by getting a user/password pair, and
+re-trying the request. \var{authreq} should be the name of the header
+where the information about the realm is included in the request,
+\var{host} specifies the URL and path to authenticate for, \var{req}
+should be the (failed) \class{Request} object, and \var{headers}
+should be the error headers.
+
+\var{host} is either an authority (e.g. \code{"python.org"}) or a URL
+containing an authority component (e.g. \code{"http://python.org/"}).
+In either case, the authority must not contain a userinfo component
+(so, \code{"python.org"} and \code{"python.org:80"} are fine,
+\code{"joe:password at python.org"} is not).
+\end{methoddesc}
+
+
+\subsection{HTTPBasicAuthHandler Objects
+ \label{http-basic-auth-handler}}
+
+\begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code,
+ msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{ProxyBasicAuthHandler Objects
+ \label{proxy-basic-auth-handler}}
+
+\begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code,
+ msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{AbstractDigestAuthHandler Objects
+ \label{abstract-digest-auth-handler}}
+
+\begin{methoddesc}[AbstractDigestAuthHandler]{http_error_auth_reqed}
+ {authreq, host, req, headers}
+\var{authreq} should be the name of the header where the information about
+the realm is included in the request, \var{host} should be the host to
+authenticate to, \var{req} should be the (failed) \class{Request}
+object, and \var{headers} should be the error headers.
+\end{methoddesc}
+
+
+\subsection{HTTPDigestAuthHandler Objects
+ \label{http-digest-auth-handler}}
+
+\begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code,
+ msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{ProxyDigestAuthHandler Objects
+ \label{proxy-digest-auth-handler}}
+
+\begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code,
+ msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{HTTPHandler Objects \label{http-handler-objects}}
+
+\begin{methoddesc}[HTTPHandler]{http_open}{req}
+Send an HTTP request, which can be either GET or POST, depending on
+\code{\var{req}.has_data()}.
+\end{methoddesc}
+
+
+\subsection{HTTPSHandler Objects \label{https-handler-objects}}
+
+\begin{methoddesc}[HTTPSHandler]{https_open}{req}
+Send an HTTPS request, which can be either GET or POST, depending on
+\code{\var{req}.has_data()}.
+\end{methoddesc}
+
+
+\subsection{FileHandler Objects \label{file-handler-objects}}
+
+\begin{methoddesc}[FileHandler]{file_open}{req}
+Open the file locally, if there is no host name, or
+the host name is \code{'localhost'}. Change the
+protocol to \code{ftp} otherwise, and retry opening
+it using \member{parent}.
+\end{methoddesc}
+
+
+\subsection{FTPHandler Objects \label{ftp-handler-objects}}
+
+\begin{methoddesc}[FTPHandler]{ftp_open}{req}
+Open the FTP file indicated by \var{req}.
+The login is always done with empty username and password.
+\end{methoddesc}
+
+
+\subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}}
+
+\class{CacheFTPHandler} objects are \class{FTPHandler} objects with
+the following additional methods:
+
+\begin{methoddesc}[CacheFTPHandler]{setTimeout}{t}
+Set timeout of connections to \var{t} seconds.
+\end{methoddesc}
+
+\begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m}
+Set maximum number of cached connections to \var{m}.
+\end{methoddesc}
+
+
+\subsection{UnknownHandler Objects \label{unknown-handler-objects}}
+
+\begin{methoddesc}[UnknownHandler]{unknown_open}{}
+Raise a \exception{URLError} exception.
+\end{methoddesc}
+
+
+\subsection{HTTPErrorProcessor Objects \label{http-error-processor-objects}}
+
+\versionadded{2.4}
+
+\begin{methoddesc}[HTTPErrorProcessor]{http_response}{request, response}
+Process HTTP error responses.
+
+For 200 error codes, the response object is returned immediately.
+
+For non-200 error codes, this simply passes the job on to the
+\method{\var{protocol}_error_\var{code}()} handler methods, via
+\method{OpenerDirector.error()}. Eventually,
+\class{urllib2.HTTPDefaultErrorHandler} will raise an
+\exception{HTTPError} if no other handler handles the error.
+\end{methoddesc}
+
+
+\subsection{Examples \label{urllib2-examples}}
+
+This example gets the python.org main page and displays the first 100
+bytes of it:
+
+\begin{verbatim}
+>>> import urllib2
+>>> f = urllib2.urlopen('http://www.python.org/')
+>>> print f.read(100)
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<?xml-stylesheet href="./css/ht2html
+\end{verbatim}
+
+Here we are sending a data-stream to the stdin of a CGI and reading
+the data it returns to us. Note that this example will only work when the
+Python installation supports SSL.
+
+\begin{verbatim}
+>>> import urllib2
+>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
+... data='This data is passed to stdin of the CGI')
+>>> f = urllib2.urlopen(req)
+>>> print f.read()
+Got Data: "This data is passed to stdin of the CGI"
+\end{verbatim}
+
+The code for the sample CGI used in the above example is:
+
+\begin{verbatim}
+#!/usr/bin/env python
+import sys
+data = sys.stdin.read()
+print 'Content-type: text/plain\n\nGot Data: "%s"' % data
+\end{verbatim}
+
+
+Use of Basic HTTP Authentication:
+
+\begin{verbatim}
+import urllib2
+# Create an OpenerDirector with support for Basic HTTP Authentication...
+auth_handler = urllib2.HTTPBasicAuthHandler()
+auth_handler.add_password(realm='PDQ Application',
+ uri='https://mahler:8092/site-updates.py',
+ user='klem',
+ passwd='kadidd!ehopper')
+opener = urllib2.build_opener(auth_handler)
+# ...and install it globally so it can be used with urlopen.
+urllib2.install_opener(opener)
+urllib2.urlopen('http://www.example.com/login.html')
+\end{verbatim}
+
+\function{build_opener()} provides many handlers by default, including a
+\class{ProxyHandler}. By default, \class{ProxyHandler} uses the
+environment variables named \code{<scheme>_proxy}, where \code{<scheme>}
+is the URL scheme involved. For example, the \envvar{http_proxy}
+environment variable is read to obtain the HTTP proxy's URL.
+
+This example replaces the default \class{ProxyHandler} with one that uses
+programmatically-supplied proxy URLs, and adds proxy authorization support
+with \class{ProxyBasicAuthHandler}.
+
+\begin{verbatim}
+proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
+proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
+proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
+
+opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
+# This time, rather than install the OpenerDirector, we use it directly:
+opener.open('http://www.example.com/login.html')
+\end{verbatim}
+
+
+Adding HTTP headers:
+
+Use the \var{headers} argument to the \class{Request} constructor, or:
+
+\begin{verbatim}
+import urllib2
+req = urllib2.Request('http://www.example.com/')
+req.add_header('Referer', 'http://www.python.org/')
+r = urllib2.urlopen(req)
+\end{verbatim}
+
+\class{OpenerDirector} automatically adds a \mailheader{User-Agent}
+header to every \class{Request}. To change this:
+
+\begin{verbatim}
+import urllib2
+opener = urllib2.build_opener()
+opener.addheaders = [('User-agent', 'Mozilla/5.0')]
+opener.open('http://www.example.com/')
+\end{verbatim}
+
+Also, remember that a few standard headers
+(\mailheader{Content-Length}, \mailheader{Content-Type} and
+\mailheader{Host}) are added when the \class{Request} is passed to
+\function{urlopen()} (or \method{OpenerDirector.open()}).
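+
+For example, the \mailheader{Host} header is absent until the
+\class{Request} is actually opened (a sketch; requires network
+access):
+
+\begin{verbatim}
+>>> import urllib2
+>>> req = urllib2.Request('http://www.python.org/')
+>>> req.has_header('Host')
+False
+>>> f = urllib2.urlopen(req)
+>>> req.has_header('Host')
+True
+\end{verbatim}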
Added: sandbox/trunk/urilib/liburlparse.tex
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/liburlparse.tex Sun Aug 5 23:42:33 2007
@@ -0,0 +1,310 @@
+\section{\module{urlparse} ---
+ Parse URLs into components}
+\declaremodule{standard}{urlparse}
+
+\modulesynopsis{Parse URLs into components.}
+
+\index{WWW}
+\index{World Wide Web}
+\index{URL}
+\indexii{URL}{parsing}
+\indexii{relative}{URL}
+
+
+This module defines a standard interface to break Uniform Resource
+Locator (URL) strings up into components (addressing scheme, network
+location, path etc.), to combine the components back into a URL
+string, and to convert a ``relative URL'' to an absolute URL given a
+``base URL.''
+
+The module has been designed to match the Internet RFC on Relative
+Uniform Resource Locators (and discovered a bug in an earlier
+draft!). It supports the following URL schemes:
+\code{file}, \code{ftp}, \code{gopher}, \code{hdl}, \code{http},
+\code{https}, \code{imap}, \code{mailto}, \code{mms}, \code{news},
+\code{nntp}, \code{prospero}, \code{rsync}, \code{rtsp}, \code{rtspu},
+\code{sftp}, \code{shttp}, \code{sip}, \code{sips}, \code{snews}, \code{svn},
+\code{svn+ssh}, \code{telnet}, \code{wais}.
+
+\versionadded[Support for the \code{sftp} and \code{sips} schemes]{2.5}
+
+The \module{urlparse} module defines the following functions:
+
+\begin{funcdesc}{urlparse}{urlstring\optional{,
+ default_scheme\optional{, allow_fragments}}}
+Parse a URL into six components, returning a 6-tuple. This
+corresponds to the general structure of a URL:
+\code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}\#\var{fragment}}.
+Each tuple item is a string, possibly empty.
+The components are not broken up into smaller parts (for example, the network
+location is a single string), and \% escapes are not expanded.
+The delimiters as shown above are not part of the result,
+except for a leading slash in the \var{path} component, which is
+retained if present. For example:
+
+\begin{verbatim}
+>>> from urlparse import urlparse
+>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
+>>> o
+('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
+>>> o.scheme
+'http'
+>>> o.port
+80
+>>> o.geturl()
+'http://www.cwi.nl:80/%7Eguido/Python.html'
+\end{verbatim}
+
+If the \var{default_scheme} argument is specified, it gives the
+default addressing scheme, to be used only if the URL does not
+specify one. The default value for this argument is the empty string.
+
+If the \var{allow_fragments} argument is false, fragment identifiers
+are not allowed, even if the URL's addressing scheme normally does
+support them. The default value for this argument is \constant{True}.
+
+The return value is actually an instance of a subclass of
+\pytype{tuple}. This class has the following additional read-only
+convenience attributes:
+
+\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present}
+ \lineiv{scheme} {0} {URL scheme specifier} {empty string}
+ \lineiv{netloc} {1} {Network location part} {empty string}
+ \lineiv{path} {2} {Hierarchical path} {empty string}
+ \lineiv{params} {3} {Parameters for last path element} {empty string}
+ \lineiv{query} {4} {Query component} {empty string}
+ \lineiv{fragment}{5} {Fragment identifier} {empty string}
+ \lineiv{username}{ } {User name} {\constant{None}}
+ \lineiv{password}{ } {Password} {\constant{None}}
+ \lineiv{hostname}{ } {Host name (lower case)} {\constant{None}}
+ \lineiv{port} { } {Port number as integer, if present} {\constant{None}}
+\end{tableiv}
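+
+For example, user name, password, host name and port can be read
+directly off the result (note that only the host name is lowercased):
+
+\begin{verbatim}
+>>> o = urlparse('http://User:Pass@www.Example.com:8042/over/there;type=a?name=ferret#nose')
+>>> o.username, o.password, o.hostname, o.port
+('User', 'Pass', 'www.example.com', 8042)
+>>> o.params, o.query, o.fragment
+('type=a', 'name=ferret', 'nose')
+\end{verbatim}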
+
+See section~\ref{urlparse-result-object}, ``Results of
+\function{urlparse()} and \function{urlsplit()},'' for more
+information on the result object.
+
+\versionchanged[Added attributes to return value]{2.5}
+\end{funcdesc}
+
+\begin{funcdesc}{urlunparse}{parts}
+Construct a URL from a tuple as returned by \function{urlparse()}.
+The \var{parts} argument can be any six-item iterable.
+This may result in a slightly different, but equivalent URL, if the
+URL that was parsed originally had unnecessary delimiters (for example,
+a ? with an empty query; the RFC states that these are equivalent).
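+
+For example, a parse/unparse round-trip reproduces the URL:
+
+\begin{verbatim}
+>>> from urlparse import urlparse, urlunparse
+>>> parts = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
+>>> urlunparse(parts)
+'http://www.cwi.nl:80/%7Eguido/Python.html'
+\end{verbatim}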
+\end{funcdesc}
+
+\begin{funcdesc}{urlsplit}{urlstring\optional{,
+ default_scheme\optional{, allow_fragments}}}
+This is similar to \function{urlparse()}, but does not split the
+params from the URL. This should generally be used instead of
+\function{urlparse()} if the more recent URL syntax allowing
+parameters to be applied to each segment of the \var{path} portion of
+the URL (see \rfc{2396}) is wanted. A separate function is needed to
+separate the path segments and parameters. This function returns a
+5-tuple: (addressing scheme, network location, path, query, fragment
+identifier).
+
+The return value is actually an instance of a subclass of
+\pytype{tuple}. This class has the following additional read-only
+convenience attributes:
+
+\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present}
+ \lineiv{scheme} {0} {URL scheme specifier} {empty string}
+ \lineiv{netloc} {1} {Network location part} {empty string}
+ \lineiv{path} {2} {Hierarchical path} {empty string}
+ \lineiv{query} {3} {Query component} {empty string}
+ \lineiv{fragment} {4} {Fragment identifier} {empty string}
+ \lineiv{username} { } {User name} {\constant{None}}
+ \lineiv{password} { } {Password} {\constant{None}}
+ \lineiv{hostname} { } {Host name (lower case)} {\constant{None}}
+ \lineiv{port} { } {Port number as integer, if present} {\constant{None}}
+  \lineiv{parsedquery}     { } {Parsed query string}         {empty string}
+  \lineiv{parsedquerylist} { } {Parsed query string as list} {empty string}
+\end{tableiv}
+
+See section~\ref{urlparse-result-object}, ``Results of
+\function{urlparse()} and \function{urlsplit()},'' for more
+information on the result object.
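+
+For example (note that the \code{;type=a} parameter stays attached to
+the path):
+
+\begin{verbatim}
+>>> from urlparse import urlsplit
+>>> urlsplit('http://www.example.com/path;type=a?name=ferret#nose')
+('http', 'www.example.com', '/path;type=a', 'name=ferret', 'nose')
+\end{verbatim}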
+
+\versionadded{2.2}
+\versionchanged[Added attributes to return value]{2.5}
+\end{funcdesc}
+
+\begin{funcdesc}{urlunsplit}{parts}
+Combine the elements of a tuple as returned by \function{urlsplit()}
+into a complete URL as a string.
+The \var{parts} argument can be any five-item iterable.
+This may result in a slightly different, but equivalent URL, if the
+URL that was parsed originally had unnecessary delimiters (for example,
+a ? with an empty query; the RFC states that these are equivalent).
+\versionadded{2.2}
+\end{funcdesc}
+
+\begin{funcdesc}{urljoin}{base, url\optional{, allow_fragments}}
+Construct a full (``absolute'') URL by combining a ``base URL''
+(\var{base}) with another URL (\var{url}). Informally, this
+uses components of the base URL, in particular the addressing scheme,
+the network location and (part of) the path, to provide missing
+components in the relative URL. For example:
+
+\begin{verbatim}
+>>> from urlparse import urljoin
+>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
+'http://www.cwi.nl/%7Eguido/FAQ.html'
+\end{verbatim}
+
+The \var{allow_fragments} argument has the same meaning and default as
+for \function{urlparse()}.
+
+\note{If \var{url} is an absolute URL (that is, starting with \code{//}
+ or \code{scheme://}), the \var{url}'s host name and/or scheme
+ will be present in the result. For example:}
+
+\begin{verbatim}
+>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
+... '//www.python.org/%7Eguido')
+'http://www.python.org/%7Eguido'
+\end{verbatim}
+
+If you do not want that behavior, preprocess
+the \var{url} with \function{urlsplit()} and \function{urlunsplit()},
+removing possible \emph{scheme} and \emph{netloc} parts.
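+
+A minimal sketch of that preprocessing:
+
+\begin{verbatim}
+>>> from urlparse import urlsplit, urlunsplit, urljoin
+>>> url = '//www.python.org/%7Eguido'
+>>> relurl = urlunsplit(('', '') + urlsplit(url)[2:])
+>>> relurl
+'/%7Eguido'
+>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', relurl)
+'http://www.cwi.nl/%7Eguido'
+\end{verbatim}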
+\end{funcdesc}
+
+\begin{funcdesc}{urldefrag}{url}
+If \var{url} contains a fragment identifier, returns a modified
+version of \var{url} with no fragment identifier, and the fragment
+identifier as a separate string. If there is no fragment identifier
+in \var{url}, returns \var{url} unmodified and an empty string.
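+
+For example:
+
+\begin{verbatim}
+>>> from urlparse import urldefrag
+>>> urldefrag('http://www.python.org/doc/#intro')
+('http://www.python.org/doc/', 'intro')
+>>> urldefrag('http://www.python.org/doc/')
+('http://www.python.org/doc/', '')
+\end{verbatim}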
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
+ strict_parsing}}}
+Parse a query string given as a string argument (data of type
+\mimetype{application/x-www-form-urlencoded}). Data are
+returned as a dictionary. The dictionary keys are the unique query
+variable names and the values are lists of values for each name.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.
+A true value indicates that blanks should be retained as
+blank strings. The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors. If false (the default), errors
+are silently ignored. If true, errors raise a \exception{ValueError}
+exception.
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such dictionaries into query strings.
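+
+For example (note that a blank value is dropped unless
+\var{keep_blank_values} is true):
+
+\begin{verbatim}
+>>> from urlparse import parse_qs
+>>> parse_qs('spam=eggs&spam=ni')
+{'spam': ['eggs', 'ni']}
+>>> parse_qs('spam=')
+{}
+>>> parse_qs('spam=', keep_blank_values=True)
+{'spam': ['']}
+\end{verbatim}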
+
+The parsed query is also available as the \member{parsedquery}
+attribute of the result objects returned by \function{urlsplit()}.
+
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
+ strict_parsing}}}
+Parse a query string given as a string argument (data of type
+\mimetype{application/x-www-form-urlencoded}). Data are
+returned as a list of name, value pairs.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.
+A true value indicates that blanks should be retained as
+blank strings. The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors. If false (the default), errors
+are silently ignored. If true, errors raise a \exception{ValueError}
+exception.
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such lists of pairs into query strings.
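+
+For example (the order of the pairs is preserved):
+
+\begin{verbatim}
+>>> from urlparse import parse_qsl
+>>> parse_qsl('spam=eggs&spam=ni')
+[('spam', 'eggs'), ('spam', 'ni')]
+\end{verbatim}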
+\end{funcdesc}
+
+The list form is also available as the \member{parsedquerylist}
+attribute of the result objects returned by \function{urlsplit()}.
+
+
+
+\begin{seealso}
+ \seerfc{1738}{Uniform Resource Locators (URL)}{
+ This specifies the formal syntax and semantics of absolute
+ URLs.}
+ \seerfc{1808}{Relative Uniform Resource Locators}{
+ This Request For Comments includes the rules for joining an
+ absolute and a relative URL, including a fair number of
+ ``Abnormal Examples'' which govern the treatment of border
+ cases.}
+ \seerfc{2396}{Uniform Resource Identifiers (URI): Generic Syntax}{
+ Document describing the generic syntactic requirements for
+ both Uniform Resource Names (URNs) and Uniform Resource
+ Locators (URLs).}
+\end{seealso}
+
+
+\subsection{Results of \function{urlparse()} and \function{urlsplit()}
+ \label{urlparse-result-object}}
+
+The result objects from the \function{urlparse()} and
+\function{urlsplit()} functions are subclasses of the \pytype{tuple}
+type. These subclasses add the attributes described in those
+functions, as well as provide an additional method:
+
+\begin{methoddesc}[ParseResult]{geturl}{}
+ Return the re-combined version of the original URL as a string.
+ This may differ from the original URL in that the scheme will always
+ be normalized to lower case and empty components may be dropped.
+ Specifically, empty parameters, queries, and fragment identifiers
+ will be removed.
+
+ The result of this method is a fixpoint if passed back through the
+ original parsing function:
+
+\begin{verbatim}
+>>> import urlparse
+>>> url = 'HTTP://www.Python.org/doc/#'
+
+>>> r1 = urlparse.urlsplit(url)
+>>> r1.geturl()
+'http://www.Python.org/doc/'
+
+>>> r2 = urlparse.urlsplit(r1.geturl())
+>>> r2.geturl()
+'http://www.Python.org/doc/'
+\end{verbatim}
+
+\versionadded{2.5}
+\end{methoddesc}
+
+The following classes provide the implementations of the parse results:
+
+\begin{classdesc*}{BaseResult}
+ Base class for the concrete result classes. This provides most of
+ the attribute definitions. It does not provide a \method{geturl()}
+ method. It is derived from \class{tuple}, but does not override the
+ \method{__init__()} or \method{__new__()} methods.
+\end{classdesc*}
+
+
+\begin{classdesc}{ParseResult}{scheme, netloc, path, params, query, fragment}
+ Concrete class for \function{urlparse()} results. The
+ \method{__new__()} method is overridden to support checking that the
+ right number of arguments are passed.
+\end{classdesc}
+
+
+\begin{classdesc}{SplitResult}{scheme, netloc, path, query, fragment}
+ Concrete class for \function{urlsplit()} results. The
+ \method{__new__()} method is overridden to support checking that the
+ right number of arguments are passed.
+\end{classdesc}
Added: sandbox/trunk/urilib/test_urllib2.py
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/test_urllib2.py Sun Aug 5 23:42:33 2007
@@ -0,0 +1,1089 @@
+import unittest
+from test import test_support
+
+import os, socket
+import StringIO
+
+import urllib2
+from urllib2 import Request, OpenerDirector
+
+# XXX
+# Request
+# CacheFTPHandler (hard to write)
+# parse_keqv_list, parse_http_list, HTTPDigestAuthHandler
+
+class TrivialTests(unittest.TestCase):
+ def test_trivial(self):
+ # A couple trivial tests
+
+ self.assertRaises(ValueError, urllib2.urlopen, 'bogus url')
+
+ # XXX Name hacking to get this to work on Windows.
+ fname = os.path.abspath(urllib2.__file__).replace('\\', '/')
+ if fname[1:2] == ":":
+ fname = fname[2:]
+ # And more hacking to get it to work on MacOS. This assumes
+ # urllib.pathname2url works, unfortunately...
+ if os.name == 'mac':
+ fname = '/' + fname.replace(':', '/')
+ elif os.name == 'riscos':
+ import string
+ fname = os.expand(fname)
+ fname = fname.translate(string.maketrans("/.", "./"))
+
+ file_url = "file://%s" % fname
+ f = urllib2.urlopen(file_url)
+
+ buf = f.read()
+ f.close()
+
+ def test_parse_http_list(self):
+ tests = [('a,b,c', ['a', 'b', 'c']),
+ ('path"o,l"og"i"cal, example', ['path"o,l"og"i"cal', 'example']),
+ ('a, b, "c", "d", "e,f", g, h', ['a', 'b', '"c"', '"d"', '"e,f"', 'g', 'h']),
+ ('a="b\\"c", d="e\\,f", g="h\\\\i"', ['a="b"c"', 'd="e,f"', 'g="h\\i"'])]
+ for string, list in tests:
+ self.assertEquals(urllib2.parse_http_list(string), list)
+
+
+def test_request_headers_dict():
+ """
+ The Request.headers dictionary is not a documented interface. It should
+ stay that way, because the complete set of headers are only accessible
+ through the .get_header(), .has_header(), .header_items() interface.
+ However, .headers pre-dates those methods, and so real code will be using
+ the dictionary.
+
+ The introduction in 2.4 of those methods was a mistake for the same reason:
+ code that previously saw all (urllib2 user)-provided headers in .headers
+ now sees only a subset (and the function interface is ugly and incomplete).
+ A better change would have been to replace .headers dict with a dict
+ subclass (or UserDict.DictMixin instance?) that preserved the .headers
+ interface and also provided access to the "unredirected" headers. It's
+ probably too late to fix that, though.
+
+
+ Check .capitalize() case normalization:
+
+ >>> url = "http://example.com"
+ >>> Request(url, headers={"Spam-eggs": "blah"}).headers["Spam-eggs"]
+ 'blah'
+ >>> Request(url, headers={"spam-EggS": "blah"}).headers["Spam-eggs"]
+ 'blah'
+
+    Currently, Request(url, headers={"Spam-eggs": "blah"}).headers["Spam-Eggs"]
+    raises KeyError, but that could be changed in future.
+
+ """
+
+def test_request_headers_methods():
+ """
+ Note the case normalization of header names here, to .capitalize()-case.
+ This should be preserved for backwards-compatibility. (In the HTTP case,
+ normalization to .title()-case is done by urllib2 before sending headers to
+ httplib).
+
+ >>> url = "http://example.com"
+ >>> r = Request(url, headers={"Spam-eggs": "blah"})
+ >>> r.has_header("Spam-eggs")
+ True
+ >>> r.header_items()
+ [('Spam-eggs', 'blah')]
+ >>> r.add_header("Foo-Bar", "baz")
+ >>> items = r.header_items()
+ >>> items.sort()
+ >>> items
+ [('Foo-bar', 'baz'), ('Spam-eggs', 'blah')]
+
+ Note that e.g. r.has_header("spam-EggS") is currently False, and
+ r.get_header("spam-EggS") returns None, but that could be changed in
+ future.
+
+ >>> r.has_header("Not-there")
+ False
+ >>> print r.get_header("Not-there")
+ None
+ >>> r.get_header("Not-there", "default")
+ 'default'
+
+ """
+
+
+def test_password_manager():
+ """
+ >>> mgr = urllib2.HTTPPasswordMgr()
+ >>> add = mgr.add_password
+ >>> add("Some Realm", "http://example.com/", "joe", "password")
+ >>> add("Some Realm", "http://example.com/ni", "ni", "ni")
+ >>> add("c", "http://example.com/foo", "foo", "ni")
+ >>> add("c", "http://example.com/bar", "bar", "nini")
+ >>> add("b", "http://example.com/", "first", "blah")
+ >>> add("b", "http://example.com/", "second", "spam")
+ >>> add("a", "http://example.com", "1", "a")
+ >>> add("Some Realm", "http://c.example.com:3128", "3", "c")
+ >>> add("Some Realm", "d.example.com", "4", "d")
+ >>> add("Some Realm", "e.example.com:3128", "5", "e")
+
+ >>> mgr.find_user_password("Some Realm", "example.com")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com/")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com/spam")
+ ('joe', 'password')
+ >>> mgr.find_user_password("Some Realm", "http://example.com/spam/spam")
+ ('joe', 'password')
+ >>> mgr.find_user_password("c", "http://example.com/foo")
+ ('foo', 'ni')
+ >>> mgr.find_user_password("c", "http://example.com/bar")
+ ('bar', 'nini')
+
+ Actually, this is really undefined ATM
+## Currently, we use the highest-level path where more than one match:
+
+## >>> mgr.find_user_password("Some Realm", "http://example.com/ni")
+## ('joe', 'password')
+
+ Use latest add_password() in case of conflict:
+
+ >>> mgr.find_user_password("b", "http://example.com/")
+ ('second', 'spam')
+
+ No special relationship between a.example.com and example.com:
+
+ >>> mgr.find_user_password("a", "http://example.com/")
+ ('1', 'a')
+ >>> mgr.find_user_password("a", "http://a.example.com/")
+ (None, None)
+
+ Ports:
+
+ >>> mgr.find_user_password("Some Realm", "c.example.com")
+ (None, None)
+ >>> mgr.find_user_password("Some Realm", "c.example.com:3128")
+ ('3', 'c')
+ >>> mgr.find_user_password("Some Realm", "http://c.example.com:3128")
+ ('3', 'c')
+ >>> mgr.find_user_password("Some Realm", "d.example.com")
+ ('4', 'd')
+ >>> mgr.find_user_password("Some Realm", "e.example.com:3128")
+ ('5', 'e')
+
+ """
+ pass
+
+
+def test_password_manager_default_port():
+ """
+ >>> mgr = urllib2.HTTPPasswordMgr()
+ >>> add = mgr.add_password
+
+ The point to note here is that we can't guess the default port if there's
+ no scheme. This applies to both add_password and find_user_password.
+
+ >>> add("f", "http://g.example.com:80", "10", "j")
+ >>> add("g", "http://h.example.com", "11", "k")
+ >>> add("h", "i.example.com:80", "12", "l")
+ >>> add("i", "j.example.com", "13", "m")
+ >>> mgr.find_user_password("f", "g.example.com:100")
+ (None, None)
+ >>> mgr.find_user_password("f", "g.example.com:80")
+ ('10', 'j')
+ >>> mgr.find_user_password("f", "g.example.com")
+ (None, None)
+ >>> mgr.find_user_password("f", "http://g.example.com:100")
+ (None, None)
+ >>> mgr.find_user_password("f", "http://g.example.com:80")
+ ('10', 'j')
+ >>> mgr.find_user_password("f", "http://g.example.com")
+ ('10', 'j')
+ >>> mgr.find_user_password("g", "h.example.com")
+ ('11', 'k')
+ >>> mgr.find_user_password("g", "h.example.com:80")
+ ('11', 'k')
+ >>> mgr.find_user_password("g", "http://h.example.com:80")
+ ('11', 'k')
+ >>> mgr.find_user_password("h", "i.example.com")
+ (None, None)
+ >>> mgr.find_user_password("h", "i.example.com:80")
+ ('12', 'l')
+ >>> mgr.find_user_password("h", "http://i.example.com:80")
+ ('12', 'l')
+ >>> mgr.find_user_password("i", "j.example.com")
+ ('13', 'm')
+ >>> mgr.find_user_password("i", "j.example.com:80")
+ (None, None)
+ >>> mgr.find_user_password("i", "http://j.example.com")
+ ('13', 'm')
+ >>> mgr.find_user_password("i", "http://j.example.com:80")
+ (None, None)
+
+ """
+
+class MockOpener:
+ addheaders = []
+ def open(self, req, data=None):
+ self.req, self.data = req, data
+ def error(self, proto, *args):
+ self.proto, self.args = proto, args
+
+class MockFile:
+ def read(self, count=None): pass
+ def readline(self, count=None): pass
+ def close(self): pass
+
+class MockHeaders(dict):
+ def getheaders(self, name):
+ return self.values()
+
+class MockResponse(StringIO.StringIO):
+ def __init__(self, code, msg, headers, data, url=None):
+ StringIO.StringIO.__init__(self, data)
+ self.code, self.msg, self.headers, self.url = code, msg, headers, url
+ def info(self):
+ return self.headers
+ def geturl(self):
+ return self.url
+
+class MockCookieJar:
+ def add_cookie_header(self, request):
+ self.ach_req = request
+ def extract_cookies(self, response, request):
+ self.ec_req, self.ec_r = request, response
+
+class FakeMethod:
+ def __init__(self, meth_name, action, handle):
+ self.meth_name = meth_name
+ self.handle = handle
+ self.action = action
+ def __call__(self, *args):
+ return self.handle(self.meth_name, self.action, *args)
+
+class MockHandler:
+ # useful for testing handler machinery
+ # see add_ordered_mock_handlers() docstring
+ handler_order = 500
+ def __init__(self, methods):
+ self._define_methods(methods)
+ def _define_methods(self, methods):
+ for spec in methods:
+ if len(spec) == 2: name, action = spec
+ else: name, action = spec, None
+ meth = FakeMethod(name, action, self.handle)
+ setattr(self.__class__, name, meth)
+ def handle(self, fn_name, action, *args, **kwds):
+ self.parent.calls.append((self, fn_name, args, kwds))
+ if action is None:
+ return None
+ elif action == "return self":
+ return self
+ elif action == "return response":
+ res = MockResponse(200, "OK", {}, "")
+ return res
+ elif action == "return request":
+ return Request("http://blah/")
+ elif action.startswith("error"):
+ code = action[action.rfind(" ")+1:]
+ try:
+ code = int(code)
+ except ValueError:
+ pass
+ res = MockResponse(200, "OK", {}, "")
+ return self.parent.error("http", args[0], res, code, "", {})
+ elif action == "raise":
+ raise urllib2.URLError("blah")
+ assert False
+ def close(self): pass
+ def add_parent(self, parent):
+ self.parent = parent
+ self.parent.calls = []
+ def __lt__(self, other):
+ if not hasattr(other, "handler_order"):
+ # No handler_order, leave in original order. Yuck.
+ return True
+ return self.handler_order < other.handler_order
+
+def add_ordered_mock_handlers(opener, meth_spec):
+ """Create MockHandlers and add them to an OpenerDirector.
+
+ meth_spec: list of lists of tuples and strings defining methods to define
+ on handlers. eg:
+
+ [["http_error", "ftp_open"], ["http_open"]]
+
+ defines methods .http_error() and .ftp_open() on one handler, and
+ .http_open() on another. These methods just record their arguments and
+ return None. Using a tuple instead of a string causes the method to
+ perform some action (see MockHandler.handle()), eg:
+
+ [["http_error"], [("http_open", "return request")]]
+
+ defines .http_error() on one handler (which simply returns None), and
+ .http_open() on another handler, which returns a Request object.
+
+ """
+ handlers = []
+ count = 0
+ for meths in meth_spec:
+ class MockHandlerSubclass(MockHandler): pass
+ h = MockHandlerSubclass(meths)
+ h.handler_order += count
+ h.add_parent(opener)
+ count = count + 1
+ handlers.append(h)
+ opener.add_handler(h)
+ return handlers
+
+def build_test_opener(*handler_instances):
+ opener = OpenerDirector()
+ for h in handler_instances:
+ opener.add_handler(h)
+ return opener
+
+class MockHTTPHandler(urllib2.BaseHandler):
+ # useful for testing redirections and auth
+ # sends supplied headers and code as first response
+ # sends 200 OK as second response
+ def __init__(self, code, headers):
+ self.code = code
+ self.headers = headers
+ self.reset()
+ def reset(self):
+ self._count = 0
+ self.requests = []
+ def http_open(self, req):
+ import mimetools, httplib, copy
+ from StringIO import StringIO
+ self.requests.append(copy.deepcopy(req))
+ if self._count == 0:
+ self._count = self._count + 1
+ name = httplib.responses[self.code]
+ msg = mimetools.Message(StringIO(self.headers))
+ return self.parent.error(
+ "http", req, MockFile(), self.code, name, msg)
+ else:
+ self.req = req
+ msg = mimetools.Message(StringIO("\r\n\r\n"))
+ return MockResponse(200, "OK", msg, "", req.get_full_url())
+
+class MockPasswordManager:
+ def add_password(self, realm, uri, user, password):
+ self.realm = realm
+ self.url = uri
+ self.user = user
+ self.password = password
+ def find_user_password(self, realm, authuri):
+ self.target_realm = realm
+ self.target_url = authuri
+ return self.user, self.password
+
+
+class OpenerDirectorTests(unittest.TestCase):
+
+ def test_add_non_handler(self):
+ class NonHandler(object):
+ pass
+ self.assertRaises(TypeError,
+ OpenerDirector().add_handler, NonHandler())
+
+ def test_badly_named_methods(self):
+ # test work-around for three methods that accidentally follow the
+ # naming conventions for handler methods
+ # (*_open() / *_request() / *_response())
+
+ # These used to call the accidentally-named methods, causing a
+ # TypeError in real code; here, returning self from these mock
+ # methods would either cause no exception, or AttributeError.
+
+ from urllib2 import URLError
+
+ o = OpenerDirector()
+ meth_spec = [
+ [("do_open", "return self"), ("proxy_open", "return self")],
+ [("redirect_request", "return self")],
+ ]
+ handlers = add_ordered_mock_handlers(o, meth_spec)
+ o.add_handler(urllib2.UnknownHandler())
+ for scheme in "do", "proxy", "redirect":
+ self.assertRaises(URLError, o.open, scheme+"://example.com/")
+
+ def test_handled(self):
+ # handler returning non-None means no more handlers will be called
+ o = OpenerDirector()
+ meth_spec = [
+ ["http_open", "ftp_open", "http_error_302"],
+ ["ftp_open"],
+ [("http_open", "return self")],
+ [("http_open", "return self")],
+ ]
+ handlers = add_ordered_mock_handlers(o, meth_spec)
+
+ req = Request("http://example.com/")
+ r = o.open(req)
+ # Second .http_open() gets called, third doesn't, since second returned
+ # non-None. Handlers without .http_open() never get any methods called
+ # on them.
+ # In fact, second mock handler defining .http_open() returns self
+ # (instead of response), which becomes the OpenerDirector's return
+ # value.
+ self.assertEqual(r, handlers[2])
+ calls = [(handlers[0], "http_open"), (handlers[2], "http_open")]
+ for expected, got in zip(calls, o.calls):
+ handler, name, args, kwds = got
+ self.assertEqual((handler, name), expected)
+ self.assertEqual(args, (req,))
+
+ def test_handler_order(self):
+ o = OpenerDirector()
+ handlers = []
+ for meths, handler_order in [
+ ([("http_open", "return self")], 500),
+ (["http_open"], 0),
+ ]:
+ class MockHandlerSubclass(MockHandler): pass
+ h = MockHandlerSubclass(meths)
+ h.handler_order = handler_order
+ handlers.append(h)
+ o.add_handler(h)
+
+ r = o.open("http://example.com/")
+ # handlers called in reverse order, thanks to their sort order
+ self.assertEqual(o.calls[0][0], handlers[1])
+ self.assertEqual(o.calls[1][0], handlers[0])
+
+ def test_raise(self):
+ # raising URLError stops processing of request
+ o = OpenerDirector()
+ meth_spec = [
+ [("http_open", "raise")],
+ [("http_open", "return self")],
+ ]
+ handlers = add_ordered_mock_handlers(o, meth_spec)
+
+ req = Request("http://example.com/")
+ self.assertRaises(urllib2.URLError, o.open, req)
+ self.assertEqual(o.calls, [(handlers[0], "http_open", (req,), {})])
+
+## def test_error(self):
+## # XXX this doesn't actually seem to be used in standard library,
+## # but should really be tested anyway...
+
+ def test_http_error(self):
+ # XXX http_error_default
+ # http errors are a special case
+ o = OpenerDirector()
+ meth_spec = [
+ [("http_open", "error 302")],
+ [("http_error_400", "raise"), "http_open"],
+ [("http_error_302", "return response"), "http_error_303",
+ "http_error"],
+ [("http_error_302")],
+ ]
+ handlers = add_ordered_mock_handlers(o, meth_spec)
+
+ class Unknown:
+ def __eq__(self, other): return True
+
+ req = Request("http://example.com/")
+ r = o.open(req)
+ assert len(o.calls) == 2
+ calls = [(handlers[0], "http_open", (req,)),
+ (handlers[2], "http_error_302",
+ (req, Unknown(), 302, "", {}))]
+ for expected, got in zip(calls, o.calls):
+ handler, method_name, args = expected
+ self.assertEqual((handler, method_name), got[:2])
+ self.assertEqual(args, got[2])
+
+ def test_processors(self):
+ # *_request / *_response methods get called appropriately
+ o = OpenerDirector()
+ meth_spec = [
+ [("http_request", "return request"),
+ ("http_response", "return response")],
+ [("http_request", "return request"),
+ ("http_response", "return response")],
+ ]
+ handlers = add_ordered_mock_handlers(o, meth_spec)
+
+ req = Request("http://example.com/")
+ r = o.open(req)
+ # processor methods are called on *all* handlers that define them,
+ # not just the first handler that handles the request
+ calls = [
+ (handlers[0], "http_request"), (handlers[1], "http_request"),
+ (handlers[0], "http_response"), (handlers[1], "http_response")]
+
+ for i, (handler, name, args, kwds) in enumerate(o.calls):
+ if i < 2:
+ # *_request
+ self.assertEqual((handler, name), calls[i])
+ self.assertEqual(len(args), 1)
+ self.assert_(isinstance(args[0], Request))
+ else:
+ # *_response
+ self.assertEqual((handler, name), calls[i])
+ self.assertEqual(len(args), 2)
+ self.assert_(isinstance(args[0], Request))
+ # response from opener.open is None, because there's no
+ # handler that defines http_open to handle it
+ self.assert_(args[1] is None or
+ isinstance(args[1], MockResponse))
+
+
+def sanepathname2url(path):
+ import urllib
+ urlpath = urllib.pathname2url(path)
+ if os.name == "nt" and urlpath.startswith("///"):
+ urlpath = urlpath[2:]
+ # XXX don't ask me about the mac...
+ return urlpath
+
+class HandlerTests(unittest.TestCase):
+
+ def test_ftp(self):
+ class MockFTPWrapper:
+ def __init__(self, data): self.data = data
+ def retrfile(self, filename, filetype):
+ self.filename, self.filetype = filename, filetype
+ return StringIO.StringIO(self.data), len(self.data)
+
+ class NullFTPHandler(urllib2.FTPHandler):
+ def __init__(self, data): self.data = data
+ def connect_ftp(self, user, passwd, host, port, dirs, timeout=None):
+ self.user, self.passwd = user, passwd
+ self.host, self.port = host, port
+ self.dirs = dirs
+ self.ftpwrapper = MockFTPWrapper(self.data)
+ return self.ftpwrapper
+
+ import ftplib, socket
+ data = "rheum rhaponicum"
+ h = NullFTPHandler(data)
+ o = h.parent = MockOpener()
+
+ for url, host, port, type_, dirs, filename, mimetype in [
+ ("ftp://localhost/foo/bar/baz.html",
+ "localhost", ftplib.FTP_PORT, "I",
+ ["foo", "bar"], "baz.html", "text/html"),
+ ("ftp://localhost:80/foo/bar/",
+ "localhost", 80, "D",
+ ["foo", "bar"], "", None),
+ ("ftp://localhost/baz.gif;type=a",
+ "localhost", ftplib.FTP_PORT, "A",
+ [], "baz.gif", None), # XXX really this should guess image/gif
+ ]:
+ req = Request(url)
+ req.timeout = None
+ r = h.ftp_open(req)
+ # ftp authentication not yet implemented by FTPHandler
+ self.assert_(h.user == h.passwd == "")
+ self.assertEqual(h.host, socket.gethostbyname(host))
+ self.assertEqual(h.port, port)
+ self.assertEqual(h.dirs, dirs)
+ self.assertEqual(h.ftpwrapper.filename, filename)
+ self.assertEqual(h.ftpwrapper.filetype, type_)
+ headers = r.info()
+ self.assertEqual(headers.get("Content-type"), mimetype)
+ self.assertEqual(int(headers["Content-length"]), len(data))
+
+ def test_file(self):
+ import time, rfc822, socket
+ h = urllib2.FileHandler()
+ o = h.parent = MockOpener()
+
+ TESTFN = test_support.TESTFN
+ urlpath = sanepathname2url(os.path.abspath(TESTFN))
+ towrite = "hello, world\n"
+ urls = [
+ "file://localhost%s" % urlpath,
+ "file://%s" % urlpath,
+ "file://%s%s" % (socket.gethostbyname('localhost'), urlpath),
+ ]
+ try:
+ localaddr = socket.gethostbyname(socket.gethostname())
+ except socket.gaierror:
+ localaddr = ''
+ if localaddr:
+ urls.append("file://%s%s" % (localaddr, urlpath))
+
+ for url in urls:
+ f = open(TESTFN, "wb")
+ try:
+ try:
+ f.write(towrite)
+ finally:
+ f.close()
+
+ r = h.file_open(Request(url))
+ try:
+ data = r.read()
+ headers = r.info()
+ newurl = r.geturl()
+ finally:
+ r.close()
+ stats = os.stat(TESTFN)
+ modified = rfc822.formatdate(stats.st_mtime)
+ finally:
+ os.remove(TESTFN)
+ self.assertEqual(data, towrite)
+ self.assertEqual(headers["Content-type"], "text/plain")
+ self.assertEqual(headers["Content-length"], "13")
+ self.assertEqual(headers["Last-modified"], modified)
+
+ for url in [
+ "file://localhost:80%s" % urlpath,
+ "file:///file_does_not_exist.txt",
+ "file://%s:80%s/%s" % (socket.gethostbyname('localhost'),
+ os.getcwd(), TESTFN),
+ "file://somerandomhost.ontheinternet.com%s/%s" %
+ (os.getcwd(), TESTFN),
+ ]:
+ try:
+ f = open(TESTFN, "wb")
+ try:
+ f.write(towrite)
+ finally:
+ f.close()
+
+ self.assertRaises(urllib2.URLError,
+ h.file_open, Request(url))
+ finally:
+ os.remove(TESTFN)
+
+ h = urllib2.FileHandler()
+ o = h.parent = MockOpener()
+ # XXXX why does // mean ftp (and /// mean not ftp!), and where
+ # is file: scheme specified? I think this is really a bug, and
+ # what was intended was to distinguish between URLs like:
+ # file:/blah.txt (a file)
+ # file://localhost/blah.txt (a file)
+ # file:///blah.txt (a file)
+ # file://ftp.example.com/blah.txt (an ftp URL)
+ for url, ftp in [
+ ("file://ftp.example.com//foo.txt", True),
+ ("file://ftp.example.com///foo.txt", False),
+# XXXX bug: fails with OSError, should be URLError
+ ("file://ftp.example.com/foo.txt", False),
+ ]:
+ req = Request(url)
+ try:
+ h.file_open(req)
+ # XXXX remove OSError when bug fixed
+ except (urllib2.URLError, OSError):
+ self.assert_(not ftp)
+ else:
+ self.assert_(o.req is req)
+ self.assertEqual(req.type, "ftp")
+
+ def test_http(self):
+ class MockHTTPResponse:
+ def __init__(self, fp, msg, status, reason):
+ self.fp = fp
+ self.msg = msg
+ self.status = status
+ self.reason = reason
+ def read(self):
+ return ''
+ class MockHTTPClass:
+ def __init__(self):
+ self.req_headers = []
+ self.data = None
+ self.raise_on_endheaders = False
+ def __call__(self, host, timeout=None):
+ self.host = host
+ self.timeout = timeout
+ return self
+ def set_debuglevel(self, level):
+ self.level = level
+ def request(self, method, url, body=None, headers={}):
+ self.method = method
+ self.selector = url
+ self.req_headers += headers.items()
+ self.req_headers.sort()
+ if body:
+ self.data = body
+ if self.raise_on_endheaders:
+ import socket
+ raise socket.error()
+ def getresponse(self):
+ return MockHTTPResponse(MockFile(), {}, 200, "OK")
+
+ h = urllib2.AbstractHTTPHandler()
+ o = h.parent = MockOpener()
+
+ url = "http://example.com/"
+ for method, data in [("GET", None), ("POST", "blah")]:
+ req = Request(url, data, {"Foo": "bar"})
+ req.timeout = None
+ req.add_unredirected_header("Spam", "eggs")
+ http = MockHTTPClass()
+ r = h.do_open(http, req)
+
+ # result attributes
+ r.read; r.readline # wrapped MockFile methods
+ r.info; r.geturl # addinfourl methods
+            self.assertEqual((r.code, r.msg), (200, "OK")) # from MockHTTPClass.getresponse()
+ hdrs = r.info()
+ hdrs.get; hdrs.has_key # r.info() gives dict from .getreply()
+ self.assertEqual(r.geturl(), url)
+
+ self.assertEqual(http.host, "example.com")
+ self.assertEqual(http.level, 0)
+ self.assertEqual(http.method, method)
+ self.assertEqual(http.selector, "/")
+ self.assertEqual(http.req_headers,
+ [("Connection", "close"),
+ ("Foo", "bar"), ("Spam", "eggs")])
+ self.assertEqual(http.data, data)
+
+ # check socket.error converted to URLError
+ http.raise_on_endheaders = True
+ self.assertRaises(urllib2.URLError, h.do_open, http, req)
+
+ # check adding of standard headers
+ o.addheaders = [("Spam", "eggs")]
+ for data in "", None: # POST, GET
+ req = Request("http://example.com/", data)
+ r = MockResponse(200, "OK", {}, "")
+ newreq = h.do_request_(req)
+ if data is None: # GET
+ self.assert_("Content-length" not in req.unredirected_hdrs)
+ self.assert_("Content-type" not in req.unredirected_hdrs)
+ else: # POST
+ self.assertEqual(req.unredirected_hdrs["Content-length"], "0")
+ self.assertEqual(req.unredirected_hdrs["Content-type"],
+ "application/x-www-form-urlencoded")
+ # XXX the details of Host could be better tested
+ self.assertEqual(req.unredirected_hdrs["Host"], "example.com")
+ self.assertEqual(req.unredirected_hdrs["Spam"], "eggs")
+
+ # don't clobber existing headers
+ req.add_unredirected_header("Content-length", "foo")
+ req.add_unredirected_header("Content-type", "bar")
+ req.add_unredirected_header("Host", "baz")
+ req.add_unredirected_header("Spam", "foo")
+ newreq = h.do_request_(req)
+ self.assertEqual(req.unredirected_hdrs["Content-length"], "foo")
+ self.assertEqual(req.unredirected_hdrs["Content-type"], "bar")
+ self.assertEqual(req.unredirected_hdrs["Host"], "baz")
+ self.assertEqual(req.unredirected_hdrs["Spam"], "foo")
+
+ def test_errors(self):
+ h = urllib2.HTTPErrorProcessor()
+ o = h.parent = MockOpener()
+
+ url = "http://example.com/"
+ req = Request(url)
+ # all 2xx are passed through
+ r = MockResponse(200, "OK", {}, "", url)
+ newr = h.http_response(req, r)
+ self.assert_(r is newr)
+ self.assert_(not hasattr(o, "proto")) # o.error not called
+ r = MockResponse(202, "Accepted", {}, "", url)
+ newr = h.http_response(req, r)
+ self.assert_(r is newr)
+ self.assert_(not hasattr(o, "proto")) # o.error not called
+ r = MockResponse(206, "Partial content", {}, "", url)
+ newr = h.http_response(req, r)
+ self.assert_(r is newr)
+ self.assert_(not hasattr(o, "proto")) # o.error not called
+ # anything else calls o.error (and MockOpener returns None, here)
+ r = MockResponse(502, "Bad gateway", {}, "", url)
+ self.assert_(h.http_response(req, r) is None)
+ self.assertEqual(o.proto, "http") # o.error called
+ self.assertEqual(o.args, (req, r, 502, "Bad gateway", {}))
+
+ def test_cookies(self):
+ cj = MockCookieJar()
+ h = urllib2.HTTPCookieProcessor(cj)
+ o = h.parent = MockOpener()
+
+ req = Request("http://example.com/")
+ r = MockResponse(200, "OK", {}, "")
+ newreq = h.http_request(req)
+ self.assert_(cj.ach_req is req is newreq)
+ self.assertEquals(req.get_origin_req_host(), "example.com")
+ self.assert_(not req.is_unverifiable())
+ newr = h.http_response(req, r)
+ self.assert_(cj.ec_req is req)
+ self.assert_(cj.ec_r is r is newr)
+
+ def test_redirect(self):
+ from_url = "http://example.com/a.html"
+ to_url = "http://example.com/b.html"
+ h = urllib2.HTTPRedirectHandler()
+ o = h.parent = MockOpener()
+
+ # ordinary redirect behaviour
+ for code in 301, 302, 303, 307:
+ for data in None, "blah\nblah\n":
+ method = getattr(h, "http_error_%s" % code)
+ req = Request(from_url, data)
+ req.add_header("Nonsense", "viking=withhold")
+ req.add_unredirected_header("Spam", "spam")
+ try:
+ method(req, MockFile(), code, "Blah",
+ MockHeaders({"location": to_url}))
+ except urllib2.HTTPError:
+ # 307 in response to POST requires user OK
+ self.assert_(code == 307 and data is not None)
+ self.assertEqual(o.req.get_full_url(), to_url)
+ try:
+ self.assertEqual(o.req.get_method(), "GET")
+ except AttributeError:
+ self.assert_(not o.req.has_data())
+ self.assertEqual(o.req.headers["Nonsense"],
+ "viking=withhold")
+ self.assert_("Spam" not in o.req.headers)
+ self.assert_("Spam" not in o.req.unredirected_hdrs)
+
+ # loop detection
+ req = Request(from_url)
+ def redirect(h, req, url=to_url):
+ h.http_error_302(req, MockFile(), 302, "Blah",
+ MockHeaders({"location": url}))
+ # Note that the *original* request shares the same record of
+ # redirections with the sub-requests caused by the redirections.
+
+ # detect infinite loop redirect of a URL to itself
+ req = Request(from_url, origin_req_host="example.com")
+ count = 0
+ try:
+ while 1:
+ redirect(h, req, "http://example.com/")
+ count = count + 1
+ except urllib2.HTTPError:
+ # don't stop until max_repeats, because cookies may introduce state
+ self.assertEqual(count, urllib2.HTTPRedirectHandler.max_repeats)
+
+ # detect endless non-repeating chain of redirects
+ req = Request(from_url, origin_req_host="example.com")
+ count = 0
+ try:
+ while 1:
+ redirect(h, req, "http://example.com/%d" % count)
+ count = count + 1
+ except urllib2.HTTPError:
+ self.assertEqual(count,
+ urllib2.HTTPRedirectHandler.max_redirections)
+        # test the redirection cache implemented for 301 redirects
+ def cached_redirect(h, req, url=to_url):
+ h.http_error_301(req, MockFile(), 301, "Blah",
+ MockHeaders({"location":url}))
+ req = Request(from_url, origin_req_host="example.com")
+ count = 0
+ try:
+ while 1:
+ cached_redirect(h, req, "http://example.com/")
+ count = count + 1
+ if count > 1:
+                    # Check that the redirection cache has been populated.
+                    # Its contents are not checked here, since the mock
+                    # opener returns None; the tests above cover 301
+                    # redirection behaviour independently.
+ self.assert_(h.cache)
+ except urllib2.HTTPError:
+ self.assertEqual(count, urllib2.HTTPRedirectHandler.max_repeats)
+
+ def test_cookie_redirect(self):
+ # cookies shouldn't leak into redirected requests
+ from cookielib import CookieJar
+
+ from test.test_cookielib import interact_netscape
+
+ cj = CookieJar()
+ interact_netscape(cj, "http://www.example.com/", "spam=eggs")
+ hh = MockHTTPHandler(302, "Location: http://www.cracker.com/\r\n\r\n")
+ hdeh = urllib2.HTTPDefaultErrorHandler()
+ hrh = urllib2.HTTPRedirectHandler()
+ cp = urllib2.HTTPCookieProcessor(cj)
+ o = build_test_opener(hh, hdeh, hrh, cp)
+ o.open("http://www.example.com/")
+ self.assert_(not hh.req.has_header("Cookie"))
+
+ def test_proxy(self):
+ o = OpenerDirector()
+ ph = urllib2.ProxyHandler(dict(http="proxy.example.com:3128"))
+ o.add_handler(ph)
+ meth_spec = [
+ [("http_open", "return response")]
+ ]
+ handlers = add_ordered_mock_handlers(o, meth_spec)
+
+ req = Request("http://acme.example.com/")
+ self.assertEqual(req.get_host(), "acme.example.com")
+ r = o.open(req)
+ self.assertEqual(req.get_host(), "proxy.example.com:3128")
+
+ self.assertEqual([(handlers[0], "http_open")],
+ [tup[0:2] for tup in o.calls])
+
+ def test_basic_auth(self):
+ opener = OpenerDirector()
+ password_manager = MockPasswordManager()
+ auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
+ realm = "ACME Widget Store"
+ http_handler = MockHTTPHandler(
+ 401, 'WWW-Authenticate: Basic realm="%s"\r\n\r\n' % realm)
+ opener.add_handler(auth_handler)
+ opener.add_handler(http_handler)
+ self._test_basic_auth(opener, auth_handler, "Authorization",
+ realm, http_handler, password_manager,
+ "http://acme.example.com/protected",
+ "http://acme.example.com/protected",
+ )
+
+ def test_proxy_basic_auth(self):
+ opener = OpenerDirector()
+ ph = urllib2.ProxyHandler(dict(http="proxy.example.com:3128"))
+ opener.add_handler(ph)
+ password_manager = MockPasswordManager()
+ auth_handler = urllib2.ProxyBasicAuthHandler(password_manager)
+ realm = "ACME Networks"
+ http_handler = MockHTTPHandler(
+ 407, 'Proxy-Authenticate: Basic realm="%s"\r\n\r\n' % realm)
+ opener.add_handler(auth_handler)
+ opener.add_handler(http_handler)
+ self._test_basic_auth(opener, auth_handler, "Proxy-authorization",
+ realm, http_handler, password_manager,
+ "http://acme.example.com:3128/protected",
+ "proxy.example.com:3128",
+ )
+
+ def test_basic_and_digest_auth_handlers(self):
+ # HTTPDigestAuthHandler threw an exception if it couldn't handle a 40*
+ # response (http://python.org/sf/1479302), where it should instead
+ # return None to allow another handler (especially
+ # HTTPBasicAuthHandler) to handle the response.
+
+ # Also (http://python.org/sf/14797027, RFC 2617 section 1.2), we must
+ # try digest first (since it's the strongest auth scheme), so we record
+ # order of calls here to check digest comes first:
+ class RecordingOpenerDirector(OpenerDirector):
+ def __init__(self):
+ OpenerDirector.__init__(self)
+ self.recorded = []
+ def record(self, info):
+ self.recorded.append(info)
+ class TestDigestAuthHandler(urllib2.HTTPDigestAuthHandler):
+ def http_error_401(self, *args, **kwds):
+ self.parent.record("digest")
+ urllib2.HTTPDigestAuthHandler.http_error_401(self,
+ *args, **kwds)
+ class TestBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
+ def http_error_401(self, *args, **kwds):
+ self.parent.record("basic")
+ urllib2.HTTPBasicAuthHandler.http_error_401(self,
+ *args, **kwds)
+
+ opener = RecordingOpenerDirector()
+ password_manager = MockPasswordManager()
+ digest_handler = TestDigestAuthHandler(password_manager)
+ basic_handler = TestBasicAuthHandler(password_manager)
+ realm = "ACME Networks"
+ http_handler = MockHTTPHandler(
+ 401, 'WWW-Authenticate: Basic realm="%s"\r\n\r\n' % realm)
+ opener.add_handler(basic_handler)
+ opener.add_handler(digest_handler)
+ opener.add_handler(http_handler)
+
+ # check basic auth isn't blocked by digest handler failing
+ self._test_basic_auth(opener, basic_handler, "Authorization",
+ realm, http_handler, password_manager,
+ "http://acme.example.com/protected",
+ "http://acme.example.com/protected",
+ )
+ # check digest was tried before basic (twice, because
+ # _test_basic_auth called .open() twice)
+ self.assertEqual(opener.recorded, ["digest", "basic"]*2)
+
+ def _test_basic_auth(self, opener, auth_handler, auth_header,
+ realm, http_handler, password_manager,
+ request_url, protected_url):
+ import base64, httplib
+ user, password = "wile", "coyote"
+
+ # .add_password() fed through to password manager
+ auth_handler.add_password(realm, request_url, user, password)
+ self.assertEqual(realm, password_manager.realm)
+ self.assertEqual(request_url, password_manager.url)
+ self.assertEqual(user, password_manager.user)
+ self.assertEqual(password, password_manager.password)
+
+ r = opener.open(request_url)
+
+ # should have asked the password manager for the username/password
+ self.assertEqual(password_manager.target_realm, realm)
+ self.assertEqual(password_manager.target_url, protected_url)
+
+ # expect one request without authorization, then one with
+ self.assertEqual(len(http_handler.requests), 2)
+ self.assertFalse(http_handler.requests[0].has_header(auth_header))
+ userpass = '%s:%s' % (user, password)
+ auth_hdr_value = 'Basic '+base64.encodestring(userpass).strip()
+ self.assertEqual(http_handler.requests[1].get_header(auth_header),
+ auth_hdr_value)
+
+ # if the password manager can't find a password, the handler won't
+ # handle the HTTP auth error
+ password_manager.user = password_manager.password = None
+ http_handler.reset()
+ r = opener.open(request_url)
+ self.assertEqual(len(http_handler.requests), 1)
+ self.assertFalse(http_handler.requests[0].has_header(auth_header))
+
+
+class MiscTests(unittest.TestCase):
+
+ def test_build_opener(self):
+ class MyHTTPHandler(urllib2.HTTPHandler): pass
+ class FooHandler(urllib2.BaseHandler):
+ def foo_open(self): pass
+ class BarHandler(urllib2.BaseHandler):
+ def bar_open(self): pass
+
+ build_opener = urllib2.build_opener
+
+ o = build_opener(FooHandler, BarHandler)
+ self.opener_has_handler(o, FooHandler)
+ self.opener_has_handler(o, BarHandler)
+
+ # can take a mix of classes and instances
+ o = build_opener(FooHandler, BarHandler())
+ self.opener_has_handler(o, FooHandler)
+ self.opener_has_handler(o, BarHandler)
+
+ # subclasses of default handlers override default handlers
+ o = build_opener(MyHTTPHandler)
+ self.opener_has_handler(o, MyHTTPHandler)
+
+ # a particular case of overriding: default handlers can be passed
+ # in explicitly
+ o = build_opener()
+ self.opener_has_handler(o, urllib2.HTTPHandler)
+ o = build_opener(urllib2.HTTPHandler)
+ self.opener_has_handler(o, urllib2.HTTPHandler)
+ o = build_opener(urllib2.HTTPHandler())
+ self.opener_has_handler(o, urllib2.HTTPHandler)
+
+ def opener_has_handler(self, opener, handler_class):
+ for h in opener.handlers:
+ if h.__class__ == handler_class:
+ break
+ else:
+ self.assert_(False)
+
+
+def test_main(verbose=None):
+ from test import test_urllib2
+ test_support.run_doctest(test_urllib2, verbose)
+ test_support.run_doctest(urllib2, verbose)
+ tests = (TrivialTests,
+ OpenerDirectorTests,
+ HandlerTests,
+ MiscTests)
+ test_support.run_unittest(*tests)
+
+if __name__ == "__main__":
+ test_main(verbose=True)
Added: sandbox/trunk/urilib/urllib2.py
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/urllib2.py Sun Aug 5 23:42:33 2007
@@ -0,0 +1,1371 @@
+"""An extensible library for opening URLs using a variety of protocols
+
+The simplest way to use this module is to call the urlopen function,
+which accepts a string containing a URL or a Request object (described
+below). It opens the URL and returns the results as file-like
+object; the returned object has some extra methods described below.
+
+The OpenerDirector manages a collection of Handler objects that do
+all the actual work. Each Handler implements a particular protocol or
+option. The OpenerDirector is a composite object that invokes the
+Handlers needed to open the requested URL. For example, the
+HTTPHandler performs HTTP GET and POST requests and deals with
+non-error returns. The HTTPRedirectHandler automatically deals with
+HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
+deals with digest authentication.
+
+urlopen(url, data=None) -- Basic usage is the same as original
+urllib: pass the URL and optionally data to POST to an HTTP URL, and
+get a file-like object back. One difference is that you can also pass
+a Request instance instead of a URL. Raises a URLError (subclass of
+IOError); for HTTP errors, raises an HTTPError, which can also be
+treated as a valid response.
+
+build_opener -- Function that creates a new OpenerDirector instance.
+Will install the default handlers. Accepts one or more Handlers as
+arguments, either instances or Handler classes that it will
+instantiate. If one of the arguments is a subclass of the default
+handler, the argument will be installed instead of the default.
+
+install_opener -- Installs a new opener as the default opener.
+
+objects of interest:
+OpenerDirector -- Manages a collection of Handler objects and
+directs requests to them.
+
+Request -- An object that encapsulates the state of a request. The
+state can be as simple as the URL. It can also include extra HTTP
+headers, e.g. a User-Agent.
+
+BaseHandler -- The base class that all protocol handlers derive from.
+
+exceptions:
+URLError -- A subclass of IOError, individual protocols have their own
+specific subclass.
+
+HTTPError -- Also a valid HTTP response, so you can treat an HTTP error
+as an exceptional event or valid response.
+
+internals:
+BaseHandler and parent
+_call_chain conventions
+
+Example usage:
+
+import urllib2
+
+# set up authentication info
+authinfo = urllib2.HTTPBasicAuthHandler()
+authinfo.add_password(realm='PDQ Application',
+ uri='https://mahler:8092/site-updates.py',
+ user='klem',
+ passwd='geheim$parole')
+
+proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})
+
+# build a new opener that adds authentication and caching FTP handlers
+opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)
+
+# install it
+urllib2.install_opener(opener)
+
+f = urllib2.urlopen('http://www.python.org/')
+
+
+"""
+
+# XXX issues:
+# If an authentication error handler tries to perform authentication
+# but fails, how should the error be signalled?  The client needs to
+# know the HTTP error code.  But if the handler knows that the problem
+# was, e.g., that it didn't know the hash algorithm requested in the
+# challenge, it would be good to pass that information along to the
+# client, too.
+# ftp errors aren't handled cleanly
+# check digest against correct (i.e. non-apache) implementation
+
+# Possible extensions:
+# complex proxies XXX not sure what exactly was meant by this
+# abstract factory for opener
+
+import base64
+import hashlib
+import httplib
+import mimetools
+import os
+import posixpath
+import random
+import re
+import socket
+import sys
+import time
+import urlparse
+import bisect
+
+try:
+ from cStringIO import StringIO
+except ImportError:
+ from StringIO import StringIO
+
+from urllib import (unwrap, unquote, splittype, splithost, quote,
+ addinfourl, splitport, splitquery,
+ splitattr, ftpwrapper, noheaders, splituser, splitpasswd, splitvalue)
+
+# support for FileHandler, proxies via environment variables
+from urllib import localhost, url2pathname, getproxies
+
+# used in User-Agent header sent
+__version__ = sys.version[:3]
+
+_opener = None
+def urlopen(url, data=None, timeout=None):
+ global _opener
+ if _opener is None:
+ _opener = build_opener()
+ return _opener.open(url, data, timeout)
+
+def install_opener(opener):
+ global _opener
+ _opener = opener
+
+# do these error classes make sense?
+# make sure all of the IOError stuff is overridden. we just want to be
+# subtypes.
+
+class URLError(IOError):
+ # URLError is a sub-type of IOError, but it doesn't share any of
+ # the implementation. need to override __init__ and __str__.
+ # It sets self.args for compatibility with other EnvironmentError
+ # subclasses, but args doesn't have the typical format with errno in
+ # slot 0 and strerror in slot 1. This may be better than nothing.
+ def __init__(self, reason):
+ self.args = reason,
+ self.reason = reason
+
+ def __str__(self):
+ return '<urlopen error %s>' % self.reason
+
+class HTTPError(URLError, addinfourl):
+ """Raised when HTTP error occurs, but also acts like non-error return"""
+ __super_init = addinfourl.__init__
+
+ def __init__(self, url, code, msg, hdrs, fp):
+ self.code = code
+ self.msg = msg
+ self.hdrs = hdrs
+ self.fp = fp
+ self.filename = url
+ # The addinfourl classes depend on fp being a valid file
+ # object. In some cases, the HTTPError may not have a valid
+ # file object. If this happens, the simplest workaround is to
+ # not initialize the base classes.
+ if fp is not None:
+ self.__super_init(fp, hdrs, url)
+
+ def __str__(self):
+ return 'HTTP Error %s: %s' % (self.code, self.msg)
+
+# copied from cookielib.py
+_cut_port_re = re.compile(r":\d+$")
+def request_host(request):
+ """Return request-host, as defined by RFC 2965.
+
+ Variation from RFC: returned value is lowercased, for convenient
+ comparison.
+
+ """
+ url = request.get_full_url()
+ host = urlparse.urlparse(url)[1]
+ if host == "":
+ host = request.get_header("Host", "")
+
+ # remove port, if present
+ host = _cut_port_re.sub("", host, 1)
+ return host.lower()
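+
+# For illustration (example.com is a placeholder): the port is stripped
+# and the result lowercased, so
+#
+#     request_host(Request('http://WWW.Example.COM:80/'))
+#
+# returns 'www.example.com'.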
+
+class Request:
+
+ def __init__(self, url, data=None, headers={},
+ origin_req_host=None, unverifiable=False):
+ # unwrap('<URL:type://host/path>') --> 'type://host/path'
+ self.__original = unwrap(url)
+ self.type = None
+ # self.__r_type is what's left after doing the splittype
+ self.host = None
+ self.port = None
+ self.data = data
+ self.headers = {}
+ for key, value in headers.items():
+ self.add_header(key, value)
+ self.unredirected_hdrs = {}
+ if origin_req_host is None:
+ origin_req_host = request_host(self)
+ self.origin_req_host = origin_req_host
+ self.unverifiable = unverifiable
+
+ def __getattr__(self, attr):
+ # XXX this is a fallback mechanism to guard against these
+ # methods getting called in a non-standard order. this may be
+ # too complicated and/or unnecessary.
+ # XXX should the __r_XXX attributes be public?
+ if attr[:12] == '_Request__r_':
+ name = attr[12:]
+ if hasattr(Request, 'get_' + name):
+ getattr(self, 'get_' + name)()
+ return getattr(self, attr)
+ raise AttributeError, attr
+
+ def get_method(self):
+ if self.has_data():
+ return "POST"
+ else:
+ return "GET"
+
+ # XXX these helper methods are lame
+
+ def add_data(self, data):
+ self.data = data
+
+ def has_data(self):
+ return self.data is not None
+
+ def get_data(self):
+ return self.data
+
+ def get_full_url(self):
+ return self.__original
+
+ def get_type(self):
+ if self.type is None:
+ self.type, self.__r_type = splittype(self.__original)
+ if self.type is None:
+ raise ValueError, "unknown url type: %s" % self.__original
+ return self.type
+
+ def get_host(self):
+ if self.host is None:
+ self.host, self.__r_host = splithost(self.__r_type)
+ if self.host:
+ self.host = unquote(self.host)
+ return self.host
+
+ def get_selector(self):
+ return self.__r_host
+
+ def set_proxy(self, host, type):
+ self.host, self.type = host, type
+ self.__r_host = self.__original
+
+ def get_origin_req_host(self):
+ return self.origin_req_host
+
+ def is_unverifiable(self):
+ return self.unverifiable
+
+ def add_header(self, key, val):
+ # useful for something like authentication
+ self.headers[key.capitalize()] = val
+
+ def add_unredirected_header(self, key, val):
+ # will not be added to a redirected request
+ self.unredirected_hdrs[key.capitalize()] = val
+
+ def has_header(self, header_name):
+ return (header_name in self.headers or
+ header_name in self.unredirected_hdrs)
+
+ def get_header(self, header_name, default=None):
+ return self.headers.get(
+ header_name,
+ self.unredirected_hdrs.get(header_name, default))
+
+ def header_items(self):
+ hdrs = self.unredirected_hdrs.copy()
+ hdrs.update(self.headers)
+ return hdrs.items()
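+
+# A rough usage sketch (hypothetical URL): headers set with add_header()
+# are copied onto redirected requests, add_unredirected_header() entries
+# are not, and get_method() is derived from whether data is present:
+#
+#     req = Request('http://www.example.com/', data='spam=1')
+#     req.add_header('X-Spam', 'eggs')        # survives redirects
+#     req.add_unredirected_header('Authorization', 'Basic ...')
+#     req.get_method()                        # 'POST', since data is set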
+
+class OpenerDirector:
+ def __init__(self):
+ client_version = "Python-urllib/%s" % __version__
+ self.addheaders = [('User-agent', client_version)]
+ # manage the individual handlers
+ self.handlers = []
+ self.handle_open = {}
+ self.handle_error = {}
+ self.process_response = {}
+ self.process_request = {}
+
+ def add_handler(self, handler):
+ if not hasattr(handler, "add_parent"):
+ raise TypeError("expected BaseHandler instance, got %r" %
+ type(handler))
+
+ added = False
+ for meth in dir(handler):
+ if meth in ["redirect_request", "do_open", "proxy_open"]:
+ # oops, coincidental match
+ continue
+
+ i = meth.find("_")
+ protocol = meth[:i]
+ condition = meth[i+1:]
+
+ if condition.startswith("error"):
+ j = condition.find("_") + i + 1
+ kind = meth[j+1:]
+ try:
+ kind = int(kind)
+ except ValueError:
+ pass
+ lookup = self.handle_error.get(protocol, {})
+ self.handle_error[protocol] = lookup
+ elif condition == "open":
+ kind = protocol
+ lookup = self.handle_open
+ elif condition == "response":
+ kind = protocol
+ lookup = self.process_response
+ elif condition == "request":
+ kind = protocol
+ lookup = self.process_request
+ else:
+ continue
+
+ handlers = lookup.setdefault(kind, [])
+ if handlers:
+ bisect.insort(handlers, handler)
+ else:
+ handlers.append(handler)
+ added = True
+
+ if added:
+ # the handlers must work in a specific order; the order
+ # is given by each handler's handler_order attribute
+ bisect.insort(self.handlers, handler)
+ handler.add_parent(self)
+
+ def close(self):
+ # Only exists for backwards compatibility.
+ pass
+
+ def _call_chain(self, chain, kind, meth_name, *args):
+ # Handlers raise an exception if no one else should try to handle
+ # the request, or return None if they can't but another handler
+ # could. Otherwise, they return the response.
+ handlers = chain.get(kind, ())
+ for handler in handlers:
+ func = getattr(handler, meth_name)
+
+ result = func(*args)
+ if result is not None:
+ return result
+
+ def open(self, fullurl, data=None, timeout=None):
+ # accept a URL or a Request object
+ if isinstance(fullurl, basestring):
+ req = Request(fullurl, data)
+ else:
+ req = fullurl
+ if data is not None:
+ req.add_data(data)
+
+ req.timeout = timeout
+ protocol = req.get_type()
+
+ # pre-process request
+ meth_name = protocol+"_request"
+ for processor in self.process_request.get(protocol, []):
+ meth = getattr(processor, meth_name)
+ req = meth(req)
+
+ response = self._open(req, data)
+
+ # post-process response
+ meth_name = protocol+"_response"
+ for processor in self.process_response.get(protocol, []):
+ meth = getattr(processor, meth_name)
+ response = meth(req, response)
+
+ return response
+
+ def _open(self, req, data=None):
+ result = self._call_chain(self.handle_open, 'default',
+ 'default_open', req)
+ if result:
+ return result
+
+ protocol = req.get_type()
+ result = self._call_chain(self.handle_open, protocol, protocol +
+ '_open', req)
+ if result:
+ return result
+
+ return self._call_chain(self.handle_open, 'unknown',
+ 'unknown_open', req)
+
+ def error(self, proto, *args):
+ if proto in ('http', 'https'):
+ # XXX http[s] protocols are special-cased
+ dict = self.handle_error['http'] # https is no different from http
+ proto = args[2] # YUCK!
+ meth_name = 'http_error_%s' % proto
+ http_err = 1
+ orig_args = args
+ else:
+ dict = self.handle_error
+ meth_name = proto + '_error'
+ http_err = 0
+ args = (dict, proto, meth_name) + args
+ result = self._call_chain(*args)
+ if result:
+ return result
+
+ if http_err:
+ args = (dict, 'default', 'http_error_default') + orig_args
+ return self._call_chain(*args)
+
+# XXX probably also want an abstract factory that knows when it makes
+# sense to skip a superclass in favor of a subclass and when it might
+# make sense to include both
+
+def build_opener(*handlers):
+ """Create an opener object from a list of handlers.
+
+ The opener will use several default handlers, including support
+ for HTTP and FTP.
+
+ If any of the handlers passed as arguments are subclasses of the
+ default handlers, the default handlers will not be used.
+ """
+ import types
+ def isclass(obj):
+ return isinstance(obj, types.ClassType) or hasattr(obj, "__bases__")
+
+ opener = OpenerDirector()
+ default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
+ HTTPDefaultErrorHandler, HTTPRedirectHandler,
+ FTPHandler, FileHandler, HTTPErrorProcessor]
+ if hasattr(httplib, 'HTTPS'):
+ default_classes.append(HTTPSHandler)
+ skip = []
+ for klass in default_classes:
+ for check in handlers:
+ if isclass(check):
+ if issubclass(check, klass):
+ skip.append(klass)
+ elif isinstance(check, klass):
+ skip.append(klass)
+ for klass in skip:
+ default_classes.remove(klass)
+
+ for klass in default_classes:
+ opener.add_handler(klass())
+
+ for h in handlers:
+ if isclass(h):
+ h = h()
+ opener.add_handler(h)
+ return opener
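+
+# e.g. (a sketch): passing an HTTPHandler instance suppresses the default
+# HTTPHandler, so a debugging variant replaces it outright while the
+# other defaults stay in place:
+#
+#     opener = build_opener(HTTPHandler(debuglevel=1))
+#     opener.open('http://www.example.com/')  # placeholder URL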
+
+class BaseHandler:
+ handler_order = 500
+
+ def add_parent(self, parent):
+ self.parent = parent
+
+ def close(self):
+ # Only exists for backwards compatibility
+ pass
+
+ def __lt__(self, other):
+ if not hasattr(other, "handler_order"):
+ # Try to preserve the old behavior of having custom classes
+ # inserted after default ones (works only for custom user
+ # classes which are not aware of handler_order).
+ return True
+ return self.handler_order < other.handler_order
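+
+# A minimal custom handler is a BaseHandler subclass whose method names
+# follow the <protocol>_<condition> convention recognized by
+# add_handler(); handler_order picks its slot in the chain (lower runs
+# earlier). A hypothetical sketch:
+#
+#     class LoggingProcessor(BaseHandler):
+#         handler_order = 400            # run before the default handlers
+#         def http_request(self, req):   # pre-process every http request
+#             print req.get_full_url()
+#             return req
+#
+#     opener = build_opener(LoggingProcessor)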
+
+
+class HTTPErrorProcessor(BaseHandler):
+ """Process HTTP error responses."""
+ handler_order = 1000 # after all other processing
+
+ def http_response(self, request, response):
+ code, msg, hdrs = response.code, response.msg, response.info()
+
+ # According to RFC 2616, "2xx" code indicates that the client's
+ # request was successfully received, understood, and accepted.
+ if not (200 <= code < 300):
+ response = self.parent.error(
+ 'http', request, response, code, msg, hdrs)
+
+ return response
+
+ https_response = http_response
+
+class HTTPDefaultErrorHandler(BaseHandler):
+ def http_error_default(self, req, fp, code, msg, hdrs):
+ raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
+
+class HTTPRedirectHandler(BaseHandler):
+ # maximum number of redirections to any single URL
+ # this is needed because of the state that cookies introduce
+ max_repeats = 4
+ # maximum total number of redirections (regardless of URL) before
+ # assuming we're in a loop
+ max_redirections = 10
+
+ def __init__(self):
+ self.cache = {}
+
+ def redirect_request(self, req, fp, code, msg, headers, newurl):
+ """Return a Request or None in response to a redirect.
+
+ This is called by the http_error_30x methods when a
+ redirection response is received. If a redirection should
+ take place, return a new Request to allow http_error_30x to
+ perform the redirect. Otherwise, raise HTTPError if no-one
+ else should try to handle this url. Return None if you can't
+ but another Handler might.
+ """
+ m = req.get_method()
+ if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
+ or code in (301, 302, 303) and m == "POST"):
+ # Strictly (according to RFC 2616), 301 or 302 in response
+ # to a POST MUST NOT cause a redirection without confirmation
+ # from the user (of urllib2, in this case). In practice,
+ # essentially all clients do redirect in this case, so we
+ # do the same.
+ # be lenient with URIs containing a space
+ newurl = newurl.replace(' ', '%20')
+ return Request(newurl,
+ headers=req.headers,
+ origin_req_host=req.get_origin_req_host(),
+ unverifiable=True)
+ else:
+ raise HTTPError(req.get_full_url(), code, msg, headers, fp)
+
+ # Implementation note: To avoid the server sending us into an
+ # infinite loop, the request object needs to track what URLs we
+ # have already seen. Do this by adding a handler-specific
+ # attribute to the Request object.
+
+ def http_error_301(self, req, fp, code, msg, headers):
+ if req in self.cache:
+ if 'location' in headers:
+ newurl = headers.getheaders('location')[0]
+ elif 'uri' in headers:
+ newurl = headers.getheaders('uri')[0]
+ else:
+ return
+ if hasattr(req, 'redirect_dict'):
+ visited = req.redirect_dict
+ if (visited.get(newurl, 0) >= self.max_repeats or
+ len(visited) >= (self.max_redirections - 1)):
+ raise HTTPError(req.get_full_url(), code, self.inf_msg +
+ msg, headers, fp)
+ else:
+ visited = req.redirect_dict = {}
+ visited[newurl] = visited.get(newurl, 0) + 1
+ return self.cache[req]
+ self.cache[req] = self.http_error_302(req, fp, code, msg, headers)
+ return self.cache[req]
+
+ def http_error_302(self, req, fp, code, msg, headers):
+ # Some servers (incorrectly) return multiple Location headers
+ # (so probably same goes for URI). Use first header.
+ if 'location' in headers:
+ newurl = headers.getheaders('location')[0]
+ elif 'uri' in headers:
+ newurl = headers.getheaders('uri')[0]
+ else:
+ return
+ newurl = urlparse.urljoin(req.get_full_url(), newurl)
+
+ # XXX Probably want to forget about the state of the current
+ # request, although that might interact poorly with other
+ # handlers that also use handler-specific request attributes
+ new = self.redirect_request(req, fp, code, msg, headers, newurl)
+ if new is None:
+ return
+
+ # loop detection
+ # .redirect_dict has a key url if url was previously visited.
+ if hasattr(req, 'redirect_dict'):
+ visited = new.redirect_dict = req.redirect_dict
+ if (visited.get(newurl, 0) >= self.max_repeats or
+ len(visited) >= self.max_redirections):
+ raise HTTPError(req.get_full_url(), code,
+ self.inf_msg + msg, headers, fp)
+ else:
+ visited = new.redirect_dict = req.redirect_dict = {}
+ visited[newurl] = visited.get(newurl, 0) + 1
+
+ # Don't close the fp until we are sure that we won't use it
+ # with HTTPError.
+ fp.read()
+ fp.close()
+
+ return self.parent.open(new)
+
+ http_error_303 = http_error_307 = http_error_302
+
+ inf_msg = "The HTTP server returned a redirect error that would " \
+ "lead to an infinite loop.\n" \
+ "The last 30x error message was:\n"
+
+
+def _parse_proxy(proxy):
+ """Return (scheme, user, password, host/port) given a URL or an authority.
+
+ If a URL is supplied, it must have an authority (host:port) component.
+ According to RFC 3986, having an authority component means the URL must
+ have two slashes after the scheme:
+
+ >>> _parse_proxy('file:/ftp.example.com/')
+ Traceback (most recent call last):
+ ValueError: proxy URL with no authority: 'file:/ftp.example.com/'
+
+ The first three items of the returned tuple may be None.
+
+ Examples of authority parsing:
+
+ >>> _parse_proxy('proxy.example.com')
+ (None, None, None, 'proxy.example.com')
+ >>> _parse_proxy('proxy.example.com:3128')
+ (None, None, None, 'proxy.example.com:3128')
+
+ The authority component may optionally include userinfo (assumed to be
+ username:password):
+
+ >>> _parse_proxy('joe:password@proxy.example.com')
+ (None, 'joe', 'password', 'proxy.example.com')
+ >>> _parse_proxy('joe:password@proxy.example.com:3128')
+ (None, 'joe', 'password', 'proxy.example.com:3128')
+
+ Same examples, but with URLs instead:
+
+ >>> _parse_proxy('http://proxy.example.com/')
+ ('http', None, None, 'proxy.example.com')
+ >>> _parse_proxy('http://proxy.example.com:3128/')
+ ('http', None, None, 'proxy.example.com:3128')
+ >>> _parse_proxy('http://joe:password@proxy.example.com/')
+ ('http', 'joe', 'password', 'proxy.example.com')
+ >>> _parse_proxy('http://joe:password@proxy.example.com:3128')
+ ('http', 'joe', 'password', 'proxy.example.com:3128')
+
+ Everything after the authority is ignored:
+
+ >>> _parse_proxy('ftp://joe:password@proxy.example.com/rubbish:3128')
+ ('ftp', 'joe', 'password', 'proxy.example.com')
+
+ Test for no trailing '/' case:
+
+ >>> _parse_proxy('http://joe:password@proxy.example.com')
+ ('http', 'joe', 'password', 'proxy.example.com')
+
+ """
+ scheme, r_scheme = splittype(proxy)
+ if not r_scheme.startswith("/"):
+ # authority
+ scheme = None
+ authority = proxy
+ else:
+ # URL
+ if not r_scheme.startswith("//"):
+ raise ValueError("proxy URL with no authority: %r" % proxy)
+ # We have an authority, so for RFC 3986-compliant URLs (by
+ # sections 3.2 and 3.3), path is empty or starts with '/'
+ end = r_scheme.find("/", 2)
+ if end == -1:
+ end = None
+ authority = r_scheme[2:end]
+ userinfo, hostport = splituser(authority)
+ if userinfo is not None:
+ user, password = splitpasswd(userinfo)
+ else:
+ user = password = None
+ return scheme, user, password, hostport
+
+class ProxyHandler(BaseHandler):
+ # Proxies must be in front
+ handler_order = 100
+
+ def __init__(self, proxies=None):
+ if proxies is None:
+ proxies = getproxies()
+ assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
+ self.proxies = proxies
+ for type, url in proxies.items():
+ setattr(self, '%s_open' % type,
+ lambda r, proxy=url, type=type, meth=self.proxy_open: \
+ meth(r, proxy, type))
+
+ def proxy_open(self, req, proxy, type):
+ orig_type = req.get_type()
+ proxy_type, user, password, hostport = _parse_proxy(proxy)
+ if proxy_type is None:
+ proxy_type = orig_type
+ if user and password:
+ user_pass = '%s:%s' % (unquote(user), unquote(password))
+ creds = base64.b64encode(user_pass).strip()
+ req.add_header('Proxy-authorization', 'Basic ' + creds)
+ hostport = unquote(hostport)
+ req.set_proxy(hostport, proxy_type)
+ if orig_type == proxy_type:
+ # let other handlers take care of it
+ return None
+ else:
+ # need to start over, because the other handlers don't
+ # grok the proxy's URL type
+ # e.g. if we have a constructor arg proxies like so:
+ # {'http': 'ftp://proxy.example.com'}, we may end up turning
+ # a request for http://acme.example.com/a into one for
+ # ftp://proxy.example.com/a
+ return self.parent.open(req)
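+
+# Sketch of explicit proxy configuration (placeholder host): __init__
+# generates one <scheme>_open method per mapping entry, each delegating
+# to proxy_open():
+#
+#     proxy_support = ProxyHandler(
+#         {'http': 'http://joe:secret@proxy.example.com:3128/'})
+#     opener = build_opener(proxy_support)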
+
+class HTTPPasswordMgr:
+
+ def __init__(self):
+ self.passwd = {}
+
+ def add_password(self, realm, uri, user, passwd):
+ # uri could be a single URI or a sequence
+ if isinstance(uri, basestring):
+ uri = [uri]
+ if realm not in self.passwd:
+ self.passwd[realm] = {}
+ for default_port in True, False:
+ reduced_uri = tuple(
+ [self.reduce_uri(u, default_port) for u in uri])
+ self.passwd[realm][reduced_uri] = (user, passwd)
+
+ def find_user_password(self, realm, authuri):
+ domains = self.passwd.get(realm, {})
+ for default_port in True, False:
+ reduced_authuri = self.reduce_uri(authuri, default_port)
+ for uris, authinfo in domains.iteritems():
+ for uri in uris:
+ if self.is_suburi(uri, reduced_authuri):
+ return authinfo
+ return None, None
+
+ def reduce_uri(self, uri, default_port=True):
+ """Accept authority or URI and extract only the authority and path."""
+ # note HTTP URLs do not have a userinfo component
+ parts = urlparse.urlsplit(uri)
+ if parts[1]:
+ # URI
+ scheme = parts[0]
+ authority = parts[1]
+ path = parts[2] or '/'
+ else:
+ # host or host:port
+ scheme = None
+ authority = uri
+ path = '/'
+ host, port = splitport(authority)
+ if default_port and port is None and scheme is not None:
+ dport = {"http": 80,
+ "https": 443,
+ }.get(scheme)
+ if dport is not None:
+ authority = "%s:%d" % (host, dport)
+ return authority, path
+
+ def is_suburi(self, base, test):
+ """Check if test is below base in a URI tree
+
+ Both args must be URIs in reduced form.
+ """
+ if base == test:
+ return True
+ if base[0] != test[0]:
+ return False
+ common = posixpath.commonprefix((base[1], test[1]))
+ if len(common) == len(base[1]):
+ return True
+ return False
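+
+# Example of the matching rules (placeholder values): a password stored
+# for a URI also matches URIs below it in the path tree, and a bare or
+# default port is treated as equivalent to an explicit one:
+#
+#     mgr = HTTPPasswordMgr()
+#     mgr.add_password('realm', 'http://example.com/area/', 'joe', 'secret')
+#     mgr.find_user_password('realm', 'http://example.com/area/page')
+#     # -> ('joe', 'secret')
+#     mgr.find_user_password('realm', 'http://example.com:80/area/page')
+#     # -> ('joe', 'secret'), since 80 is the default http port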
+
+
+class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):
+
+ def find_user_password(self, realm, authuri):
+ user, password = HTTPPasswordMgr.find_user_password(self, realm,
+ authuri)
+ if user is not None:
+ return user, password
+ return HTTPPasswordMgr.find_user_password(self, None, authuri)
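+
+# e.g. registering realm=None adds a catch-all entry consulted whenever
+# no realm-specific match is found (placeholder values):
+#
+#     mgr = HTTPPasswordMgrWithDefaultRealm()
+#     mgr.add_password(None, 'http://example.com/', 'joe', 'secret')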
+
+
+class AbstractBasicAuthHandler:
+
+ # XXX this allows for multiple auth-schemes, but will stupidly pick
+ # the last one with a realm specified.
+
+ rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', re.I)
+
+ # XXX could pre-emptively send auth info already accepted (RFC 2617,
+ # end of section 2, and section 1.2 immediately after "credentials"
+ # production).
+
+ def __init__(self, password_mgr=None):
+ if password_mgr is None:
+ password_mgr = HTTPPasswordMgr()
+ self.passwd = password_mgr
+ self.add_password = self.passwd.add_password
+
+ def http_error_auth_reqed(self, authreq, host, req, headers):
+ # host may be an authority (without userinfo) or a URL with an
+ # authority
+ # XXX could be multiple headers
+ authreq = headers.get(authreq, None)
+ if authreq:
+ mo = AbstractBasicAuthHandler.rx.search(authreq)
+ if mo:
+ scheme, realm = mo.groups()
+ if scheme.lower() == 'basic':
+ return self.retry_http_basic_auth(host, req, realm)
+
+ def retry_http_basic_auth(self, host, req, realm):
+ user, pw = self.passwd.find_user_password(realm, host)
+ if pw is not None:
+ raw = "%s:%s" % (user, pw)
+ auth = 'Basic %s' % base64.b64encode(raw).strip()
+ if req.headers.get(self.auth_header, None) == auth:
+ return None
+ req.add_header(self.auth_header, auth)
+ return self.parent.open(req)
+ else:
+ return None
+
+
+class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
+
+ auth_header = 'Authorization'
+
+ def http_error_401(self, req, fp, code, msg, headers):
+ url = req.get_full_url()
+ return self.http_error_auth_reqed('www-authenticate',
+ url, req, headers)
+
+
+class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
+
+ auth_header = 'Proxy-authorization'
+
+ def http_error_407(self, req, fp, code, msg, headers):
+ # http_error_auth_reqed requires that there is no userinfo component in
+ # authority. Assume there isn't one, since urllib2 does not (and
+ # should not, RFC 3986 s. 3.2.1) support requests for URLs containing
+ # userinfo.
+ authority = req.get_host()
+ return self.http_error_auth_reqed('proxy-authenticate',
+ authority, req, headers)
+
+
+def randombytes(n):
+ """Return n random bytes."""
+ # Use /dev/urandom if it is available. Fall back to random module
+ # if not. It might be worthwhile to extend this function to use
+ # other platform-specific mechanisms for getting random bytes.
+ if os.path.exists("/dev/urandom"):
+ f = open("/dev/urandom")
+ s = f.read(n)
+ f.close()
+ return s
+ else:
+ L = [chr(random.randrange(0, 256)) for i in range(n)]
+ return "".join(L)
+
+class AbstractDigestAuthHandler:
+ # Digest authentication is specified in RFC 2617.
+
+ # XXX The client does not inspect the Authentication-Info header
+ # in a successful response.
+
+ # XXX It should be possible to test this implementation against
+ # a mock server that just generates a static set of challenges.
+
+ # XXX qop="auth-int" supports is shaky
+
+ def __init__(self, passwd=None):
+ if passwd is None:
+ passwd = HTTPPasswordMgr()
+ self.passwd = passwd
+ self.add_password = self.passwd.add_password
+ self.retried = 0
+ self.nonce_count = 0
+
+ def reset_retry_count(self):
+ self.retried = 0
+
+ def http_error_auth_reqed(self, auth_header, host, req, headers):
+ authreq = headers.get(auth_header, None)
+ if self.retried > 5:
+ # Don't fail endlessly - if we failed once, we'll probably
+ # fail a second time. Hm. Unless the Password Manager is
+ # prompting for the information. Crap. This isn't great
+ # but it's better than the current 'repeat until recursion
+ # depth exceeded' approach <wink>
+ raise HTTPError(req.get_full_url(), 401, "digest auth failed",
+ headers, None)
+ else:
+ self.retried += 1
+ if authreq:
+ scheme = authreq.split()[0]
+ if scheme.lower() == 'digest':
+ return self.retry_http_digest_auth(req, authreq)
+
+ def retry_http_digest_auth(self, req, auth):
+ token, challenge = auth.split(' ', 1)
+ chal = parse_keqv_list(parse_http_list(challenge))
+ auth = self.get_authorization(req, chal)
+ if auth:
+ auth_val = 'Digest %s' % auth
+ if req.headers.get(self.auth_header, None) == auth_val:
+ return None
+ req.add_unredirected_header(self.auth_header, auth_val)
+ resp = self.parent.open(req)
+ return resp
+
+ def get_cnonce(self, nonce):
+ # The cnonce-value is an opaque
+ # quoted string value provided by the client and used by both client
+ # and server to avoid chosen plaintext attacks, to provide mutual
+ # authentication, and to provide some message integrity protection.
+ # This isn't a fabulous effort, but it's probably Good Enough.
+ dig = hashlib.sha1("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
+ randombytes(8))).hexdigest()
+ return dig[:16]
+
+ def get_authorization(self, req, chal):
+ try:
+ realm = chal['realm']
+ nonce = chal['nonce']
+ qop = chal.get('qop')
+ algorithm = chal.get('algorithm', 'MD5')
+ # mod_digest doesn't send an opaque, even though it isn't
+ # supposed to be optional
+ opaque = chal.get('opaque', None)
+ except KeyError:
+ return None
+
+ H, KD = self.get_algorithm_impls(algorithm)
+ if H is None:
+ return None
+
+ user, pw = self.passwd.find_user_password(realm, req.get_full_url())
+ if user is None:
+ return None
+
+ # XXX not implemented yet
+ if req.has_data():
+ entdig = self.get_entity_digest(req.get_data(), chal)
+ else:
+ entdig = None
+
+ A1 = "%s:%s:%s" % (user, realm, pw)
+ A2 = "%s:%s" % (req.get_method(),
+ # XXX selector: what about proxies and full urls
+ req.get_selector())
+ if qop == 'auth':
+ self.nonce_count += 1
+ ncvalue = '%08x' % self.nonce_count
+ cnonce = self.get_cnonce(nonce)
+ noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
+ respdig = KD(H(A1), noncebit)
+ elif qop is None:
+ respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
+ else:
+ # XXX handle auth-int.
+ raise URLError("qop '%s' is not supported." % qop)
+
+ # XXX should the partial digests be encoded too?
+
+ base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
+ 'response="%s"' % (user, realm, nonce, req.get_selector(),
+ respdig)
+ if opaque:
+ base += ', opaque="%s"' % opaque
+ if entdig:
+ base += ', digest="%s"' % entdig
+ base += ', algorithm="%s"' % algorithm
+ if qop:
+ base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
+ return base
+
+ def get_algorithm_impls(self, algorithm):
+ # lambdas assume digest modules are imported at the top level
+ if algorithm == 'MD5':
+ H = lambda x: hashlib.md5(x).hexdigest()
+ elif algorithm == 'SHA':
+ H = lambda x: hashlib.sha1(x).hexdigest()
+ # XXX MD5-sess
+ KD = lambda s, d: H("%s:%s" % (s, d))
+ return H, KD
+
+ def get_entity_digest(self, data, chal):
+ # XXX not implemented yet
+ return None
+
+
+class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
+ """An authentication protocol defined by RFC 2069
+
+ Digest authentication improves on basic authentication because it
+ does not transmit passwords in the clear.
+ """
+
+ auth_header = 'Authorization'
+ handler_order = 490 # before Basic auth
+
+ def http_error_401(self, req, fp, code, msg, headers):
+ host = urlparse.urlparse(req.get_full_url())[1]
+ retry = self.http_error_auth_reqed('www-authenticate',
+ host, req, headers)
+ self.reset_retry_count()
+ return retry
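+
+# Usage mirrors HTTPBasicAuthHandler; a sketch with placeholder values:
+#
+#     digest = HTTPDigestAuthHandler()
+#     digest.add_password('PDQ Application', 'http://example.com/',
+#                         'klem', 'secret')
+#     opener = build_opener(digest)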
+
+
+class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
+
+ auth_header = 'Proxy-Authorization'
+ handler_order = 490 # before Basic auth
+
+ def http_error_407(self, req, fp, code, msg, headers):
+ host = req.get_host()
+ retry = self.http_error_auth_reqed('proxy-authenticate',
+ host, req, headers)
+ self.reset_retry_count()
+ return retry
+
+class AbstractHTTPHandler(BaseHandler):
+
+ def __init__(self, debuglevel=0):
+ self._debuglevel = debuglevel
+
+ def set_http_debuglevel(self, level):
+ self._debuglevel = level
+
+ def do_request_(self, request):
+ host = request.get_host()
+ if not host:
+ raise URLError('no host given')
+
+ if request.has_data(): # POST
+ data = request.get_data()
+ if not request.has_header('Content-type'):
+ request.add_unredirected_header(
+ 'Content-type',
+ 'application/x-www-form-urlencoded')
+ if not request.has_header('Content-length'):
+ request.add_unredirected_header(
+ 'Content-length', '%d' % len(data))
+
+ scheme, sel = splittype(request.get_selector())
+ sel_host, sel_path = splithost(sel)
+ if not request.has_header('Host'):
+ request.add_unredirected_header('Host', sel_host or host)
+ for name, value in self.parent.addheaders:
+ name = name.capitalize()
+ if not request.has_header(name):
+ request.add_unredirected_header(name, value)
+
+ return request
+
+ def do_open(self, http_class, req):
+ """Return an addinfourl object for the request, using http_class.
+
+ http_class must implement the HTTPConnection API from httplib.
+ The addinfourl return value is a file-like object. It also
+ has methods and attributes including:
+ - info(): return a mimetools.Message object for the headers
+ - geturl(): return the original request URL
+ - code: HTTP status code
+ """
+ host = req.get_host()
+ if not host:
+ raise URLError('no host given')
+
+ h = http_class(host, timeout=req.timeout) # will parse host:port
+ h.set_debuglevel(self._debuglevel)
+
+ headers = dict(req.headers)
+ headers.update(req.unredirected_hdrs)
+ # We want to make an HTTP/1.1 request, but the addinfourl
+ # class isn't prepared to deal with a persistent connection.
+ # It will try to read all remaining data from the socket,
+ # which will block while the server waits for the next request.
+ # So make sure the connection gets closed after the (only)
+ # request.
+ headers["Connection"] = "close"
+ headers = dict(
+ (name.title(), val) for name, val in headers.items())
+ try:
+ h.request(req.get_method(), req.get_selector(), req.data, headers)
+ r = h.getresponse()
+ except socket.error, err: # XXX what error?
+ raise URLError(err)
+
+ # Pick apart the HTTPResponse object to get the addinfourl
+ # object initialized properly.
+
+ # Wrap the HTTPResponse object in socket's file object adapter
+ # for Windows. That adapter calls recv(), so delegate recv()
+ # to read(). This weird wrapping allows the returned object to
+ # have readline() and readlines() methods.
+
+ # XXX It might be better to extract the read buffering code
+ # out of socket._fileobject() and into a base class.
+
+ r.recv = r.read
+ fp = socket._fileobject(r, close=True)
+
+ resp = addinfourl(fp, r.msg, req.get_full_url())
+ resp.code = r.status
+ resp.msg = r.reason
+ return resp
+
+
+class HTTPHandler(AbstractHTTPHandler):
+
+ def http_open(self, req):
+ return self.do_open(httplib.HTTPConnection, req)
+
+ http_request = AbstractHTTPHandler.do_request_
+
+if hasattr(httplib, 'HTTPS'):
+ class HTTPSHandler(AbstractHTTPHandler):
+
+ def https_open(self, req):
+ return self.do_open(httplib.HTTPSConnection, req)
+
+ https_request = AbstractHTTPHandler.do_request_
+
+class HTTPCookieProcessor(BaseHandler):
+ def __init__(self, cookiejar=None):
+ import cookielib
+ if cookiejar is None:
+ cookiejar = cookielib.CookieJar()
+ self.cookiejar = cookiejar
+
+ def http_request(self, request):
+ self.cookiejar.add_cookie_header(request)
+ return request
+
+ def http_response(self, request, response):
+ self.cookiejar.extract_cookies(response, request)
+ return response
+
+ https_request = http_request
+ https_response = http_response
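+
+# Sketch: share a cookielib.CookieJar across requests so cookies set by
+# one response are replayed on later requests automatically:
+#
+#     import cookielib
+#     jar = cookielib.CookieJar()
+#     opener = build_opener(HTTPCookieProcessor(jar))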
+
+class UnknownHandler(BaseHandler):
+ def unknown_open(self, req):
+ type = req.get_type()
+ raise URLError('unknown url type: %s' % type)
+
+def parse_keqv_list(l):
+ """Parse list of key=value strings where keys are not duplicated."""
+ parsed = {}
+ for elt in l:
+ k, v = elt.split('=', 1)
+ if v[0] == '"' and v[-1] == '"':
+ v = v[1:-1]
+ parsed[k] = v
+ return parsed
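+
+# e.g. (quotes are stripped from quoted values; dict order may vary):
+#
+#     parse_keqv_list(['realm="example"', 'qop=auth'])
+#     # -> {'realm': 'example', 'qop': 'auth'}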
+
+def parse_http_list(s):
+ """Parse lists as described by RFC 2068 Section 2.
+
+ In particular, parse comma-separated lists where the elements of
+ the list may include quoted-strings. A quoted-string could
+ contain a comma. A non-quoted string could have quotes in the
+ middle. Neither commas nor quotes count if they are escaped.
+ Only double-quotes count, not single-quotes.
+ """
+ res = []
+ part = ''
+
+ escape = quote = False
+ for cur in s:
+ if escape:
+ part += cur
+ escape = False
+ continue
+ if quote:
+ if cur == '\\':
+ escape = True
+ continue
+ elif cur == '"':
+ quote = False
+ part += cur
+ continue
+
+ if cur == ',':
+ res.append(part)
+ part = ''
+ continue
+
+ if cur == '"':
+ quote = True
+
+ part += cur
+
+ # append last part
+ if part:
+ res.append(part)
+
+ return [part.strip() for part in res]
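+
+# e.g. commas inside a quoted-string do not split the list:
+#
+#     parse_http_list('a, "b, c", d')
+#     # -> ['a', '"b, c"', 'd']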
+
+class FileHandler(BaseHandler):
+ # Use local file or FTP depending on form of URL
+ def file_open(self, req):
+ url = req.get_selector()
+ if url[:2] == '//' and url[2:3] != '/':
+ req.type = 'ftp'
+ return self.parent.open(req)
+ else:
+ return self.open_local_file(req)
+
+ # names for the localhost
+ names = None
+ def get_names(self):
+ if FileHandler.names is None:
+ try:
+ FileHandler.names = (socket.gethostbyname('localhost'),
+ socket.gethostbyname(socket.gethostname()))
+ except socket.gaierror:
+ FileHandler.names = (socket.gethostbyname('localhost'),)
+ return FileHandler.names
+
+ # not entirely sure what the rules are here
+ def open_local_file(self, req):
+ import email.utils
+ import mimetypes
+ host = req.get_host()
+ file = req.get_selector()
+ localfile = url2pathname(file)
+ try:
+ stats = os.stat(localfile)
+ size = stats.st_size
+ modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
+ mtype = mimetypes.guess_type(file)[0]
+ headers = mimetools.Message(StringIO(
+ 'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
+ (mtype or 'text/plain', size, modified)))
+ if host:
+ host, port = splitport(host)
+ if not host or \
+ (not port and socket.gethostbyname(host) in self.get_names()):
+ return addinfourl(open(localfile, 'rb'),
+ headers, 'file:'+file)
+ except OSError, msg:
+ # urllib2 users shouldn't expect OSErrors coming from urlopen()
+ raise URLError(msg)
+ raise URLError('file not on local host')
+
+class FTPHandler(BaseHandler):
+ def ftp_open(self, req):
+ import ftplib
+ import mimetypes
+ host = req.get_host()
+ if not host:
+ raise IOError, ('ftp error', 'no host given')
+ host, port = splitport(host)
+ if port is None:
+ port = ftplib.FTP_PORT
+ else:
+ port = int(port)
+
+ # username/password handling
+ user, host = splituser(host)
+ if user:
+ user, passwd = splitpasswd(user)
+ else:
+ passwd = None
+ host = unquote(host)
+ user = unquote(user or '')
+ passwd = unquote(passwd or '')
+
+ try:
+ host = socket.gethostbyname(host)
+ except socket.error, msg:
+ raise URLError(msg)
+ path, attrs = splitattr(req.get_selector())
+ dirs = path.split('/')
+ dirs = map(unquote, dirs)
+ dirs, file = dirs[:-1], dirs[-1]
+ if dirs and not dirs[0]:
+ dirs = dirs[1:]
+ try:
+ fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
+ type = file and 'I' or 'D'
+ for attr in attrs:
+ attr, value = splitvalue(attr)
+ if attr.lower() == 'type' and \
+ value in ('a', 'A', 'i', 'I', 'd', 'D'):
+ type = value.upper()
+ fp, retrlen = fw.retrfile(file, type)
+ headers = ""
+ mtype = mimetypes.guess_type(req.get_full_url())[0]
+ if mtype:
+ headers += "Content-type: %s\n" % mtype
+ if retrlen is not None and retrlen >= 0:
+ headers += "Content-length: %d\n" % retrlen
+ sf = StringIO(headers)
+ headers = mimetools.Message(sf)
+ return addinfourl(fp, headers, req.get_full_url())
+ except ftplib.all_errors, msg:
+ raise IOError, ('ftp error', msg), sys.exc_info()[2]
+
+ def connect_ftp(self, user, passwd, host, port, dirs, timeout):
+ fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
+## fw.ftp.set_debuglevel(1)
+ return fw
+
+class CacheFTPHandler(FTPHandler):
+ # XXX would be nice to have pluggable cache strategies
+ # XXX this stuff is definitely not thread safe
+ def __init__(self):
+ self.cache = {}
+ self.timeout = {}
+ self.soonest = 0
+ self.delay = 60
+ self.max_conns = 16
+
+ def setTimeout(self, t):
+ self.delay = t
+
+ def setMaxConns(self, m):
+ self.max_conns = m
+
+ def connect_ftp(self, user, passwd, host, port, dirs, timeout):
+ key = user, host, port, '/'.join(dirs), timeout
+ if key in self.cache:
+ self.timeout[key] = time.time() + self.delay
+ else:
+ self.cache[key] = ftpwrapper(user, passwd, host, port, dirs, timeout)
+ self.timeout[key] = time.time() + self.delay
+ self.check_cache()
+ return self.cache[key]
+
+ def check_cache(self):
+ # first check for old ones
+ t = time.time()
+ if self.soonest <= t:
+ for k, v in self.timeout.items():
+ if v < t:
+ self.cache[k].close()
+ del self.cache[k]
+ del self.timeout[k]
+ self.soonest = min(self.timeout.values())
+
+ # then check the size
+ if len(self.cache) == self.max_conns:
+ for k, v in self.timeout.items():
+ if v == self.soonest:
+ del self.cache[k]
+ del self.timeout[k]
+ break
+ self.soonest = min(self.timeout.values())
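+
+# Sketch: CacheFTPHandler keeps up to max_conns FTP connections alive
+# for `delay` seconds, keyed by (user, host, port, path, timeout); tune
+# it before building the opener. Passing the instance to build_opener()
+# also suppresses the default FTPHandler:
+#
+#     cache_ftp = CacheFTPHandler()
+#     cache_ftp.setTimeout(30)     # keep idle connections for 30 seconds
+#     cache_ftp.setMaxConns(4)
+#     opener = build_opener(cache_ftp)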