Thu Aug 2 23:31:41 CEST 2007

Author: georg.brandl
Date: Thu Aug  2 23:31:34 2007
New Revision: 56680

Added:
   doctools/trunk/Doc-26/howto/
   doctools/trunk/Doc-26/howto/advocacy.rst
   doctools/trunk/Doc-26/howto/curses.rst
   doctools/trunk/Doc-26/howto/doanddont.rst
   doctools/trunk/Doc-26/howto/functional.rst
   doctools/trunk/Doc-26/howto/index.rst
   doctools/trunk/Doc-26/howto/regex.rst
   doctools/trunk/Doc-26/howto/sockets.rst
   doctools/trunk/Doc-26/howto/unicode.rst
   doctools/trunk/Doc-26/howto/urllib2.rst
   doctools/trunk/Doc-3k/howto/
   doctools/trunk/Doc-3k/howto/advocacy.rst
   doctools/trunk/Doc-3k/howto/curses.rst
   doctools/trunk/Doc-3k/howto/doanddont.rst
   doctools/trunk/Doc-3k/howto/functional.rst
   doctools/trunk/Doc-3k/howto/index.rst
   doctools/trunk/Doc-3k/howto/regex.rst
   doctools/trunk/Doc-3k/howto/sockets.rst
   doctools/trunk/Doc-3k/howto/unicode.rst
   doctools/trunk/Doc-3k/howto/urllib2.rst
Modified:
   doctools/trunk/Doc-26/ACKS
   doctools/trunk/Doc-26/TODO
   doctools/trunk/Doc-26/contents.rst
   doctools/trunk/Doc-3k/ACKS
   doctools/trunk/Doc-3k/TODO
   doctools/trunk/Doc-3k/contents.rst
   doctools/trunk/sphinx/templates/index.html
Log:
Integrate the HOWTOs into the tree.


Modified: doctools/trunk/Doc-26/ACKS
==============================================================================

--- doctools/trunk/Doc-26/ACKS	(original)
+++ doctools/trunk/Doc-26/ACKS	Thu Aug  2 23:31:34 2007
@@ -15,7 +15,7 @@
 * Steve Alexander
 * Jim Ahlstrom
 * Fred Allen
-* A. Amoroso
+* \A. Amoroso
 * Pehr Anderson
 * Oliver Andrich
 * Jesús Cea Avión
@@ -40,7 +40,7 @@
 * Jeremy Craven
 * Andrew Dalke
 * Ben Darnell
-* L. Peter Deutsch
+* \L. Peter Deutsch
 * Robert Donohue
 * Fred L. Drake, Jr.
 * Jeff Epler

Modified: doctools/trunk/Doc-26/TODO
==============================================================================
--- doctools/trunk/Doc-26/TODO	(original)
+++ doctools/trunk/Doc-26/TODO	Thu Aug  2 23:31:34 2007
@@ -2,7 +2,6 @@
 ======================
 
 * split very large files and add toctrees
-* integrate standalone HOWTOs
 * find out which files get "comments disabled" metadata
 * write "About these documents"
 * finish "Documenting Python"

Modified: doctools/trunk/Doc-26/contents.rst
==============================================================================
--- doctools/trunk/Doc-26/contents.rst	(original)
+++ doctools/trunk/Doc-26/contents.rst	Thu Aug  2 23:31:34 2007
@@ -14,6 +14,7 @@
    distutils/index.rst
    install/index.rst
    documenting/index.rst
+   howto/index.rst
 
    bugs.rst
    about.rst

Added: doctools/trunk/Doc-26/howto/advocacy.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/advocacy.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,356 @@
+*************************
+  Python Advocacy HOWTO
+*************************
+
+:Author: A.M. Kuchling
+:Release: 0.03
+
+
+.. topic:: Abstract
+
+   It's usually difficult to get your management to accept open source software,
+   and Python is no exception to this rule.  This document discusses reasons to use
+   Python, strategies for winning acceptance, facts and arguments you can use, and
+   cases where you *shouldn't* try to use Python.
+
+
+Reasons to Use Python
+=====================
+
+There are several reasons to incorporate a scripting language into your
+development process, and this section will discuss them, and why Python has some
+properties that make it a particularly good choice.
+
+
+Programmability
+---------------
+
+Programs are often organized in a modular fashion.  Lower-level operations are
+grouped together, and called by higher-level functions, which may in turn be
+used as basic operations by still further upper levels.
+
+For example, the lowest level might define a very low-level set of functions for
+accessing a hash table.  The next level might use hash tables to store the
+headers of a mail message, mapping a header name like ``Date`` to a value such
+as ``Tue, 13 May 1997 20:00:54 -0400``.  A yet higher level may operate on
+message objects, without knowing or caring that message headers are stored in a
+hash table, and so forth.
+
+Often, the lowest levels do very simple things; they implement a data structure
+such as a binary tree or hash table, or they perform some simple computation,
+such as converting a date string to a number.  The higher levels then contain
+logic connecting these primitive operations.  Using the approach, the primitives
+can be seen as basic building blocks which are then glued together to produce
+the complete product.
+
+Why is this design approach relevant to Python?  Because Python is well suited
+to functioning as such a glue language.  A common approach is to write a Python
+module that implements the lower level operations; for the sake of speed, the
+implementation might be in C, Java, or even Fortran.  Once the primitives are
+available to Python programs, the logic underlying higher level operations is
+written in the form of Python code.  The high-level logic is then more
+understandable, and easier to modify.
+
+John Ousterhout wrote a paper that explains this idea at greater length,
+entitled "Scripting: Higher Level Programming for the 21st Century".  I
+recommend that you read this paper; see the references for the URL.  Ousterhout
+is the inventor of the Tcl language, and therefore argues that Tcl should be
+used for this purpose; he only briefly refers to other languages such as Python,
+Perl, and Lisp/Scheme, but in reality, Ousterhout's argument applies to
+scripting languages in general, since you could equally write extensions for any
+of the languages mentioned above.
+
+
+Prototyping
+-----------
+
+In *The Mythical Man-Month*, Fredrick Brooks suggests the following rule when
+planning software projects: "Plan to throw one away; you will anyway."  Brooks
+is saying that the first attempt at a software design often turns out to be
+wrong; unless the problem is very simple or you're an extremely good designer,
+you'll find that new requirements and features become apparent once development
+has actually started.  If these new requirements can't be cleanly incorporated
+into the program's structure, you're presented with two unpleasant choices:
+hammer the new features into the program somehow, or scrap everything and write
+a new version of the program, taking the new features into account from the
+beginning.
+
+Python provides you with a good environment for quickly developing an initial
+prototype.  That lets you get the overall program structure and logic right, and
+you can fine-tune small details in the fast development cycle that Python
+provides.  Once you're satisfied with the GUI interface or program output, you
+can translate the Python code into C++, Fortran, Java, or some other compiled
+language.
+
+Prototyping means you have to be careful not to use too many Python features
+that are hard to implement in your other language.  Using ``eval()``, or regular
+expressions, or the :mod:`pickle` module, means that you're going to need C or
+Java libraries for formula evaluation, regular expressions, and serialization,
+for example.  But it's not hard to avoid such tricky code, and in the end the
+translation usually isn't very difficult.  The resulting code can be rapidly
+debugged, because any serious logical errors will have been removed from the
+prototype, leaving only more minor slip-ups in the translation to track down.
+
+This strategy builds on the earlier discussion of programmability. Using Python
+as glue to connect lower-level components has obvious relevance for constructing
+prototype systems.  In this way Python can help you with development, even if
+end users never come in contact with Python code at all.  If the performance of
+the Python version is adequate and corporate politics allow it, you may not need
+to do a translation into C or Java, but it can still be faster to develop a
+prototype and then translate it, instead of attempting to produce the final
+version immediately.
+
+One example of this development strategy is Microsoft Merchant Server. Version
+1.0 was written in pure Python, by a company that subsequently was purchased by
+Microsoft.  Version 2.0 began to translate the code into C++, shipping with some
+C++code and some Python code.  Version 3.0 didn't contain any Python at all; all
+the code had been translated into C++.  Even though the product doesn't contain
+a Python interpreter, the Python language has still served a useful purpose by
+speeding up development.
+
+This is a very common use for Python.  Past conference papers have also
+described this approach for developing high-level numerical algorithms; see
+David M. Beazley and Peter S. Lomdahl's paper "Feeding a Large-scale Physics
+Application to Python" in the references for a good example.  If an algorithm's
+basic operations are things like "Take the inverse of this 4000x4000 matrix",
+and are implemented in some lower-level language, then Python has almost no
+additional performance cost; the extra time required for Python to evaluate an
+expression like ``m.invert()`` is dwarfed by the cost of the actual computation.
+It's particularly good for applications where seemingly endless tweaking is
+required to get things right. GUI interfaces and Web sites are prime examples.
+
+The Python code is also shorter and faster to write (once you're familiar with
+Python), so it's easier to throw it away if you decide your approach was wrong;
+if you'd spent two weeks working on it instead of just two hours, you might
+waste time trying to patch up what you've got out of a natural reluctance to
+admit that those two weeks were wasted.  Truthfully, those two weeks haven't
+been wasted, since you've learnt something about the problem and the technology
+you're using to solve it, but it's human nature to view this as a failure of
+some sort.
+
+
+Simplicity and Ease of Understanding
+------------------------------------
+
+Python is definitely *not* a toy language that's only usable for small tasks.
+The language features are general and powerful enough to enable it to be used
+for many different purposes.  It's useful at the small end, for 10- or 20-line
+scripts, but it also scales up to larger systems that contain thousands of lines
+of code.
+
+However, this expressiveness doesn't come at the cost of an obscure or tricky
+syntax.  While Python has some dark corners that can lead to obscure code, there
+are relatively few such corners, and proper design can isolate their use to only
+a few classes or modules.  It's certainly possible to write confusing code by
+using too many features with too little concern for clarity, but most Python
+code can look a lot like a slightly-formalized version of human-understandable
+pseudocode.
+
+In *The New Hacker's Dictionary*, Eric S. Raymond gives the following definition
+for "compact":
+
+.. epigraph::
+
+   Compact *adj.*  Of a design, describes the valuable property that it can all be
+   apprehended at once in one's head. This generally means the thing created from
+   the design can be used with greater facility and fewer errors than an equivalent
+   tool that is not compact. Compactness does not imply triviality or lack of
+   power; for example, C is compact and FORTRAN is not, but C is more powerful than
+   FORTRAN. Designs become non-compact through accreting features and cruft that
+   don't merge cleanly into the overall design scheme (thus, some fans of Classic C
+   maintain that ANSI C is no longer compact).
+
+   (From http://www.catb.org/ esr/jargon/html/C/compact.html)
+
+In this sense of the word, Python is quite compact, because the language has
+just a few ideas, which are used in lots of places.  Take namespaces, for
+example.  Import a module with ``import math``, and you create a new namespace
+called ``math``.  Classes are also namespaces that share many of the properties
+of modules, and have a few of their own; for example, you can create instances
+of a class. Instances?  They're yet another namespace.  Namespaces are currently
+implemented as Python dictionaries, so they have the same methods as the
+standard dictionary data type: .keys() returns all the keys, and so forth.
+
+This simplicity arises from Python's development history.  The language syntax
+derives from different sources; ABC, a relatively obscure teaching language, is
+one primary influence, and Modula-3 is another.  (For more information about ABC
+and Modula-3, consult their respective Web sites at http://www.cwi.nl/
+steven/abc/ and http://www.m3.org.)  Other features have come from C, Icon,
+Algol-68, and even Perl.  Python hasn't really innovated very much, but instead
+has tried to keep the language small and easy to learn, building on ideas that
+have been tried in other languages and found useful.
+
+Simplicity is a virtue that should not be underestimated.  It lets you learn the
+language more quickly, and then rapidly write code, code that often works the
+first time you run it.
+
+
+Java Integration
+----------------
+
+If you're working with Java, Jython (http://www.jython.org/) is definitely worth
+your attention.  Jython is a re-implementation of Python in Java that compiles
+Python code into Java bytecodes.  The resulting environment has very tight,
+almost seamless, integration with Java.  It's trivial to access Java classes
+from Python, and you can write Python classes that subclass Java classes.
+Jython can be used for prototyping Java applications in much the same way
+CPython is used, and it can also be used for test suites for Java code, or
+embedded in a Java application to add scripting capabilities.
+
+
+Arguments and Rebuttals
+=======================
+
+Let's say that you've decided upon Python as the best choice for your
+application.  How can you convince your management, or your fellow developers,
+to use Python?  This section lists some common arguments against using Python,
+and provides some possible rebuttals.
+
+**Python is freely available software that doesn't cost anything. How good can
+it be?**
+
+Very good, indeed.  These days Linux and Apache, two other pieces of open source
+software, are becoming more respected as alternatives to commercial software,
+but Python hasn't had all the publicity.
+
+Python has been around for several years, with many users and developers.
+Accordingly, the interpreter has been used by many people, and has gotten most
+of the bugs shaken out of it.  While bugs are still discovered at intervals,
+they're usually either quite obscure (they'd have to be, for no one to have run
+into them before) or they involve interfaces to external libraries.  The
+internals of the language itself are quite stable.
+
+Having the source code should be viewed as making the software available for
+peer review; people can examine the code, suggest (and implement) improvements,
+and track down bugs.  To find out more about the idea of open source code, along
+with arguments and case studies supporting it, go to http://www.opensource.org.
+
+**Who's going to support it?**
+
+Python has a sizable community of developers, and the number is still growing.
+The Internet community surrounding the language is an active one, and is worth
+being considered another one of Python's advantages. Most questions posted to
+the comp.lang.python newsgroup are quickly answered by someone.
+
+Should you need to dig into the source code, you'll find it's clear and well-
+organized, so it's not very difficult to write extensions and track down bugs
+yourself.  If you'd prefer to pay for support, there are companies and
+individuals who offer commercial support for Python.
+
+**Who uses Python for serious work?**
+
+Lots of people; one interesting thing about Python is the surprising diversity
+of applications that it's been used for.  People are using Python to:
+
+* Run Web sites
+
+* Write GUI interfaces
+
+* Control number-crunching code on supercomputers
+
+* Make a commercial application scriptable by embedding the Python interpreter
+  inside it
+
+* Process large XML data sets
+
+* Build test suites for C or Java code
+
+Whatever your application domain is, there's probably someone who's used Python
+for something similar.  Yet, despite being useable for such high-end
+applications, Python's still simple enough to use for little jobs.
+
+See http://wiki.python.org/moin/OrganizationsUsingPython for a list of some of
+the  organizations that use Python.
+
+**What are the restrictions on Python's use?**
+
+They're practically nonexistent.  Consult the :file:`Misc/COPYRIGHT` file in the
+source distribution, or http://www.python.org/doc/Copyright.html for the full
+language, but it boils down to three conditions.
+
+* You have to leave the copyright notice on the software; if you don't include
+  the source code in a product, you have to put the copyright notice in the
+  supporting documentation.
+
+* Don't claim that the institutions that have developed Python endorse your
+  product in any way.
+
+* If something goes wrong, you can't sue for damages.  Practically all software
+  licences contain this condition.
+
+Notice that you don't have to provide source code for anything that contains
+Python or is built with it.  Also, the Python interpreter and accompanying
+documentation can be modified and redistributed in any way you like, and you
+don't have to pay anyone any licensing fees at all.
+
+**Why should we use an obscure language like Python instead of well-known
+language X?**
+
+I hope this HOWTO, and the documents listed in the final section, will help
+convince you that Python isn't obscure, and has a healthily growing user base.
+One word of advice: always present Python's positive advantages, instead of
+concentrating on language X's failings.  People want to know why a solution is
+good, rather than why all the other solutions are bad.  So instead of attacking
+a competing solution on various grounds, simply show how Python's virtues can
+help.
+
+
+Useful Resources
+================
+
+http://www.pythonology.com/success
+   The Python Success Stories are a collection of stories from successful users of
+   Python, with the emphasis on business and corporate users.
+
+.. % \term{\url{http://www.fsbassociates.com/books/pythonchpt1.htm}}
+.. % The first chapter of \emph{Internet Programming with Python} also
+.. % examines some of the reasons for using Python.  The book is well worth
+.. % buying, but the publishers have made the first chapter available on
+.. % the Web.
+
+http://home.pacbell.net/ouster/scripting.html
+   John Ousterhout's white paper on scripting is a good argument for the utility of
+   scripting languages, though naturally enough, he emphasizes Tcl, the language he
+   developed.  Most of the arguments would apply to any scripting language.
+
+http://www.python.org/workshops/1997-10/proceedings/beazley.html
+   The authors, David M. Beazley and Peter S. Lomdahl,  describe their use of
+   Python at Los Alamos National Laboratory. It's another good example of how
+   Python can help get real work done. This quotation from the paper has been
+   echoed by many people:
+
+   .. epigraph::
+
+      Originally developed as a large monolithic application for massively parallel
+      processing systems, we have used Python to transform our application into a
+      flexible, highly modular, and extremely powerful system for performing
+      simulation, data analysis, and visualization. In addition, we describe how
+      Python has solved a number of important problems related to the development,
+      debugging, deployment, and maintenance of scientific software.
+
+http://pythonjournal.cognizor.com/pyj1/Everitt-Feit_interview98-V1.html
+   This interview with Andy Feit, discussing Infoseek's use of Python, can be used
+   to show that choosing Python didn't introduce any difficulties into a company's
+   development process, and provided some substantial benefits.
+
+.. % \term{\url{http://www.python.org/psa/Commercial.html}}
+.. % Robin Friedrich wrote this document on how to support Python's use in
+.. % commercial projects.
+
+http://www.python.org/workshops/1997-10/proceedings/stein.ps
+   For the 6th Python conference, Greg Stein presented a paper that traced Python's
+   adoption and usage at a startup called eShop, and later at Microsoft.
+
+http://www.opensource.org
+   Management may be doubtful of the reliability and usefulness of software that
+   wasn't written commercially.  This site presents arguments that show how open
+   source software can have considerable advantages over closed-source software.
+
+http://sunsite.unc.edu/LDP/HOWTO/mini/Advocacy.html
+   The Linux Advocacy mini-HOWTO was the inspiration for this document, and is also
+   well worth reading for general suggestions on winning acceptance for a new
+   technology, such as Linux or Python.  In general, you won't make much progress
+   by simply attacking existing systems and complaining about their inadequacies;
+   this often ends up looking like unfocused whining.  It's much better to point
+   out some of the many areas where Python is an improvement over other systems.
+

Added: doctools/trunk/Doc-26/howto/curses.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/curses.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,433 @@
+**********************************
+  Curses Programming with Python
+**********************************
+
+:Author: A.M. Kuchling, Eric S. Raymond
+:Release: 2.02
+
+
+.. topic:: Abstract
+
+   This document describes how to write text-mode programs with Python 2.x, using
+   the :mod:`curses` extension module to control the display.
+
+
+What is curses?
+===============
+
+The curses library supplies a terminal-independent screen-painting and keyboard-
+handling facility for text-based terminals; such terminals include VT100s, the
+Linux console, and the simulated terminal provided by X11 programs such as xterm
+and rxvt.  Display terminals support various control codes to perform common
+operations such as moving the cursor, scrolling the screen, and erasing areas.
+Different terminals use widely differing codes, and often have their own minor
+quirks.
+
+In a world of X displays, one might ask "why bother"?  It's true that character-
+cell display terminals are an obsolete technology, but there are niches in which
+being able to do fancy things with them are still valuable.  One is on small-
+footprint or embedded Unixes that  don't carry an X server.  Another is for
+tools like OS installers and kernel configurators that may have to run before X
+is available.
+
+The curses library hides all the details of different terminals, and provides
+the programmer with an abstraction of a display, containing multiple non-
+overlapping windows.  The contents of a window can be changed in various ways--
+adding text, erasing it, changing its appearance--and the curses library will
+automagically figure out what control codes need to be sent to the terminal to
+produce the right output.
+
+The curses library was originally written for BSD Unix; the later System V
+versions of Unix from AT&T added many enhancements and new functions. BSD curses
+is no longer maintained, having been replaced by ncurses, which is an open-
+source implementation of the AT&T interface.  If you're using an open-source
+Unix such as Linux or FreeBSD, your system almost certainly uses ncurses.  Since
+most current commercial Unix versions are based on System V code, all the
+functions described here will probably be available.  The older versions of
+curses carried by some proprietary Unixes may not support everything, though.
+
+No one has made a Windows port of the curses module.  On a Windows platform, try
+the Console module written by Fredrik Lundh.  The Console module provides
+cursor-addressable text output, plus full support for mouse and keyboard input,
+and is available from http://effbot.org/efflib/console.
+
+
+The Python curses module
+------------------------
+
+Thy Python module is a fairly simple wrapper over the C functions provided by
+curses; if you're already familiar with curses programming in C, it's really
+easy to transfer that knowledge to Python.  The biggest difference is that the
+Python interface makes things simpler, by merging different C functions such as
+:func:`addstr`, :func:`mvaddstr`, :func:`mvwaddstr`, into a single
+:meth:`addstr` method.  You'll see this covered in more detail later.
+
+This HOWTO is simply an introduction to writing text-mode programs with curses
+and Python. It doesn't attempt to be a complete guide to the curses API; for
+that, see the Python library guide's section on ncurses, and the C manual pages
+for ncurses.  It will, however, give you the basic ideas.
+
+
+Starting and ending a curses application
+========================================
+
+Before doing anything, curses must be initialized.  This is done by calling the
+:func:`initscr` function, which will determine the terminal type, send any
+required setup codes to the terminal, and create various internal data
+structures.  If successful, :func:`initscr` returns a window object representing
+the entire screen; this is usually called ``stdscr``, after the name of the
+corresponding C variable. ::
+
+   import curses
+   stdscr = curses.initscr()
+
+Usually curses applications turn off automatic echoing of keys to the screen, in
+order to be able to read keys and only display them under certain circumstances.
+This requires calling the :func:`noecho` function. ::
+
+   curses.noecho()
+
+Applications will also commonly need to react to keys instantly, without
+requiring the Enter key to be pressed; this is called cbreak mode, as opposed to
+the usual buffered input mode. ::
+
+   curses.cbreak()
+
+Terminals usually return special keys, such as the cursor keys or navigation
+keys such as Page Up and Home, as a multibyte escape sequence.  While you could
+write your application to expect such sequences and process them accordingly,
+curses can do it for you, returning a special value such as
+:const:`curses.KEY_LEFT`.  To get curses to do the job, you'll have to enable
+keypad mode. ::
+
+   stdscr.keypad(1)
+
+Terminating a curses application is much easier than starting one. You'll need
+to call  ::
+
+   curses.nocbreak(); stdscr.keypad(0); curses.echo()
+
+to reverse the curses-friendly terminal settings. Then call the :func:`endwin`
+function to restore the terminal to its original operating mode. ::
+
+   curses.endwin()
+
+A common problem when debugging a curses application is to get your terminal
+messed up when the application dies without restoring the terminal to its
+previous state.  In Python this commonly happens when your code is buggy and
+raises an uncaught exception.  Keys are no longer be echoed to the screen when
+you type them, for example, which makes using the shell difficult.
+
+In Python you can avoid these complications and make debugging much easier by
+importing the module :mod:`curses.wrapper`.  It supplies a :func:`wrapper`
+function that takes a callable.  It does the initializations described above,
+and also initializes colors if color support is present.  It then runs your
+provided callable and finally deinitializes appropriately.  The callable is
+called inside a try-catch clause which catches exceptions, performs curses
+deinitialization, and then passes the exception upwards.  Thus, your terminal
+won't be left in a funny state on exception.
+
+
+Windows and Pads
+================
+
+Windows are the basic abstraction in curses.  A window object represents a
+rectangular area of the screen, and supports various methods to display text,
+erase it, allow the user to input strings, and so forth.
+
+The ``stdscr`` object returned by the :func:`initscr` function is a window
+object that covers the entire screen.  Many programs may need only this single
+window, but you might wish to divide the screen into smaller windows, in order
+to redraw or clear them separately. The :func:`newwin` function creates a new
+window of a given size, returning the new window object. ::
+
+   begin_x = 20 ; begin_y = 7
+   height = 5 ; width = 40
+   win = curses.newwin(height, width, begin_y, begin_x)
+
+A word about the coordinate system used in curses: coordinates are always passed
+in the order *y,x*, and the top-left corner of a window is coordinate (0,0).
+This breaks a common convention for handling coordinates, where the *x*
+coordinate usually comes first.  This is an unfortunate difference from most
+other computer applications, but it's been part of curses since it was first
+written, and it's too late to change things now.
+
+When you call a method to display or erase text, the effect doesn't immediately
+show up on the display.  This is because curses was originally written with slow
+300-baud terminal connections in mind; with these terminals, minimizing the time
+required to redraw the screen is very important.  This lets curses accumulate
+changes to the screen, and display them in the most efficient manner.  For
+example, if your program displays some characters in a window, and then clears
+the window, there's no need to send the original characters because they'd never
+be visible.
+
+Accordingly, curses requires that you explicitly tell it to redraw windows,
+using the :func:`refresh` method of window objects.  In practice, this doesn't
+really complicate programming with curses much. Most programs go into a flurry
+of activity, and then pause waiting for a keypress or some other action on the
+part of the user.  All you have to do is to be sure that the screen has been
+redrawn before pausing to wait for user input, by simply calling
+``stdscr.refresh()`` or the :func:`refresh` method of some other relevant
+window.
+
+A pad is a special case of a window; it can be larger than the actual display
+screen, and only a portion of it displayed at a time. Creating a pad simply
+requires the pad's height and width, while refreshing a pad requires giving the
+coordinates of the on-screen area where a subsection of the pad will be
+displayed.   ::
+
+   pad = curses.newpad(100, 100)
+   #  These loops fill the pad with letters; this is
+   # explained in the next section
+   for y in range(0, 100):
+       for x in range(0, 100):
+           try: pad.addch(y,x, ord('a') + (x*x+y*y) % 26 )
+           except curses.error: pass
+
+   #  Displays a section of the pad in the middle of the screen
+   pad.refresh( 0,0, 5,5, 20,75)
+
+The :func:`refresh` call displays a section of the pad in the rectangle
+extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
+left corner of the displayed section is coordinate (0,0) on the pad.  Beyond
+that difference, pads are exactly like ordinary windows and support the same
+methods.
+
+If you have multiple windows and pads on screen there is a more efficient way to
+go, which will prevent annoying screen flicker at refresh time.  Use the
+:meth:`noutrefresh` method of each window to update the data structure
+representing the desired state of the screen; then change the physical screen to
+match the desired state in one go with the function :func:`doupdate`.  The
+normal :meth:`refresh` method calls :func:`doupdate` as its last act.
+
+
+Displaying Text
+===============
+
+From a C programmer's point of view, curses may sometimes look like a twisty
+maze of functions, all subtly different.  For example, :func:`addstr` displays a
+string at the current cursor location in the ``stdscr`` window, while
+:func:`mvaddstr` moves to a given y,x coordinate first before displaying the
+string. :func:`waddstr` is just like :func:`addstr`, but allows specifying a
+window to use, instead of using ``stdscr`` by default. :func:`mvwaddstr` follows
+similarly.
+
+Fortunately the Python interface hides all these details; ``stdscr`` is a window
+object like any other, and methods like :func:`addstr` accept multiple argument
+forms.  Usually there are four different forms.
+
++---------------------------------+-----------------------------------------------+
+| Form                            | Description                                   |
++=================================+===============================================+
+| *str* or *ch*                   | Display the string *str* or character *ch* at |
+|                                 | the current position                          |
++---------------------------------+-----------------------------------------------+
+| *str* or *ch*, *attr*           | Display the string *str* or character *ch*,   |
+|                                 | using attribute *attr* at the current         |
+|                                 | position                                      |
++---------------------------------+-----------------------------------------------+
+| *y*, *x*, *str* or *ch*         | Move to position *y,x* within the window, and |
+|                                 | display *str* or *ch*                         |
++---------------------------------+-----------------------------------------------+
+| *y*, *x*, *str* or *ch*, *attr* | Move to position *y,x* within the window, and |
+|                                 | display *str* or *ch*, using attribute *attr* |
++---------------------------------+-----------------------------------------------+
+
+Attributes allow displaying text in highlighted forms, such as in boldface,
+underline, reverse code, or in color.  They'll be explained in more detail in
+the next subsection.
+
+The :func:`addstr` function takes a Python string as the value to be displayed,
+while the :func:`addch` functions take a character, which can be either a Python
+string of length 1 or an integer.  If it's a string, you're limited to
+displaying characters between 0 and 255.  SVr4 curses provides constants for
+extension characters; these constants are integers greater than 255.  For
+example, :const:`ACS_PLMINUS` is a +/- symbol, and :const:`ACS_ULCORNER` is the
+upper left corner of a box (handy for drawing borders).
+
+Windows remember where the cursor was left after the last operation, so if you
+leave out the *y,x* coordinates, the string or character will be displayed
+wherever the last operation left off.  You can also move the cursor with the
+:func:`move(y,x)` method.  Because some terminals always display a flashing
+cursor, you may want to ensure that the cursor is positioned in some location
+where it won't be distracting; it can be confusing to have the cursor blinking
+at some apparently random location.
+
+If your application doesn't need a blinking cursor at all, you can call
+:func:`curs_set(0)` to make it invisible.  Equivalently, and for compatibility
+with older curses versions, there's a :func:`leaveok(bool)` function.  When
+*bool* is true, the curses library will attempt to suppress the flashing cursor,
+and you won't need to worry about leaving it in odd locations.
+
+
+Attributes and Color
+--------------------
+
+Characters can be displayed in different ways.  Status lines in a text-based
+application are commonly shown in reverse video; a text viewer may need to
+highlight certain words.  curses supports this by allowing you to specify an
+attribute for each cell on the screen.
+
+An attribute is a integer, each bit representing a different attribute.  You can
+try to display text with multiple attribute bits set, but curses doesn't
+guarantee that all the possible combinations are available, or that they're all
+visually distinct.  That depends on the ability of the terminal being used, so
+it's safest to stick to the most commonly available attributes, listed here.
+
++----------------------+--------------------------------------+
+| Attribute            | Description                          |
++======================+======================================+
+| :const:`A_BLINK`     | Blinking text                        |
++----------------------+--------------------------------------+
+| :const:`A_BOLD`      | Extra bright or bold text            |
++----------------------+--------------------------------------+
+| :const:`A_DIM`       | Half bright text                     |
++----------------------+--------------------------------------+
+| :const:`A_REVERSE`   | Reverse-video text                   |
++----------------------+--------------------------------------+
+| :const:`A_STANDOUT`  | The best highlighting mode available |
++----------------------+--------------------------------------+
+| :const:`A_UNDERLINE` | Underlined text                      |
++----------------------+--------------------------------------+
+
+So, to display a reverse-video status line on the top line of the screen, you
+could code::
+
+   stdscr.addstr(0, 0, "Current mode: Typing mode",
+   	      curses.A_REVERSE)
+   stdscr.refresh()
+
+The curses library also supports color on those terminals that provide it, The
+most common such terminal is probably the Linux console, followed by color
+xterms.
+
+To use color, you must call the :func:`start_color` function soon after calling
+:func:`initscr`, to initialize the default color set (the
+:func:`curses.wrapper.wrapper` function does this automatically).  Once that's
+done, the :func:`has_colors` function returns TRUE if the terminal in use can
+actually display color.  (Note: curses uses the American spelling 'color',
+instead of the Canadian/British spelling 'colour'.  If you're used to the
+British spelling, you'll have to resign yourself to misspelling it for the sake
+of these functions.)
+
+The curses library maintains a finite number of color pairs, containing a
+foreground (or text) color and a background color.  You can get the attribute
+value corresponding to a color pair with the :func:`color_pair` function; this
+can be bitwise-OR'ed with other attributes such as :const:`A_REVERSE`, but
+again, such combinations are not guaranteed to work on all terminals.
+
+An example, which displays a line of text using color pair 1::
+
+   stdscr.addstr( "Pretty text", curses.color_pair(1) )
+   stdscr.refresh()
+
+As I said before, a color pair consists of a foreground and background color.
+:func:`start_color` initializes 8 basic colors when it activates color mode.
+They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and
+7:white.  The curses module defines named constants for each of these colors:
+:const:`curses.COLOR_BLACK`, :const:`curses.COLOR_RED`, and so forth.
+
+The :func:`init_pair(n, f, b)` function changes the definition of color pair
+*n*, to foreground color f and background color b.  Color pair 0 is hard-wired
+to white on black, and cannot be changed.
+
+Let's put all this together. To change color 1 to red text on a white
+background, you would call::
+
+   curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
+
+When you change a color pair, any text already displayed using that color pair
+will change to the new colors.  You can also display new text in this color
+with::
+
+   stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1) )
+
+Very fancy terminals can change the definitions of the actual colors to a given
+RGB value.  This lets you change color 1, which is usually red, to purple or
+blue or any other color you like.  Unfortunately, the Linux console doesn't
+support this, so I'm unable to try it out, and can't provide any examples.  You
+can check if your terminal can do this by calling :func:`can_change_color`,
+which returns TRUE if the capability is there.  If you're lucky enough to have
+such a talented terminal, consult your system's man pages for more information.
+
+
+User Input
+==========
+
+The curses library itself offers only very simple input mechanisms. Python's
+support adds a text-input widget that makes up some of the lack.
+
+The most common way to get input to a window is to use its :meth:`getch` method.
+:meth:`getch` pauses and waits for the user to hit a key, displaying it if
+:func:`echo` has been called earlier.  You can optionally specify a coordinate
+to which the cursor should be moved before pausing.
+
+It's possible to change this behavior with the method :meth:`nodelay`. After
+:meth:`nodelay(1)`, :meth:`getch` for the window becomes non-blocking and
+returns ``curses.ERR`` (a value of -1) when no input is ready.  There's also a
+:func:`halfdelay` function, which can be used to (in effect) set a timer on each
+:meth:`getch`; if no input becomes available within the number of milliseconds
+specified as the argument to :func:`halfdelay`, curses raises an exception.
+
+The :meth:`getch` method returns an integer; if it's between 0 and 255, it
+represents the ASCII code of the key pressed.  Values greater than 255 are
+special keys such as Page Up, Home, or the cursor keys. You can compare the
+value returned to constants such as :const:`curses.KEY_PPAGE`,
+:const:`curses.KEY_HOME`, or :const:`curses.KEY_LEFT`.  Usually the main loop of
+your program will look something like this::
+
+   while 1:
+       c = stdscr.getch()
+       if c == ord('p'): PrintDocument()
+       elif c == ord('q'): break  # Exit the while()
+       elif c == curses.KEY_HOME: x = y = 0
+
+The :mod:`curses.ascii` module supplies ASCII class membership functions that
+take either integer or 1-character-string arguments; these may be useful in
+writing more readable tests for your command interpreters.  It also supplies
+conversion functions  that take either integer or 1-character-string arguments
+and return the same type.  For example, :func:`curses.ascii.ctrl` returns the
+control character corresponding to its argument.
+
+There's also a method to retrieve an entire string, :const:`getstr()`.  It isn't
+used very often, because its functionality is quite limited; the only editing
+keys available are the backspace key and the Enter key, which terminates the
+string.  It can optionally be limited to a fixed number of characters. ::
+
+   curses.echo()            # Enable echoing of characters
+
+   # Get a 15-character string, with the cursor on the top line 
+   s = stdscr.getstr(0,0, 15)  
+
+The Python :mod:`curses.textpad` module supplies something better. With it, you
+can turn a window into a text box that supports an Emacs-like set of
+keybindings.  Various methods of :class:`Textbox` class support editing with
+input validation and gathering the edit results either with or without trailing
+spaces.   See the library documentation on :mod:`curses.textpad` for the
+details.
+
+
+For More Information
+====================
+
+This HOWTO didn't cover some advanced topics, such as screen-scraping or
+capturing mouse events from an xterm instance.  But the Python library page for
+the curses modules is now pretty complete.  You should browse it next.
+
+If you're in doubt about the detailed behavior of any of the ncurses entry
+points, consult the manual pages for your curses implementation, whether it's
+ncurses or a proprietary Unix vendor's.  The manual pages will document any
+quirks, and provide complete lists of all the functions, attributes, and
+:const:`ACS_\*` characters available to you.
+
+Because the curses API is so large, some functions aren't supported in the
+Python interface, not because they're difficult to implement, but because no one
+has needed them yet.  Feel free to add them and then submit a patch.  Also, we
+don't yet have support for the menus or panels libraries associated with
+ncurses; feel free to add that.
+
+If you write an interesting little program, feel free to contribute it as
+another demo.  We can always use more of them!
+
+The ncurses FAQ: http://dickey.his.com/ncurses/ncurses.faq.html
+

Added: doctools/trunk/Doc-26/howto/doanddont.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/doanddont.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,308 @@
+************************************
+  Idioms and Anti-Idioms in Python  
+************************************
+
+:Author: Moshe Zadka
+
+This document is placed in the public doman.
+
+
+.. topic:: Abstract
+
+   This document can be considered a companion to the tutorial. It shows how to use
+   Python, and even more importantly, how *not* to use Python.
+
+
+Language Constructs You Should Not Use
+======================================
+
+While Python has relatively few gotchas compared to other languages, it still
+has some constructs which are only useful in corner cases, or are plain
+dangerous.
+
+
+from module import \*
+---------------------
+
+
+Inside Function Definitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``from module import *`` is *invalid* inside function definitions. While many
+versions of Python do not check for the invalidity, it does not make it more
+valid, no more then having a smart lawyer makes a man innocent. Do not use it
+like that ever. Even in versions where it was accepted, it made the function
+execution slower, because the compiler could not be certain which names are
+local and which are global. In Python 2.1 this construct causes warnings, and
+sometimes even errors.
+
+
+At Module Level
+^^^^^^^^^^^^^^^
+
+While it is valid to use ``from module import *`` at module level it is usually
+a bad idea. For one, this loses an important property Python otherwise has ---
+you can know where each toplevel name is defined by a simple "search" function
+in your favourite editor. You also open yourself to trouble in the future, if
+some module grows additional functions or classes.
+
+One of the most awful question asked on the newsgroup is why this code::
+
+   f = open("www")
+   f.read()
+
+does not work. Of course, it works just fine (assuming you have a file called
+"www".) But it does not work if somewhere in the module, the statement ``from os
+import *`` is present. The :mod:`os` module has a function called :func:`open`
+which returns an integer. While it is very useful, shadowing builtins is one of
+its least useful properties.
+
+Remember, you can never know for sure what names a module exports, so either
+take what you need --- ``from module import name1, name2``, or keep them in the
+module and access on a per-need basis ---  ``import module;print module.name``.
+
+
+When It Is Just Fine
+^^^^^^^^^^^^^^^^^^^^
+
+There are situations in which ``from module import *`` is just fine:
+
+* The interactive prompt. For example, ``from math import *`` makes Python an
+  amazing scientific calculator.
+
+* When extending a module in C with a module in Python.
+
+* When the module advertises itself as ``from import *`` safe.
+
+
+Unadorned :keyword:`exec`, :func:`execfile` and friends
+-------------------------------------------------------
+
+The word "unadorned" refers to the use without an explicit dictionary, in which
+case those constructs evaluate code in the *current* environment. This is
+dangerous for the same reasons ``from import *`` is dangerous --- it might step
+over variables you are counting on and mess up things for the rest of your code.
+Simply do not do that.
+
+Bad examples::
+
+   >>> for name in sys.argv[1:]:
+   >>>     exec "%s=1" % name
+   >>> def func(s, **kw):
+   >>>     for var, val in kw.items():
+   >>>         exec "s.%s=val" % var  # invalid!
+   >>> execfile("handler.py")
+   >>> handle()
+
+Good examples::
+
+   >>> d = {}
+   >>> for name in sys.argv[1:]:
+   >>>     d[name] = 1
+   >>> def func(s, **kw):
+   >>>     for var, val in kw.items():
+   >>>         setattr(s, var, val)
+   >>> d={}
+   >>> execfile("handle.py", d, d)
+   >>> handle = d['handle']
+   >>> handle()
+
+
+from module import name1, name2
+-------------------------------
+
+This is a "don't" which is much weaker then the previous "don't"s but is still
+something you should not do if you don't have good reasons to do that. The
+reason it is usually bad idea is because you suddenly have an object which lives
+in two seperate namespaces. When the binding in one namespace changes, the
+binding in the other will not, so there will be a discrepancy between them. This
+happens when, for example, one module is reloaded, or changes the definition of
+a function at runtime.
+
+Bad example::
+
+   # foo.py
+   a = 1
+
+   # bar.py
+   from foo import a
+   if something():
+       a = 2 # danger: foo.a != a 
+
+Good example::
+
+   # foo.py
+   a = 1
+
+   # bar.py
+   import foo
+   if something():
+       foo.a = 2
+
+
+except:
+-------
+
+Python has the ``except:`` clause, which catches all exceptions. Since *every*
+error in Python raises an exception, this makes many programming errors look
+like runtime problems, and hinders the debugging process.
+
+The following code shows a great example::
+
+   try:
+       foo = opne("file") # misspelled "open"
+   except:
+       sys.exit("could not open file!")
+
+The second line triggers a :exc:`NameError` which is caught by the except
+clause. The program will exit, and you will have no idea that this has nothing
+to do with the readability of ``"file"``.
+
+The example above is better written ::
+
+   try:
+       foo = opne("file") # will be changed to "open" as soon as we run it
+   except IOError:
+       sys.exit("could not open file")
+
+There are some situations in which the ``except:`` clause is useful: for
+example, in a framework when running callbacks, it is good not to let any
+callback disturb the framework.
+
+
+Exceptions
+==========
+
+Exceptions are a useful feature of Python. You should learn to raise them
+whenever something unexpected occurs, and catch them only where you can do
+something about them.
+
+The following is a very popular anti-idiom ::
+
+   def get_status(file):
+       if not os.path.exists(file):
+           print "file not found"
+           sys.exit(1)
+       return open(file).readline()
+
+Consider the case the file gets deleted between the time the call to
+:func:`os.path.exists` is made and the time :func:`open` is called. That means
+the last line will throw an :exc:`IOError`. The same would happen if *file*
+exists but has no read permission. Since testing this on a normal machine on
+existing and non-existing files make it seem bugless, that means in testing the
+results will seem fine, and the code will get shipped. Then an unhandled
+:exc:`IOError` escapes to the user, who has to watch the ugly traceback.
+
+Here is a better way to do it. ::
+
+   def get_status(file):
+       try:
+           return open(file).readline()
+       except (IOError, OSError):
+           print "file not found"
+           sys.exit(1)
+
+In this version, \*either\* the file gets opened and the line is read (so it
+works even on flaky NFS or SMB connections), or the message is printed and the
+application aborted.
+
+Still, :func:`get_status` makes too many assumptions --- that it will only be
+used in a short running script, and not, say, in a long running server. Sure,
+the caller could do something like ::
+
+   try:
+       status = get_status(log)
+   except SystemExit:
+       status = None
+
+So, try to make as few ``except`` clauses in your code --- those will usually be
+a catch-all in the :func:`main`, or inside calls which should always succeed.
+
+So, the best version is probably ::
+
+   def get_status(file):
+       return open(file).readline()
+
+The caller can deal with the exception if it wants (for example, if it  tries
+several files in a loop), or just let the exception filter upwards to *its*
+caller.
+
+The last version is not very good either --- due to implementation details, the
+file would not be closed when an exception is raised until the handler finishes,
+and perhaps not at all in non-C implementations (e.g., Jython). ::
+
+   def get_status(file):
+       fp = open(file)
+       try:
+           return fp.readline()
+       finally:
+           fp.close()
+
+
+Using the Batteries
+===================
+
+Every so often, people seem to be writing stuff in the Python library again,
+usually poorly. While the occasional module has a poor interface, it is usually
+much better to use the rich standard library and data types that come with
+Python then inventing your own.
+
+A useful module very few people know about is :mod:`os.path`. It  always has the
+correct path arithmetic for your operating system, and will usually be much
+better then whatever you come up with yourself.
+
+Compare::
+
+   # ugh!
+   return dir+"/"+file
+   # better
+   return os.path.join(dir, file)
+
+More useful functions in :mod:`os.path`: :func:`basename`,  :func:`dirname` and
+:func:`splitext`.
+
+There are also many useful builtin functions people seem not to be aware of for
+some reason: :func:`min` and :func:`max` can find the minimum/maximum of any
+sequence with comparable semantics, for example, yet many people write their own
+:func:`max`/:func:`min`. Another highly useful function is :func:`reduce`. A
+classical use of :func:`reduce` is something like ::
+
+   import sys, operator
+   nums = map(float, sys.argv[1:])
+   print reduce(operator.add, nums)/len(nums)
+
+This cute little script prints the average of all numbers given on the command
+line. The :func:`reduce` adds up all the numbers, and the rest is just some pre-
+and postprocessing.
+
+On the same note, note that :func:`float`, :func:`int` and :func:`long` all
+accept arguments of type string, and so are suited to parsing --- assuming you
+are ready to deal with the :exc:`ValueError` they raise.
+
+
+Using Backslash to Continue Statements
+======================================
+
+Since Python treats a newline as a statement terminator, and since statements
+are often more then is comfortable to put in one line, many people do::
+
+   if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
+      calculate_number(10, 20) != forbulate(500, 360):
+         pass
+
+You should realize that this is dangerous: a stray space after the ``XXX`` would
+make this line wrong, and stray spaces are notoriously hard to see in editors.
+In this case, at least it would be a syntax error, but if the code was::
+
+   value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
+           + calculate_number(10, 20)*forbulate(500, 360)
+
+then it would just be subtly wrong.
+
+It is usually much better to use the implicit continuation inside parenthesis:
+
+This version is bulletproof::
+
+   value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9] 
+           + calculate_number(10, 20)*forbulate(500, 360))
+

Added: doctools/trunk/Doc-26/howto/functional.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/functional.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,1400 @@
+********************************
+  Functional Programming HOWTO
+********************************
+
+:Author: \A. M. Kuchling
+:Release: 0.30
+
+(This is a first draft.  Please send comments/error reports/suggestions to
+amk at amk.ca.  This URL is probably not going to be the final location of the
+document, so be careful about linking to it -- you may want to add a
+disclaimer.)
+
+In this document, we'll take a tour of Python's features suitable for
+implementing programs in a functional style.  After an introduction to the
+concepts of functional programming, we'll look at language features such as
+iterators and generators and relevant library modules such as :mod:`itertools`
+and :mod:`functools`.
+
+
+Introduction
+============
+
+This section explains the basic concept of functional programming; if you're
+just interested in learning about Python language features, skip to the next
+section.
+
+Programming languages support decomposing problems in several different ways:
+
+* Most programming languages are **procedural**: programs are lists of
+  instructions that tell the computer what to do with the program's input.  C,
+  Pascal, and even Unix shells are procedural languages.
+
+* In **declarative** languages, you write a specification that describes the
+  problem to be solved, and the language implementation figures out how to
+  perform the computation efficiently.  SQL is the declarative language you're
+  most likely to be familiar with; a SQL query describes the data set you want
+  to retrieve, and the SQL engine decides whether to scan tables or use indexes,
+  which subclauses should be performed first, etc.
+
+* **Object-oriented** programs manipulate collections of objects.  Objects have
+  internal state and support methods that query or modify this internal state in
+  some way. Smalltalk and Java are object-oriented languages.  C++ and Python
+  are languages that support object-oriented programming, but don't force the
+  use of object-oriented features.
+
+* **Functional** programming decomposes a problem into a set of functions.
+  Ideally, functions only take inputs and produce outputs, and don't have any
+  internal state that affects the output produced for a given input.  Well-known
+  functional languages include the ML family (Standard ML, OCaml, and other
+  variants) and Haskell.
+
+The designers of some computer languages have chosen one approach to programming
+that's emphasized.  This often makes it difficult to write programs that use a
+different approach.  Other languages are multi-paradigm languages that support
+several different approaches.  Lisp, C++, and Python are multi-paradigm; you can
+write programs or libraries that are largely procedural, object-oriented, or
+functional in all of these languages.  In a large program, different sections
+might be written using different approaches; the GUI might be object-oriented
+while the processing logic is procedural or functional, for example.
+
+In a functional program, input flows through a set of functions. Each function
+operates on its input and produces some output.  Functional style frowns upon
+functions with side effects that modify internal state or make other changes
+that aren't visible in the function's return value.  Functions that have no side
+effects at all are called **purely functional**.  Avoiding side effects means
+not using data structures that get updated as a program runs; every function's
+output must only depend on its input.
+
+Some languages are very strict about purity and don't even have assignment
+statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all
+side effects.  Printing to the screen or writing to a disk file are side
+effects, for example.  For example, in Python a ``print`` statement or a
+``time.sleep(1)`` both return no useful value; they're only called for their
+side effects of sending some text to the screen or pausing execution for a
+second.
+
+Python programs written in functional style usually won't go to the extreme of
+avoiding all I/O or all assignments; instead, they'll provide a
+functional-appearing interface but will use non-functional features internally.
+For example, the implementation of a function will still use assignments to
+local variables, but won't modify global variables or have other side effects.
+
+Functional programming can be considered the opposite of object-oriented
+programming.  Objects are little capsules containing some internal state along
+with a collection of method calls that let you modify this state, and programs
+consist of making the right set of state changes.  Functional programming wants
+to avoid state changes as much as possible and works with data flowing between
+functions.  In Python you might combine the two approaches by writing functions
+that take and return instances representing objects in your application (e-mail
+messages, transactions, etc.).
+
+Functional design may seem like an odd constraint to work under.  Why should you
+avoid objects and side effects?  There are theoretical and practical advantages
+to the functional style:
+
+* Formal provability.
+* Modularity.
+* Composability.
+* Ease of debugging and testing.
+
+Formal provability
+------------------
+
+A theoretical benefit is that it's easier to construct a mathematical proof that
+a functional program is correct.
+
+For a long time researchers have been interested in finding ways to
+mathematically prove programs correct.  This is different from testing a program
+on numerous inputs and concluding that its output is usually correct, or reading
+a program's source code and concluding that the code looks right; the goal is
+instead a rigorous proof that a program produces the right result for all
+possible inputs.
+
+The technique used to prove programs correct is to write down **invariants**,
+properties of the input data and of the program's variables that are always
+true.  For each line of code, you then show that if invariants X and Y are true
+**before** the line is executed, the slightly different invariants X' and Y' are
+true **after** the line is executed.  This continues until you reach the end of
+the program, at which point the invariants should match the desired conditions
+on the program's output.
+
+Functional programming's avoidance of assignments arose because assignments are
+difficult to handle with this technique; assignments can break invariants that
+were true before the assignment without producing any new invariants that can be
+propagated onward.
+
+Unfortunately, proving programs correct is largely impractical and not relevant
+to Python software. Even trivial programs require proofs that are several pages
+long; the proof of correctness for a moderately complicated program would be
+enormous, and few or none of the programs you use daily (the Python interpreter,
+your XML parser, your web browser) could be proven correct.  Even if you wrote
+down or generated a proof, there would then be the question of verifying the
+proof; maybe there's an error in it, and you wrongly believe you've proved the
+program correct.
+
+Modularity
+----------
+
+A more practical benefit of functional programming is that it forces you to
+break apart your problem into small pieces.  Programs are more modular as a
+result.  It's easier to specify and write a small function that does one thing
+than a large function that performs a complicated transformation.  Small
+functions are also easier to read and to check for errors.
+
+
+Ease of debugging and testing 
+-----------------------------
+
+Testing and debugging a functional-style program is easier.
+
+Debugging is simplified because functions are generally small and clearly
+specified.  When a program doesn't work, each function is an interface point
+where you can check that the data are correct.  You can look at the intermediate
+inputs and outputs to quickly isolate the function that's responsible for a bug.
+
+Testing is easier because each function is a potential subject for a unit test.
+Functions don't depend on system state that needs to be replicated before
+running a test; instead you only have to synthesize the right input and then
+check that the output matches expectations.
+
+
+
+Composability
+-------------
+
+As you work on a functional-style program, you'll write a number of functions
+with varying inputs and outputs.  Some of these functions will be unavoidably
+specialized to a particular application, but others will be useful in a wide
+variety of programs.  For example, a function that takes a directory path and
+returns all the XML files in the directory, or a function that takes a filename
+and returns its contents, can be applied to many different situations.
+
+Over time you'll form a personal library of utilities.  Often you'll assemble
+new programs by arranging existing functions in a new configuration and writing
+a few functions specialized for the current task.
+
+
+
+Iterators
+=========
+
+I'll start by looking at a Python language feature that's an important
+foundation for writing functional-style programs: iterators.
+
+An iterator is an object representing a stream of data; this object returns the
+data one element at a time.  A Python iterator must support a method called
+``next()`` that takes no arguments and always returns the next element of the
+stream.  If there are no more elements in the stream, ``next()`` must raise the
+``StopIteration`` exception.  Iterators don't have to be finite, though; it's
+perfectly reasonable to write an iterator that produces an infinite stream of
+data.
+
+The built-in :func:`iter` function takes an arbitrary object and tries to return
+an iterator that will return the object's contents or elements, raising
+:exc:`TypeError` if the object doesn't support iteration.  Several of Python's
+built-in data types support iteration, the most common being lists and
+dictionaries.  An object is called an **iterable** object if you can get an
+iterator for it.
+
+You can experiment with the iteration interface manually::
+
+    >>> L = [1,2,3]
+    >>> it = iter(L)
+    >>> print it
+    <iterator object at 0x8116870>
+    >>> it.next()
+    1
+    >>> it.next()
+    2
+    >>> it.next()
+    3
+    >>> it.next()
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    StopIteration
+    >>>      
+
+Python expects iterable objects in several different contexts, the most
+important being the ``for`` statement.  In the statement ``for X in Y``, Y must
+be an iterator or some object for which ``iter()`` can create an iterator.
+These two statements are equivalent::
+
+        for i in iter(obj):
+            print i
+
+        for i in obj:
+            print i
+
+Iterators can be materialized as lists or tuples by using the :func:`list` or
+:func:`tuple` constructor functions::
+
+    >>> L = [1,2,3]
+    >>> iterator = iter(L)
+    >>> t = tuple(iterator)
+    >>> t
+    (1, 2, 3)
+
+Sequence unpacking also supports iterators: if you know an iterator will return
+N elements, you can unpack them into an N-tuple::
+
+    >>> L = [1,2,3]
+    >>> iterator = iter(L)
+    >>> a,b,c = iterator
+    >>> a,b,c
+    (1, 2, 3)
+
+Built-in functions such as :func:`max` and :func:`min` can take a single
+iterator argument and will return the largest or smallest element.  The ``"in"``
+and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
+X is found in the stream returned by the iterator.  You'll run into obvious
+problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"``
+will never return, and if the element X never appears in the stream, the
+``"in"`` operator won't return either.
+
+Note that you can only go forward in an iterator; there's no way to get the
+previous element, reset the iterator, or make a copy of it.  Iterator objects
+can optionally provide these additional capabilities, but the iterator protocol
+only specifies the ``next()`` method.  Functions may therefore consume all of
+the iterator's output, and if you need to do something different with the same
+stream, you'll have to create a new iterator.
+
+
+
+Data Types That Support Iterators
+---------------------------------
+
+We've already seen how lists and tuples support iterators.  In fact, any Python
+sequence type, such as strings, will automatically support creation of an
+iterator.
+
+Calling :func:`iter` on a dictionary returns an iterator that will loop over the
+dictionary's keys::
+
+    >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
+    ...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
+    >>> for key in m:
+    ...     print key, m[key]
+    Mar 3
+    Feb 2
+    Aug 8
+    Sep 9
+    May 5
+    Jun 6
+    Jul 7
+    Jan 1
+    Apr 4
+    Nov 11
+    Dec 12
+    Oct 10
+
+Note that the order is essentially random, because it's based on the hash
+ordering of the objects in the dictionary.
+
+Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries
+have methods that return other iterators.  If you want to iterate over keys,
+values, or key/value pairs, you can explicitly call the ``iterkeys()``,
+``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator.
+
+The :func:`dict` constructor can accept an iterator that returns a finite stream
+of ``(key, value)`` tuples::
+
+    >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
+    >>> dict(iter(L))
+    {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}
+
+Files also support iteration by calling the ``readline()`` method until there
+are no more lines in the file.  This means you can read each line of a file like
+this::
+
+    for line in file:
+        # do something for each line
+        ...
+
+Sets can take their contents from an iterable and let you iterate over the set's
+elements::
+
+    S = set((2, 3, 5, 7, 11, 13))
+    for i in S:
+        print i
+
+
+
+Generator expressions and list comprehensions
+=============================================
+
+Two common operations on an iterator's output are 1) performing some operation
+for every element, 2) selecting a subset of elements that meet some condition.
+For example, given a list of strings, you might want to strip off trailing
+whitespace from each line or extract all the strings containing a given
+substring.
+
+List comprehensions and generator expressions (short form: "listcomps" and
+"genexps") are a concise notation for such operations, borrowed from the
+functional programming language Haskell (http://www.haskell.org).  You can strip
+all the whitespace from a stream of strings with the following code::
+
+        line_list = ['  line 1\n', 'line 2  \n', ...]
+
+        # Generator expression -- returns iterator
+        stripped_iter = (line.strip() for line in line_list)
+
+        # List comprehension -- returns list
+        stripped_list = [line.strip() for line in line_list]
+
+You can select only certain elements by adding an ``"if"`` condition::
+
+        stripped_list = [line.strip() for line in line_list
+                         if line != ""]
+
+With a list comprehension, you get back a Python list; ``stripped_list`` is a
+list containing the resulting lines, not an iterator.  Generator expressions
+return an iterator that computes the values as necessary, not needing to
+materialize all the values at once.  This means that list comprehensions aren't
+useful if you're working with iterators that return an infinite stream or a very
+large amount of data.  Generator expressions are preferable in these situations.
+
+Generator expressions are surrounded by parentheses ("()") and list
+comprehensions are surrounded by square brackets ("[]").  Generator expressions
+have the form::
+
+    ( expression for expr in sequence1 
+                 if condition1
+                 for expr2 in sequence2
+                 if condition2
+                 for expr3 in sequence3 ...
+                 if condition3
+                 for exprN in sequenceN
+                 if conditionN )
+
+Again, for a list comprehension only the outside brackets are different (square
+brackets instead of parentheses).
+
+The elements of the generated output will be the successive values of
+``expression``.  The ``if`` clauses are all optional; if present, ``expression``
+is only evaluated and added to the result when ``condition`` is true.
+
+Generator expressions always have to be written inside parentheses, but the
+parentheses signalling a function call also count.  If you want to create an
+iterator that will be immediately passed to a function you can write::
+
+        obj_total = sum(obj.count for obj in list_all_objects())
+
+The ``for...in`` clauses contain the sequences to be iterated over.  The
+sequences do not have to be the same length, because they are iterated over from
+left to right, **not** in parallel.  For each element in ``sequence1``,
+``sequence2`` is looped over from the beginning.  ``sequence3`` is then looped
+over for each resulting pair of elements from ``sequence1`` and ``sequence2``.
+
+To put it another way, a list comprehension or generator expression is
+equivalent to the following Python code::
+
+    for expr1 in sequence1:
+        if not (condition1):
+            continue   # Skip this element
+        for expr2 in sequence2:
+            if not (condition2):
+                continue    # Skip this element
+            ...
+            for exprN in sequenceN:
+                 if not (conditionN):
+                     continue   # Skip this element
+
+                 # Output the value of 
+                 # the expression.
+
+This means that when there are multiple ``for...in`` clauses but no ``if``
+clauses, the length of the resulting output will be equal to the product of the
+lengths of all the sequences.  If you have two lists of length 3, the output
+list is 9 elements long::
+
+    seq1 = 'abc'
+    seq2 = (1,2,3)
+    >>> [ (x,y) for x in seq1 for y in seq2]
+    [('a', 1), ('a', 2), ('a', 3), 
+     ('b', 1), ('b', 2), ('b', 3), 
+     ('c', 1), ('c', 2), ('c', 3)]
+
+To avoid introducing an ambiguity into Python's grammar, if ``expression`` is
+creating a tuple, it must be surrounded with parentheses.  The first list
+comprehension below is a syntax error, while the second one is correct::
+
+    # Syntax error
+    [ x,y for x in seq1 for y in seq2]
+    # Correct
+    [ (x,y) for x in seq1 for y in seq2]
+
+
+Generators
+==========
+
+Generators are a special class of functions that simplify the task of writing
+iterators.  Regular functions compute a value and return it, but generators
+return an iterator that returns a stream of values.
+
+You're doubtless familiar with how regular function calls work in Python or C.
+When you call a function, it gets a private namespace where its local variables
+are created.  When the function reaches a ``return`` statement, the local
+variables are destroyed and the value is returned to the caller.  A later call
+to the same function creates a new private namespace and a fresh set of local
+variables. But, what if the local variables weren't thrown away on exiting a
+function?  What if you could later resume the function where it left off?  This
+is what generators provide; they can be thought of as resumable functions.
+
+Here's the simplest example of a generator function::
+
+    def generate_ints(N):
+        for i in range(N):
+            yield i
+
+Any function containing a ``yield`` keyword is a generator function; this is
+detected by Python's bytecode compiler which compiles the function specially as
+a result.
+
+When you call a generator function, it doesn't return a single value; instead it
+returns a generator object that supports the iterator protocol.  On executing
+the ``yield`` expression, the generator outputs the value of ``i``, similar to a
+``return`` statement.  The big difference between ``yield`` and a ``return``
+statement is that on reaching a ``yield`` the generator's state of execution is
+suspended and local variables are preserved.  On the next call to the
+generator's ``.next()`` method, the function will resume executing.
+
+Here's a sample usage of the ``generate_ints()`` generator::
+
+    >>> gen = generate_ints(3)
+    >>> gen
+    <generator object at 0x8117f90>
+    >>> gen.next()
+    0
+    >>> gen.next()
+    1
+    >>> gen.next()
+    2
+    >>> gen.next()
+    Traceback (most recent call last):
+      File "stdin", line 1, in ?
+      File "stdin", line 2, in generate_ints
+    StopIteration
+
+You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
+generate_ints(3)``.
+
+Inside a generator function, the ``return`` statement can only be used without a
+value, and signals the end of the procession of values; after executing a
+``return`` the generator cannot return any further values.  ``return`` with a
+value, such as ``return 5``, is a syntax error inside a generator function.  The
+end of the generator's results can also be indicated by raising
+``StopIteration`` manually, or by just letting the flow of execution fall off
+the bottom of the function.
+
+You could achieve the effect of generators manually by writing your own class
+and storing all the local variables of the generator as instance variables.  For
+example, returning a list of integers could be done by setting ``self.count`` to
+0, and having the ``next()`` method increment ``self.count`` and return it.
+However, for a moderately complicated generator, writing a corresponding class
+can be much messier.
+
+The test suite included with Python's library, ``test_generators.py``, contains
+a number of more interesting examples.  Here's one generator that implements an
+in-order traversal of a tree using generators recursively.
+
+::
+
+    # A recursive generator that generates Tree leaves in in-order.
+    def inorder(t):
+        if t:
+            for x in inorder(t.left):
+                yield x
+
+            yield t.label
+
+            for x in inorder(t.right):
+                yield x
+
+Two other examples in ``test_generators.py`` produce solutions for the N-Queens
+problem (placing N queens on an NxN chess board so that no queen threatens
+another) and the Knight's Tour (finding a route that takes a knight to every
+square of an NxN chessboard without visiting any square twice).
+
+
+
+Passing values into a generator
+-------------------------------
+
+In Python 2.4 and earlier, generators only produced output.  Once a generator's
+code was invoked to create an iterator, there was no way to pass any new
+information into the function when its execution is resumed.  You could hack
+together this ability by making the generator look at a global variable or by
+passing in some mutable object that callers then modify, but these approaches
+are messy.
+
+In Python 2.5 there's a simple way to pass values into a generator.
+:keyword:`yield` became an expression, returning a value that can be assigned to
+a variable or otherwise operated on::
+
+    val = (yield i)
+
+I recommend that you **always** put parentheses around a ``yield`` expression
+when you're doing something with the returned value, as in the above example.
+The parentheses aren't always necessary, but it's easier to always add them
+instead of having to remember when they're needed.
+
+(PEP 342 explains the exact rules, which are that a ``yield``-expression must
+always be parenthesized except when it occurs at the top-level expression on the
+right-hand side of an assignment.  This means you can write ``val = yield i``
+but have to use parentheses when there's an operation, as in ``val = (yield i)
++ 12``.)
+
+Values are sent into a generator by calling its ``send(value)`` method.  This
+method resumes the generator's code and the ``yield`` expression returns the
+specified value.  If the regular ``next()`` method is called, the ``yield``
+returns ``None``.
+
+Here's a simple counter that increments by 1 and allows changing the value of
+the internal counter.
+
+::
+
+    def counter (maximum):
+        i = 0
+        while i < maximum:
+            val = (yield i)
+            # If value provided, change counter
+            if val is not None:
+                i = val
+            else:
+                i += 1
+
+And here's an example of changing the counter:
+
+    >>> it = counter(10)
+    >>> print it.next()
+    0
+    >>> print it.next()
+    1
+    >>> print it.send(8)
+    8
+    >>> print it.next()
+    9
+    >>> print it.next()
+    Traceback (most recent call last):
+      File ``t.py'', line 15, in ?
+        print it.next()
+    StopIteration
+
+Because ``yield`` will often be returning ``None``, you should always check for
+this case.  Don't just use its value in expressions unless you're sure that the
+``send()`` method will be the only method used resume your generator function.
+
+In addition to ``send()``, there are two other new methods on generators:
+
+* ``throw(type, value=None, traceback=None)`` is used to raise an exception
+  inside the generator; the exception is raised by the ``yield`` expression
+  where the generator's execution is paused.
+
+* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
+  terminate the iteration.  On receiving this exception, the generator's code
+  must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
+  exception and doing anything else is illegal and will trigger a
+  :exc:`RuntimeError`.  ``close()`` will also be called by Python's garbage
+  collector when the generator is garbage-collected.
+
+  If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
+  using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
+
+The cumulative effect of these changes is to turn generators from one-way
+producers of information into both producers and consumers.
+
+Generators also become **coroutines**, a more generalized form of subroutines.
+Subroutines are entered at one point and exited at another point (the top of the
+function, and a ``return`` statement), but coroutines can be entered, exited,
+and resumed at many different points (the ``yield`` statements).
+
+
+Built-in functions
+==================
+
+Let's look in more detail at built-in functions often used with iterators.
+
+Two Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
+obsolete; they duplicate the features of list comprehensions but return actual
+lists instead of iterators.
+
+``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
+f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
+
+::
+
+    def upper(s):
+        return s.upper()
+    map(upper, ['sentence', 'fragment']) =>
+      ['SENTENCE', 'FRAGMENT']
+
+    [upper(s) for s in ['sentence', 'fragment']] =>
+      ['SENTENCE', 'FRAGMENT']
+
+As shown above, you can achieve the same effect with a list comprehension.  The
+:func:`itertools.imap` function does the same thing but can handle infinite
+iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
+
+``filter(predicate, iter)`` returns a list that contains all the sequence
+elements that meet a certain condition, and is similarly duplicated by list
+comprehensions.  A **predicate** is a function that returns the truth value of
+some condition; for use with :func:`filter`, the predicate must take a single
+value.
+
+::
+
+    def is_even(x):
+        return (x % 2) == 0
+
+    filter(is_even, range(10)) =>
+      [0, 2, 4, 6, 8]
+
+This can also be written as a list comprehension::
+
+    >>> [x for x in range(10) if is_even(x)]
+    [0, 2, 4, 6, 8]
+
+:func:`filter` also has a counterpart in the :mod:`itertools` module,
+:func:`itertools.ifilter`, that returns an iterator and can therefore handle
+infinite sequences just as :func:`itertools.imap` can.
+
+``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
+:mod:`itertools` module because it cumulatively performs an operation on all the
+iterable's elements and therefore can't be applied to infinite iterables.
+``func`` must be a function that takes two elements and returns a single value.
+:func:`reduce` takes the first two elements A and B returned by the iterator and
+calculates ``func(A, B)``.  It then requests the third element, C, calculates
+``func(func(A, B), C)``, combines this result with the fourth element returned,
+and continues until the iterable is exhausted.  If the iterable returns no
+values at all, a :exc:`TypeError` exception is raised.  If the initial value is
+supplied, it's used as a starting point and ``func(initial_value, A)`` is the
+first calculation.
+
+::
+
+    import operator
+    reduce(operator.concat, ['A', 'BB', 'C']) =>
+      'ABBC'
+    reduce(operator.concat, []) =>
+      TypeError: reduce() of empty sequence with no initial value
+    reduce(operator.mul, [1,2,3], 1) =>
+      6
+    reduce(operator.mul, [], 1) =>
+      1
+
+If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
+elements of the iterable.  This case is so common that there's a special
+built-in called :func:`sum` to compute it::
+
+    reduce(operator.add, [1,2,3,4], 0) =>
+      10
+    sum([1,2,3,4]) =>
+      10
+    sum([]) =>
+      0
+
+For many uses of :func:`reduce`, though, it can be clearer to just write the
+obvious :keyword:`for` loop::
+
+    # Instead of:
+    product = reduce(operator.mul, [1,2,3], 1)
+
+    # You can write:
+    product = 1
+    for i in [1,2,3]:
+        product *= i
+
+
+``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
+containing the count and each element.
+
+::
+
+    enumerate(['subject', 'verb', 'object']) =>
+      (0, 'subject'), (1, 'verb'), (2, 'object')
+
+:func:`enumerate` is often used when looping through a list and recording the
+indexes at which certain conditions are met::
+
+    f = open('data.txt', 'r')
+    for i, line in enumerate(f):
+        if line.strip() == '':
+            print 'Blank line at line #%i' % i
+
+``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the
+elements of the iterable into a list, sorts the list, and returns the sorted
+result.  The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
+the constructed list's ``.sort()`` method.
+
+::
+
+    import random
+    # Generate 8 random numbers between [0, 10000)
+    rand_list = random.sample(range(10000), 8)
+    rand_list =>
+      [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
+    sorted(rand_list) =>
+      [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
+    sorted(rand_list, reverse=True) =>
+      [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
+
+(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
+Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
+
+The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
+iterable's contents.  :func:`any` returns True if any element in the iterable is
+a true value, and :func:`all` returns True if all of the elements are true
+values::
+
+    any([0,1,0]) =>
+      True
+    any([0,0,0]) =>
+      False
+    any([1,1,1]) =>
+      True
+    all([0,1,0]) =>
+      False
+    all([0,0,0]) => 
+      False
+    all([1,1,1]) =>
+      True
+
+
+Small functions and the lambda expression
+=========================================
+
+When writing functional-style programs, you'll often need little functions that
+act as predicates or that combine elements in some way.
+
+If there's a Python built-in or a module function that's suitable, you don't
+need to define a new function at all::
+
+        stripped_lines = [line.strip() for line in lines]
+        existing_files = filter(os.path.exists, file_list)
+
+If the function you need doesn't exist, you need to write it.  One way to write
+small functions is to use the ``lambda`` statement.  ``lambda`` takes a number
+of parameters and an expression combining these parameters, and creates a small
+function that returns the value of the expression::
+
+        lowercase = lambda x: x.lower()
+
+        print_assign = lambda name, value: name + '=' + str(value)
+
+        adder = lambda x, y: x+y
+
+An alternative is to just use the ``def`` statement and define a function in the
+usual way::
+
+        def lowercase(x):
+            return x.lower()
+
+        def print_assign(name, value):
+            return name + '=' + str(value)
+
+        def adder(x,y):
+            return x + y
+
+Which alternative is preferable?  That's a style question; my usual course is to
+avoid using ``lambda``.
+
+One reason for my preference is that ``lambda`` is quite limited in the
+functions it can define.  The result has to be computable as a single
+expression, which means you can't have multiway ``if... elif... else``
+comparisons or ``try... except`` statements.  If you try to do too much in a
+``lambda`` statement, you'll end up with an overly complicated expression that's
+hard to read.  Quick, what's the following code doing?
+
+::
+
+    total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+
+You can figure it out, but it takes time to disentangle the expression to figure
+out what's going on.  Using a short nested ``def`` statements makes things a
+little bit better::
+
+    def combine (a, b):
+        return 0, a[1] + b[1]
+
+    total = reduce(combine, items)[1]
+
+But it would be best of all if I had simply used a ``for`` loop::
+
+     total = 0
+     for a, b in items:
+         total += b
+
+Or the :func:`sum` built-in and a generator expression::
+
+     total = sum(b for a,b in items)
+
+Many uses of :func:`reduce` are clearer when written as ``for`` loops.
+
+Fredrik Lundh once suggested the following set of rules for refactoring uses of
+``lambda``:
+
+1) Write a lambda function.
+2) Write a comment explaining what the heck that lambda does.
+3) Study the comment for a while, and think of a name that captures the essence
+   of the comment.
+4) Convert the lambda to a def statement, using that name.
+5) Remove the comment.
+
+I really like these rules, but you're free to disagree that this lambda-free
+style is better.
+
+
+The itertools module
+====================
+
+The :mod:`itertools` module contains a number of commonly-used iterators as well
+as functions for combining several iterators.  This section will introduce the
+module's contents by showing small examples.
+
+The module's functions fall into a few broad classes:
+
+* Functions that create a new iterator based on an existing iterator.
+* Functions for treating an iterator's elements as function arguments.
+* Functions for selecting portions of an iterator's output.
+* A function for grouping an iterator's output.
+
+Creating new iterators
+----------------------
+
+``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
+each time.  You can optionally supply the starting number, which defaults to 0::
+
+        itertools.count() =>
+          0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
+        itertools.count(10) =>
+          10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
+
+``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
+and returns a new iterator that returns its elements from first to last.  The
+new iterator will repeat these elements infinitely.
+
+::
+
+        itertools.cycle([1,2,3,4,5]) =>
+          1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
+
+``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
+returns the element endlessly if ``n`` is not provided.
+
+::
+
+    itertools.repeat('abc') =>
+      abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
+    itertools.repeat('abc', 5) =>
+      abc, abc, abc, abc, abc
+
+``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
+input, and returns all the elements of the first iterator, then all the elements
+of the second, and so on, until all of the iterables have been exhausted.
+
+::
+
+    itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
+      a, b, c, 1, 2, 3
+
+``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and
+returns them in a tuple::
+
+    itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
+      ('a', 1), ('b', 2), ('c', 3)
+
+It's similiar to the built-in :func:`zip` function, but doesn't construct an
+in-memory list and exhaust all the input iterators before returning; instead
+tuples are constructed and returned only if they're requested.  (The technical
+term for this behaviour is `lazy evaluation
+<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.)
+
+This iterator is intended to be used with iterables that are all of the same
+length.  If the iterables are of different lengths, the resulting stream will be
+the same length as the shortest iterable.
+
+::
+
+    itertools.izip(['a', 'b'], (1, 2, 3)) =>
+      ('a', 1), ('b', 2)
+
+You should avoid doing this, though, because an element may be taken from the
+longer iterators and discarded.  This means you can't go on to use the iterators
+further because you risk skipping a discarded element.
+
+``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
+slice of the iterator.  With a single ``stop`` argument, it will return the
+first ``stop`` elements.  If you supply a starting index, you'll get
+``stop-start`` elements, and if you supply a value for ``step``, elements will
+be skipped accordingly.  Unlike Python's string and list slicing, you can't use
+negative values for ``start``, ``stop``, or ``step``.
+
+::
+
+    itertools.islice(range(10), 8) =>
+      0, 1, 2, 3, 4, 5, 6, 7
+    itertools.islice(range(10), 2, 8) =>
+      2, 3, 4, 5, 6, 7
+    itertools.islice(range(10), 2, 8, 2) =>
+      2, 4, 6
+
+``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
+independent iterators that will all return the contents of the source iterator.
+If you don't supply a value for ``n``, the default is 2.  Replicating iterators
+requires saving some of the contents of the source iterator, so this can consume
+significant memory if the iterator is large and one of the new iterators is
+consumed more than the others.
+
+::
+
+        itertools.tee( itertools.count() ) =>
+           iterA, iterB
+
+        where iterA ->
+           0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
+
+        and   iterB ->
+           0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
+
+
+Calling functions on elements
+-----------------------------
+
+Two functions are used for calling other functions on the contents of an
+iterable.
+
+``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
+``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
+
+    itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
+      6, 8, 8
+
+The ``operator`` module contains a set of functions corresponding to Python's
+operators.  Some examples are ``operator.add(a, b)`` (adds two values),
+``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
+(returns a callable that fetches the ``"id"`` attribute).
+
+``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
+of tuples, and calls ``f()`` using these tuples as the arguments::
+
+    itertools.starmap(os.path.join, 
+                      [('/usr', 'bin', 'java'), ('/bin', 'python'),
+                       ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
+    =>
+      /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
+
+
+Selecting elements
+------------------
+
+Another group of functions chooses a subset of an iterator's elements based on a
+predicate.
+
+``itertools.ifilter(predicate, iter)`` returns all the elements for which the
+predicate returns true::
+
+    def is_even(x):
+        return (x % 2) == 0
+
+    itertools.ifilter(is_even, itertools.count()) =>
+      0, 2, 4, 6, 8, 10, 12, 14, ...
+
+``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
+elements for which the predicate returns false::
+
+    itertools.ifilterfalse(is_even, itertools.count()) =>
+      1, 3, 5, 7, 9, 11, 13, 15, ...
+
+``itertools.takewhile(predicate, iter)`` returns elements for as long as the
+predicate returns true.  Once the predicate returns false, the iterator will
+signal the end of its results.
+
+::
+
+    def less_than_10(x):
+        return (x < 10)
+
+    itertools.takewhile(less_than_10, itertools.count()) =>
+      0, 1, 2, 3, 4, 5, 6, 7, 8, 9
+
+    itertools.takewhile(is_even, itertools.count()) =>
+      0
+
+``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
+returns true, and then returns the rest of the iterable's results.
+
+::
+
+    itertools.dropwhile(less_than_10, itertools.count()) =>
+      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
+
+    itertools.dropwhile(is_even, itertools.count()) =>
+      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
+
+
+Grouping elements
+'''''''''''''''''
+
+The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
+the most complicated.  ``key_func(elem)`` is a function that can compute a key
+value for each element returned by the iterable.  If you don't supply a key
+function, the key is simply each element itself.
+
+``groupby()`` collects all the consecutive elements from the underlying iterable
+that have the same key value, and returns a stream of 2-tuples containing a key
+value and an iterator for the elements with that key.
+
+::
+
+    city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), 
+                 ('Anchorage', 'AK'), ('Nome', 'AK'),
+                 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), 
+                 ...
+                ]
+
+    def get_state ((city, state)):
+        return state
+
+    itertools.groupby(city_list, get_state) =>
+      ('AL', iterator-1),
+      ('AK', iterator-2),
+      ('AZ', iterator-3), ...
+
+    where
+    iterator-1 =>
+      ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
+    iterator-2 => 
+      ('Anchorage', 'AK'), ('Nome', 'AK')
+    iterator-3 =>
+      ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
+
+``groupby()`` assumes that the underlying iterable's contents will already be
+sorted based on the key.  Note that the returned iterators also use the
+underlying iterable, so you have to consume the results of iterator-1 before
+requesting iterator-2 and its corresponding key.
+
+
+The functools module
+====================
+
+The :mod:`functools` module in Python 2.5 contains some higher-order functions.
+A **higher-order function** takes one or more functions as input and returns a
+new function.  The most useful tool in this module is the
+:func:`functools.partial` function.
+
+For programs written in a functional style, you'll sometimes want to construct
+variants of existing functions that have some of the parameters filled in.
+Consider a Python function ``f(a, b, c)``; you may wish to create a new function
+``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
+one of ``f()``'s parameters.  This is called "partial function application".
+
+The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
+... kwarg1=value1, kwarg2=value2)``.  The resulting object is callable, so you
+can just call it to invoke ``function`` with the filled-in arguments.
+
+Here's a small but realistic example::
+
+    import functools
+
+    def log (message, subsystem):
+        "Write the contents of 'message' to the specified subsystem."
+        print '%s: %s' % (subsystem, message)
+        ...
+
+    server_log = functools.partial(log, subsystem='server')
+    server_log('Unable to open socket')
+
+
+The operator module
+-------------------
+
+The :mod:`operator` module was mentioned earlier.  It contains a set of
+functions corresponding to Python's operators.  These functions are often useful
+in functional-style code because they save you from writing trivial functions
+that perform a single operation.
+
+Some of the functions in this module are:
+
+* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
+  ``abs()``, ...
+* Logical operations: ``not_()``, ``truth()``.
+* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
+* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
+* Object identity: ``is_()``, ``is_not()``.
+
+Consult the operator module's documentation for a complete list.
+
+
+
+The functional module
+---------------------
+
+Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__
+provides a number of more advanced tools for functional programming. It also
+reimplements several Python built-ins, trying to make them more intuitive to
+those used to functional programming in other languages.
+
+This section contains an introduction to some of the most important functions in
+``functional``; full documentation can be found at `the project's website
+<http://oakwinter.com/code/functional/documentation/>`__.
+
+``compose(outer, inner, unpack=False)``
+
+The ``compose()`` function implements function composition.  In other words, it
+returns a wrapper around the ``outer`` and ``inner`` callables, such that the
+return value from ``inner`` is fed directly to ``outer``.  That is,
+
+::
+
+        >>> def add(a, b):
+        ...     return a + b
+        ...
+        >>> def double(a):
+        ...     return 2 * a
+        ...
+        >>> compose(double, add)(5, 6)
+        22
+
+is equivalent to
+
+::
+
+        >>> double(add(5, 6))
+        22
+                    
+The ``unpack`` keyword is provided to work around the fact that Python functions
+are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__.  By
+default, it is expected that the ``inner`` function will return a single object
+and that the ``outer`` function will take a single argument. Setting the
+``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which
+will be expanded before being passed to ``outer``. Put simply,
+
+::
+
+        compose(f, g)(5, 6)
+                    
+is equivalent to::
+
+        f(g(5, 6))
+                    
+while
+
+::
+
+        compose(f, g, unpack=True)(5, 6)
+                    
+is equivalent to::
+
+        f(*g(5, 6))
+
+Even though ``compose()`` only accepts two functions, it's trivial to build up a
+version that will compose any number of functions. We'll use ``reduce()``,
+``compose()`` and ``partial()`` (the last of which is provided by both
+``functional`` and ``functools``).
+
+::
+
+        from functional import compose, partial
+        
+        multi_compose = partial(reduce, compose)
+        
+    
+We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of
+``"".join(...)`` that converts its arguments to string::
+
+        from functional import compose, partial
+        
+        join = compose("".join, partial(map, str))
+
+
+``flip(func)``
+                    
+``flip()`` wraps the callable in ``func`` and causes it to receive its
+non-keyword arguments in reverse order.
+
+::
+
+        >>> def triple(a, b, c):
+        ...     return (a, b, c)
+        ...
+        >>> triple(5, 6, 7)
+        (5, 6, 7)
+        >>>
+        >>> flipped_triple = flip(triple)
+        >>> flipped_triple(5, 6, 7)
+        (7, 6, 5)
+
+``foldl(func, start, iterable)``
+                    
+``foldl()`` takes a binary function, a starting value (usually some kind of
+'zero'), and an iterable.  The function is applied to the starting value and the
+first element of the list, then the result of that and the second element of the
+list, then the result of that and the third element of the list, and so on.
+
+This means that a call such as::
+
+        foldl(f, 0, [1, 2, 3])
+
+is equivalent to::
+
+        f(f(f(0, 1), 2), 3)
+
+    
+``foldl()`` is roughly equivalent to the following recursive function::
+
+        def foldl(func, start, seq):
+            if len(seq) == 0:
+                return start
+
+            return foldl(func, func(start, seq[0]), seq[1:])
+
+Speaking of equivalence, the above ``foldl`` call can be expressed in terms of
+the built-in ``reduce`` like so::
+
+        reduce(f, [1, 2, 3], 0)
+
+
+We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a
+cleaner, more aesthetically-pleasing version of Python's ``"".join(...)``
+idiom::
+
+        from functional import foldl, partial
+        from operator import concat
+        
+        join = partial(foldl, concat, "")
+
+
+Revision History and Acknowledgements
+=====================================
+
+The author would like to thank the following people for offering suggestions,
+corrections and assistance with various drafts of this article: Ian Bicking,
+Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
+Lameiro, Jussi Salmela, Collin Winter, Blake Winton.
+
+Version 0.1: posted June 30 2006.
+
+Version 0.11: posted July 1 2006.  Typo fixes.
+
+Version 0.2: posted July 10 2006.  Merged genexp and listcomp sections into one.
+Typo fixes.
+
+Version 0.21: Added more references suggested on the tutor mailing list.
+
+Version 0.30: Adds a section on the ``functional`` module written by Collin
+Winter; adds short section on the operator module; a few other edits.
+
+
+References
+==========
+
+General
+-------
+
+**Structure and Interpretation of Computer Programs**, by Harold Abelson and
+Gerald Jay Sussman with Julie Sussman.  Full text at
+http://mitpress.mit.edu/sicp/.  In this classic textbook of computer science,
+chapters 2 and 3 discuss the use of sequences and streams to organize the data
+flow inside a program.  The book uses Scheme for its examples, but many of the
+design approaches described in these chapters are applicable to functional-style
+Python code.
+
+http://www.defmacro.org/ramblings/fp.html: A general introduction to functional
+programming that uses Java examples and has a lengthy historical introduction.
+
+http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry
+describing functional programming.
+
+http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines.
+
+http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying.
+
+Python-specific
+---------------
+
+http://gnosis.cx/TPiP/: The first chapter of David Mertz's book
+:title-reference:`Text Processing in Python` discusses functional programming
+for text processing, in the section titled "Utilizing Higher-Order Functions in
+Text Processing".
+
+Mertz also wrote a 3-part series of articles on functional programming
+for IBM's DeveloperWorks site; see 
+`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__,
+`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and
+`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__,
+
+
+Python documentation
+--------------------
+
+Documentation for the :mod:`itertools` module.
+
+Documentation for the :mod:`operator` module.
+
+:pep:`289`: "Generator Expressions"
+
+:pep:`342`: "Coroutines via Enhanced Generators" describes the new generator
+features in Python 2.5.
+
+.. comment
+
+    Topics to place
+    -----------------------------
+
+    XXX os.walk()
+
+    XXX Need a large example.
+
+    But will an example add much?  I'll post a first draft and see
+    what the comments say.
+
+.. comment
+
+    Original outline:
+    Introduction
+            Idea of FP
+                    Programs built out of functions
+                    Functions are strictly input-output, no internal state
+            Opposed to OO programming, where objects have state
+
+            Why FP?
+                    Formal provability
+                            Assignment is difficult to reason about
+                            Not very relevant to Python
+                    Modularity
+                            Small functions that do one thing
+                    Debuggability:
+                            Easy to test due to lack of state
+                            Easy to verify output from intermediate steps
+                    Composability
+                            You assemble a toolbox of functions that can be mixed
+
+    Tackling a problem
+            Need a significant example
+
+    Iterators
+    Generators
+    The itertools module
+    List comprehensions
+    Small functions and the lambda statement
+    Built-in functions
+            map
+            filter
+            reduce
+
+.. comment
+
+    Handy little function for printing part of an iterator -- used
+    while writing this document.
+
+    import itertools
+    def print_iter(it):
+         slice = itertools.islice(it, 10)
+         for elem in slice[:-1]:
+             sys.stdout.write(str(elem))
+             sys.stdout.write(', ')
+        print elem[-1]
+
+

Added: doctools/trunk/Doc-26/howto/index.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/index.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,24 @@
+***************
+ Python HOWTOs
+***************
+
+Python HOWTOs are documents that cover a single, specific topic,
+and attempt to cover it fairly completely. Modelled on the Linux
+Documentation Project's HOWTO collection, this collection is an
+effort to foster documentation that's more detailed than the
+Python Library Reference.
+
+Currently, the HOWTOs are:
+
+.. toctree::
+   :maxdepth: 1
+
+   advocacy.rst
+   curses.rst
+   doanddont.rst
+   functional.rst
+   regex.rst
+   sockets.rst
+   unicode.rst
+   urllib2.rst
+

Added: doctools/trunk/Doc-26/howto/regex.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/regex.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,1377 @@
+****************************
+  Regular Expression HOWTO  
+****************************
+
+:Author: A.M. Kuchling
+:Release: 0.05
+
+.. % TODO:
+.. % Document lookbehind assertions
+.. % Better way of displaying a RE, a string, and what it matches
+.. % Mention optional argument to match.groups()
+.. % Unicode (at least a reference)
+
+
+.. topic:: Abstract
+
+   This document is an introductory tutorial to using regular expressions in Python
+   with the :mod:`re` module.  It provides a gentler introduction than the
+   corresponding section in the Library Reference.
+
+
+Introduction
+============
+
+The :mod:`re` module was added in Python 1.5, and provides Perl-style regular
+expression patterns.  Earlier versions of Python came with the :mod:`regex`
+module, which provided Emacs-style patterns.  The :mod:`regex` module was
+removed completely in Python 2.5.
+
+Regular expressions (called REs, or regexes, or regex patterns) are essentially
+a tiny, highly specialized programming language embedded inside Python and made
+available through the :mod:`re` module. Using this little language, you specify
+the rules for the set of possible strings that you want to match; this set might
+contain English sentences, or e-mail addresses, or TeX commands, or anything you
+like.  You can then ask questions such as "Does this string match the pattern?",
+or "Is there a match for the pattern anywhere in this string?".  You can also
+use REs to modify a string or to split it apart in various ways.
+
+Regular expression patterns are compiled into a series of bytecodes which are
+then executed by a matching engine written in C.  For advanced use, it may be
+necessary to pay careful attention to how the engine will execute a given RE,
+and write the RE in a certain way in order to produce bytecode that runs faster.
+Optimization isn't covered in this document, because it requires that you have a
+good understanding of the matching engine's internals.
+
+The regular expression language is relatively small and restricted, so not all
+possible string processing tasks can be done using regular expressions.  There
+are also tasks that *can* be done with regular expressions, but the expressions
+turn out to be very complicated.  In these cases, you may be better off writing
+Python code to do the processing; while Python code will be slower than an
+elaborate regular expression, it will also probably be more understandable.
+
+
+Simple Patterns
+===============
+
+We'll start by learning about the simplest possible regular expressions.  Since
+regular expressions are used to operate on strings, we'll begin with the most
+common task: matching characters.
+
+For a detailed explanation of the computer science underlying regular
+expressions (deterministic and non-deterministic finite automata), you can refer
+to almost any textbook on writing compilers.
+
+
+Matching Characters
+-------------------
+
+Most letters and characters will simply match themselves.  For example, the
+regular expression ``test`` will match the string ``test`` exactly.  (You can
+enable a case-insensitive mode that would let this RE match ``Test`` or ``TEST``
+as well; more about this later.)
+
+There are exceptions to this rule; some characters are special
+:dfn:`metacharacters`, and don't match themselves.  Instead, they signal that
+some out-of-the-ordinary thing should be matched, or they affect other portions
+of the RE by repeating them or changing their meaning.  Much of this document is
+devoted to discussing various metacharacters and what they do.
+
+Here's a complete list of the metacharacters; their meanings will be discussed
+in the rest of this HOWTO. ::
+
+   . ^ $ * + ? { [ ] \ | ( )
+
+The first metacharacters we'll look at are ``[`` and ``]``. They're used for
+specifying a character class, which is a set of characters that you wish to
+match.  Characters can be listed individually, or a range of characters can be
+indicated by giving two characters and separating them by a ``'-'``.  For
+example, ``[abc]`` will match any of the characters ``a``, ``b``, or ``c``; this
+is the same as ``[a-c]``, which uses a range to express the same set of
+characters.  If you wanted to match only lowercase letters, your RE would be
+``[a-z]``.
+
+.. % $
+
+Metacharacters are not active inside classes.  For example, ``[akm$]`` will
+match any of the characters ``'a'``, ``'k'``, ``'m'``, or ``'$'``; ``'$'`` is
+usually a metacharacter, but inside a character class it's stripped of its
+special nature.
+
+You can match the characters not listed within the class by :dfn:`complementing`
+the set.  This is indicated by including a ``'^'`` as the first character of the
+class; ``'^'`` outside a character class will simply match the ``'^'``
+character.  For example, ``[^5]`` will match any character except ``'5'``.
+
+Perhaps the most important metacharacter is the backslash, ``\``.   As in Python
+string literals, the backslash can be followed by various characters to signal
+various special sequences.  It's also used to escape all the metacharacters so
+you can still match them in patterns; for example, if you need to match a ``[``
+or  ``\``, you can precede them with a backslash to remove their special
+meaning: ``\[`` or ``\\``.
+
+Some of the special sequences beginning with ``'\'`` represent predefined sets
+of characters that are often useful, such as the set of digits, the set of
+letters, or the set of anything that isn't whitespace.  The following predefined
+special sequences are available:
+
+``\d``
+   Matches any decimal digit; this is equivalent to the class ``[0-9]``.
+
+``\D``
+   Matches any non-digit character; this is equivalent to the class ``[^0-9]``.
+
+``\s``
+   Matches any whitespace character; this is equivalent to the class ``[
+   \t\n\r\f\v]``.
+
+``\S``
+   Matches any non-whitespace character; this is equivalent to the class ``[^
+   \t\n\r\f\v]``.
+
+``\w``
+   Matches any alphanumeric character; this is equivalent to the class
+   ``[a-zA-Z0-9_]``.
+
+``\W``
+   Matches any non-alphanumeric character; this is equivalent to the class
+   ``[^a-zA-Z0-9_]``.
+
+These sequences can be included inside a character class.  For example,
+``[\s,.]`` is a character class that will match any whitespace character, or
+``','`` or ``'.'``.
+
+The final metacharacter in this section is ``.``.  It matches anything except a
+newline character, and there's an alternate mode (``re.DOTALL``) where it will
+match even a newline.  ``'.'`` is often used where you want to match "any
+character".
+
+
+Repeating Things
+----------------
+
+Being able to match varying sets of characters is the first thing regular
+expressions can do that isn't already possible with the methods available on
+strings.  However, if that was the only additional capability of regexes, they
+wouldn't be much of an advance. Another capability is that you can specify that
+portions of the RE must be repeated a certain number of times.
+
+The first metacharacter for repeating things that we'll look at is ``*``.  ``*``
+doesn't match the literal character ``*``; instead, it specifies that the
+previous character can be matched zero or more times, instead of exactly once.
+
+For example, ``ca*t`` will match ``ct`` (0 ``a`` characters), ``cat`` (1 ``a``),
+``caaat`` (3 ``a`` characters), and so forth.  The RE engine has various
+internal limitations stemming from the size of C's ``int`` type that will
+prevent it from matching over 2 billion ``a`` characters; you probably don't
+have enough memory to construct a string that large, so you shouldn't run into
+that limit.
+
+Repetitions such as ``*`` are :dfn:`greedy`; when repeating a RE, the matching
+engine will try to repeat it as many times as possible. If later portions of the
+pattern don't match, the matching engine will then back up and try again with
+few repetitions.
+
+A step-by-step example will make this more obvious.  Let's consider the
+expression ``a[bcd]*b``.  This matches the letter ``'a'``, zero or more letters
+from the class ``[bcd]``, and finally ends with a ``'b'``.  Now imagine matching
+this RE against the string ``abcbd``.
+
++------+-----------+---------------------------------+
+| Step | Matched   | Explanation                     |
++======+===========+=================================+
+| 1    | ``a``     | The ``a`` in the RE matches.    |
++------+-----------+---------------------------------+
+| 2    | ``abcbd`` | The engine matches ``[bcd]*``,  |
+|      |           | going as far as it can, which   |
+|      |           | is to the end of the string.    |
++------+-----------+---------------------------------+
+| 3    | *Failure* | The engine tries to match       |
+|      |           | ``b``, but the current position |
+|      |           | is at the end of the string, so |
+|      |           | it fails.                       |
++------+-----------+---------------------------------+
+| 4    | ``abcb``  | Back up, so that  ``[bcd]*``    |
+|      |           | matches one less character.     |
++------+-----------+---------------------------------+
+| 5    | *Failure* | Try ``b`` again, but the        |
+|      |           | current position is at the last |
+|      |           | character, which is a ``'d'``.  |
++------+-----------+---------------------------------+
+| 6    | ``abc``   | Back up again, so that          |
+|      |           | ``[bcd]*`` is only matching     |
+|      |           | ``bc``.                         |
++------+-----------+---------------------------------+
+| 6    | ``abcb``  | Try ``b`` again.  This time     |
+|      |           | but the character at the        |
+|      |           | current position is ``'b'``, so |
+|      |           | it succeeds.                    |
++------+-----------+---------------------------------+
+
+The end of the RE has now been reached, and it has matched ``abcb``.  This
+demonstrates how the matching engine goes as far as it can at first, and if no
+match is found it will then progressively back up and retry the rest of the RE
+again and again.  It will back up until it has tried zero matches for
+``[bcd]*``, and if that subsequently fails, the engine will conclude that the
+string doesn't match the RE at all.
+
+Another repeating metacharacter is ``+``, which matches one or more times.  Pay
+careful attention to the difference between ``*`` and ``+``; ``*`` matches
+*zero* or more times, so whatever's being repeated may not be present at all,
+while ``+`` requires at least *one* occurrence.  To use a similar example,
+``ca+t`` will match ``cat`` (1 ``a``), ``caaat`` (3 ``a``'s), but won't match
+``ct``.
+
+There are two more repeating qualifiers.  The question mark character, ``?``,
+matches either once or zero times; you can think of it as marking something as
+being optional.  For example, ``home-?brew`` matches either ``homebrew`` or
+``home-brew``.
+
+The most complicated repeated qualifier is ``{m,n}``, where *m* and *n* are
+decimal integers.  This qualifier means there must be at least *m* repetitions,
+and at most *n*.  For example, ``a/{1,3}b`` will match ``a/b``, ``a//b``, and
+``a///b``.  It won't match ``ab``, which has no slashes, or ``a////b``, which
+has four.
+
+You can omit either *m* or *n*; in that case, a reasonable value is assumed for
+the missing value.  Omitting *m* is interpreted as a lower limit of 0, while
+omitting *n* results in an upper bound of infinity --- actually, the upper bound
+is the 2-billion limit mentioned earlier, but that might as well be infinity.
+
+Readers of a reductionist bent may notice that the three other qualifiers can
+all be expressed using this notation.  ``{0,}`` is the same as ``*``, ``{1,}``
+is equivalent to ``+``, and ``{0,1}`` is the same as ``?``.  It's better to use
+``*``, ``+``, or ``?`` when you can, simply because they're shorter and easier
+to read.
+
+
+Using Regular Expressions
+=========================
+
+Now that we've looked at some simple regular expressions, how do we actually use
+them in Python?  The :mod:`re` module provides an interface to the regular
+expression engine, allowing you to compile REs into objects and then perform
+matches with them.
+
+
+Compiling Regular Expressions
+-----------------------------
+
+Regular expressions are compiled into :class:`RegexObject` instances, which have
+methods for various operations such as searching for pattern matches or
+performing string substitutions. ::
+
+   >>> import re
+   >>> p = re.compile('ab*')
+   >>> print p
+   <re.RegexObject instance at 80b4150>
+
+:func:`re.compile` also accepts an optional *flags* argument, used to enable
+various special features and syntax variations.  We'll go over the available
+settings later, but for now a single example will do::
+
+   >>> p = re.compile('ab*', re.IGNORECASE)
+
+The RE is passed to :func:`re.compile` as a string.  REs are handled as strings
+because regular expressions aren't part of the core Python language, and no
+special syntax was created for expressing them.  (There are applications that
+don't need REs at all, so there's no need to bloat the language specification by
+including them.) Instead, the :mod:`re` module is simply a C extension module
+included with Python, just like the :mod:`socket` or :mod:`zlib` modules.
+
+Putting REs in strings keeps the Python language simpler, but has one
+disadvantage which is the topic of the next section.
+
+
+The Backslash Plague
+--------------------
+
+As stated earlier, regular expressions use the backslash character (``'\'``) to
+indicate special forms or to allow special characters to be used without
+invoking their special meaning. This conflicts with Python's usage of the same
+character for the same purpose in string literals.
+
+Let's say you want to write a RE that matches the string ``\section``, which
+might be found in a LaTeX file.  To figure out what to write in the program
+code, start with the desired string to be matched.  Next, you must escape any
+backslashes and other metacharacters by preceding them with a backslash,
+resulting in the string ``\\section``.  The resulting string that must be passed
+to :func:`re.compile` must be ``\\section``.  However, to express this as a
+Python string literal, both backslashes must be escaped *again*.
+
++-------------------+------------------------------------------+
+| Characters        | Stage                                    |
++===================+==========================================+
+| ``\section``      | Text string to be matched                |
++-------------------+------------------------------------------+
+| ``\\section``     | Escaped backslash for :func:`re.compile` |
++-------------------+------------------------------------------+
+| ``"\\\\section"`` | Escaped backslashes for a string literal |
++-------------------+------------------------------------------+
+
+In short, to match a literal backslash, one has to write ``'\\\\'`` as the RE
+string, because the regular expression must be ``\\``, and each backslash must
+be expressed as ``\\`` inside a regular Python string literal.  In REs that
+feature backslashes repeatedly, this leads to lots of repeated backslashes and
+makes the resulting strings difficult to understand.
+
+The solution is to use Python's raw string notation for regular expressions;
+backslashes are not handled in any special way in a string literal prefixed with
+``'r'``, so ``r"\n"`` is a two-character string containing ``'\'`` and ``'n'``,
+while ``"\n"`` is a one-character string containing a newline. Regular
+expressions will often be written in Python code using this raw string notation.
+
++-------------------+------------------+
+| Regular String    | Raw string       |
++===================+==================+
+| ``"ab*"``         | ``r"ab*"``       |
++-------------------+------------------+
+| ``"\\\\section"`` | ``r"\\section"`` |
++-------------------+------------------+
+| ``"\\w+\\s+\\1"`` | ``r"\w+\s+\1"``  |
++-------------------+------------------+
+
+
+Performing Matches
+------------------
+
+Once you have an object representing a compiled regular expression, what do you
+do with it?  :class:`RegexObject` instances have several methods and attributes.
+Only the most significant ones will be covered here; consult `the Library
+Reference <http://www.python.org/doc/lib/module-re.html>`_ for a complete
+listing.
+
++------------------+-----------------------------------------------+
+| Method/Attribute | Purpose                                       |
++==================+===============================================+
+| ``match()``      | Determine if the RE matches at the beginning  |
+|                  | of the string.                                |
++------------------+-----------------------------------------------+
+| ``search()``     | Scan through a string, looking for any        |
+|                  | location where this RE matches.               |
++------------------+-----------------------------------------------+
+| ``findall()``    | Find all substrings where the RE matches, and |
+|                  | returns them as a list.                       |
++------------------+-----------------------------------------------+
+| ``finditer()``   | Find all substrings where the RE matches, and |
+|                  | returns them as an iterator.                  |
++------------------+-----------------------------------------------+
+
+:meth:`match` and :meth:`search` return ``None`` if no match can be found.  If
+they're successful, a ``MatchObject`` instance is returned, containing
+information about the match: where it starts and ends, the substring it matched,
+and more.
+
+You can learn about this by interactively experimenting with the :mod:`re`
+module.  If you have Tkinter available, you may also want to look at
+:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+Python distribution.  It allows you to enter REs and strings, and displays
+whether the RE matches or fails. :file:`redemo.py` can be quite useful when
+trying to debug a complicated RE.  Phil Schwartz's `Kodos <http://www.phil-
+schwartz.com/kodos.spy>`_ is also an interactive tool for developing and testing
+RE patterns.
+
+This HOWTO uses the standard Python interpreter for its examples. First, run the
+Python interpreter, import the :mod:`re` module, and compile a RE::
+
+   Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
+   >>> import re
+   >>> p = re.compile('[a-z]+')
+   >>> p
+   <_sre.SRE_Pattern object at 80c3c28>
+
+Now, you can try matching various strings against the RE ``[a-z]+``.  An empty
+string shouldn't match at all, since ``+`` means 'one or more repetitions'.
+:meth:`match` should return ``None`` in this case, which will cause the
+interpreter to print no output.  You can explicitly print the result of
+:meth:`match` to make this clear. ::
+
+   >>> p.match("")
+   >>> print p.match("")
+   None
+
+Now, let's try it on a string that it should match, such as ``tempo``.  In this
+case, :meth:`match` will return a :class:`MatchObject`, so you should store the
+result in a variable for later use. ::
+
+   >>> m = p.match('tempo')
+   >>> print m
+   <_sre.SRE_Match object at 80c4f68>
+
+Now you can query the :class:`MatchObject` for information about the matching
+string.   :class:`MatchObject` instances also have several methods and
+attributes; the most important ones are:
+
++------------------+--------------------------------------------+
+| Method/Attribute | Purpose                                    |
++==================+============================================+
+| ``group()``      | Return the string matched by the RE        |
++------------------+--------------------------------------------+
+| ``start()``      | Return the starting position of the match  |
++------------------+--------------------------------------------+
+| ``end()``        | Return the ending position of the match    |
++------------------+--------------------------------------------+
+| ``span()``       | Return a tuple containing the (start, end) |
+|                  | positions  of the match                    |
++------------------+--------------------------------------------+
+
+Trying these methods will soon clarify their meaning::
+
+   >>> m.group()
+   'tempo'
+   >>> m.start(), m.end()
+   (0, 5)
+   >>> m.span()
+   (0, 5)
+
+:meth:`group` returns the substring that was matched by the RE.  :meth:`start`
+and :meth:`end` return the starting and ending index of the match. :meth:`span`
+returns both start and end indexes in a single tuple.  Since the :meth:`match`
+method only checks if the RE matches at the start of a string, :meth:`start`
+will always be zero.  However, the :meth:`search` method of :class:`RegexObject`
+instances scans through the string, so  the match may not start at zero in that
+case. ::
+
+   >>> print p.match('::: message')
+   None
+   >>> m = p.search('::: message') ; print m
+   <re.MatchObject instance at 80c9650>
+   >>> m.group()
+   'message'
+   >>> m.span()
+   (4, 11)
+
+In actual programs, the most common style is to store the :class:`MatchObject`
+in a variable, and then check if it was ``None``.  This usually looks like::
+
+   p = re.compile( ... )
+   m = p.match( 'string goes here' )
+   if m:
+       print 'Match found: ', m.group()
+   else:
+       print 'No match'
+
+Two :class:`RegexObject` methods return all of the matches for a pattern.
+:meth:`findall` returns a list of matching strings::
+
+   >>> p = re.compile('\d+')
+   >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
+   ['12', '11', '10']
+
+:meth:`findall` has to create the entire list before it can be returned as the
+result.  The :meth:`finditer` method returns a sequence of :class:`MatchObject`
+instances as an iterator. [#]_ ::
+
+   >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
+   >>> iterator
+   <callable-iterator object at 0x401833ac>
+   >>> for match in iterator:
+   ...     print match.span()
+   ...
+   (0, 2)
+   (22, 24)
+   (29, 31)
+
+
+Module-Level Functions
+----------------------
+
+You don't have to create a :class:`RegexObject` and call its methods; the
+:mod:`re` module also provides top-level functions called :func:`match`,
+:func:`search`, :func:`findall`, :func:`sub`, and so forth.  These functions
+take the same arguments as the corresponding :class:`RegexObject` method, with
+the RE string added as the first argument, and still return either ``None`` or a
+:class:`MatchObject` instance. ::
+
+   >>> print re.match(r'From\s+', 'Fromage amk')
+   None
+   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
+   <re.MatchObject instance at 80c5978>
+
+Under the hood, these functions simply produce a :class:`RegexObject` for you
+and call the appropriate method on it.  They also store the compiled object in a
+cache, so future calls using the same RE are faster.
+
+Should you use these module-level functions, or should you get the
+:class:`RegexObject` and call its methods yourself?  That choice depends on how
+frequently the RE will be used, and on your personal coding style.  If the RE is
+being used at only one point in the code, then the module functions are probably
+more convenient.  If a program contains a lot of regular expressions, or re-uses
+the same ones in several locations, then it might be worthwhile to collect all
+the definitions in one place, in a section of code that compiles all the REs
+ahead of time.  To take an example from the standard library, here's an extract
+from :file:`xmllib.py`::
+
+   ref = re.compile( ... )
+   entityref = re.compile( ... )
+   charref = re.compile( ... )
+   starttagopen = re.compile( ... )
+
+I generally prefer to work with the compiled object, even for one-time uses, but
+few people will be as much of a purist about this as I am.
+
+
+Compilation Flags
+-----------------
+
+Compilation flags let you modify some aspects of how regular expressions work.
+Flags are available in the :mod:`re` module under two names, a long name such as
+:const:`IGNORECASE` and a short, one-letter form such as :const:`I`.  (If you're
+familiar with Perl's pattern modifiers, the one-letter forms use the same
+letters; the short form of :const:`re.VERBOSE` is :const:`re.X`, for example.)
+Multiple flags can be specified by bitwise OR-ing them; ``re.I | re.M`` sets
+both the :const:`I` and :const:`M` flags, for example.
+
+Here's a table of the available flags, followed by a more detailed explanation
+of each one.
+
++---------------------------------+--------------------------------------------+
+| Flag                            | Meaning                                    |
++=================================+============================================+
+| :const:`DOTALL`, :const:`S`     | Make ``.`` match any character, including  |
+|                                 | newlines                                   |
++---------------------------------+--------------------------------------------+
+| :const:`IGNORECASE`, :const:`I` | Do case-insensitive matches                |
++---------------------------------+--------------------------------------------+
+| :const:`LOCALE`, :const:`L`     | Do a locale-aware match                    |
++---------------------------------+--------------------------------------------+
+| :const:`MULTILINE`, :const:`M`  | Multi-line matching, affecting ``^`` and   |
+|                                 | ``$``                                      |
++---------------------------------+--------------------------------------------+
+| :const:`VERBOSE`, :const:`X`    | Enable verbose REs, which can be organized |
+|                                 | more cleanly and understandably.           |
++---------------------------------+--------------------------------------------+
+
+
+.. data:: I
+          IGNORECASE
+   :noindex:
+
+   Perform case-insensitive matching; character class and literal strings will
+   match letters by ignoring case.  For example, ``[A-Z]`` will match lowercase
+   letters, too, and ``Spam`` will match ``Spam``, ``spam``, or ``spAM``. This
+   lowercasing doesn't take the current locale into account; it will if you also
+   set the :const:`LOCALE` flag.
+
+
+.. data:: L
+          LOCALE
+   :noindex:
+
+   Make ``\w``, ``\W``, ``\b``, and ``\B``, dependent on the current locale.
+
+   Locales are a feature of the C library intended to help in writing programs that
+   take account of language differences.  For example, if you're processing French
+   text, you'd want to be able to write ``\w+`` to match words, but ``\w`` only
+   matches the character class ``[A-Za-z]``; it won't match ``'é'`` or ``'ç'``.  If
+   your system is configured properly and a French locale is selected, certain C
+   functions will tell the program that ``'é'`` should also be considered a letter.
+   Setting the :const:`LOCALE` flag when compiling a regular expression will cause
+   the resulting compiled object to use these C functions for ``\w``; this is
+   slower, but also enables ``\w+`` to match French words as you'd expect.
+
+
+.. data:: M
+          MULTILINE
+   :noindex:
+
+   (``^`` and ``$`` haven't been explained yet;  they'll be introduced in section
+   :ref:`more-metacharacters`.)
+
+   Usually ``^`` matches only at the beginning of the string, and ``$`` matches
+   only at the end of the string and immediately before the newline (if any) at the
+   end of the string. When this flag is specified, ``^`` matches at the beginning
+   of the string and at the beginning of each line within the string, immediately
+   following each newline.  Similarly, the ``$`` metacharacter matches either at
+   the end of the string and at the end of each line (immediately preceding each
+   newline).
+
+
+.. data:: S
+          DOTALL
+   :noindex:
+
+   Makes the ``'.'`` special character match any character at all, including a
+   newline; without this flag, ``'.'`` will match anything *except* a newline.
+
+
+.. data:: X
+          VERBOSE
+   :noindex:
+
+   This flag allows you to write regular expressions that are more readable by
+   granting you more flexibility in how you can format them.  When this flag has
+   been specified, whitespace within the RE string is ignored, except when the
+   whitespace is in a character class or preceded by an unescaped backslash; this
+   lets you organize and indent the RE more clearly.  This flag also lets you put
+   comments within a RE that will be ignored by the engine; comments are marked by
+   a ``'#'`` that's neither in a character class or preceded by an unescaped
+   backslash.
+
+   For example, here's a RE that uses :const:`re.VERBOSE`; see how much easier it
+   is to read? ::
+
+      charref = re.compile(r"""
+       &[#]		     # Start of a numeric entity reference
+       (
+           0[0-7]+         # Octal form
+         | [0-9]+          # Decimal form
+         | x[0-9a-fA-F]+   # Hexadecimal form
+       )
+       ;                   # Trailing semicolon
+      """, re.VERBOSE)
+
+   Without the verbose setting, the RE would look like this::
+
+      charref = re.compile("&#(0[0-7]+"
+                           "|[0-9]+"
+                           "|x[0-9a-fA-F]+);")
+
+   In the above example, Python's automatic concatenation of string literals has
+   been used to break up the RE into smaller pieces, but it's still more difficult
+   to understand than the version using :const:`re.VERBOSE`.
+
+
+More Pattern Power
+==================
+
+So far we've only covered a part of the features of regular expressions.  In
+this section, we'll cover some new metacharacters, and how to use groups to
+retrieve portions of the text that was matched.
+
+
+.. _more-metacharacters:
+
+More Metacharacters
+-------------------
+
+There are some metacharacters that we haven't covered yet.  Most of them will be
+covered in this section.
+
+Some of the remaining metacharacters to be discussed are :dfn:`zero-width
+assertions`.  They don't cause the engine to advance through the string;
+instead, they consume no characters at all, and simply succeed or fail.  For
+example, ``\b`` is an assertion that the current position is located at a word
+boundary; the position isn't changed by the ``\b`` at all.  This means that
+zero-width assertions should never be repeated, because if they match once at a
+given location, they can obviously be matched an infinite number of times.
+
+``|``
+   Alternation, or the "or" operator.   If A and B are regular expressions,
+   ``A|B`` will match any string that matches either ``A`` or ``B``. ``|`` has very
+   low precedence in order to make it work reasonably when you're alternating
+   multi-character strings. ``Crow|Servo`` will match either ``Crow`` or ``Servo``,
+   not ``Cro``, a ``'w'`` or an ``'S'``, and ``ervo``.
+
+   To match a literal ``'|'``, use ``\|``, or enclose it inside a character class,
+   as in ``[|]``.
+
+``^``
+   Matches at the beginning of lines.  Unless the :const:`MULTILINE` flag has been
+   set, this will only match at the beginning of the string.  In :const:`MULTILINE`
+   mode, this also matches immediately after each newline within the string.
+
+   For example, if you wish to match the word ``From`` only at the beginning of a
+   line, the RE to use is ``^From``. ::
+
+      >>> print re.search('^From', 'From Here to Eternity')
+      <re.MatchObject instance at 80c1520>
+      >>> print re.search('^From', 'Reciting From Memory')
+      None
+
+   .. % To match a literal \character{\^}, use \regexp{\e\^} or enclose it
+   .. % inside a character class, as in \regexp{[{\e}\^]}.
+
+``$``
+   Matches at the end of a line, which is defined as either the end of the string,
+   or any location followed by a newline character.     ::
+
+      >>> print re.search('}$', '{block}')
+      <re.MatchObject instance at 80adfa8>
+      >>> print re.search('}$', '{block} ')
+      None
+      >>> print re.search('}$', '{block}\n')
+      <re.MatchObject instance at 80adfa8>
+
+   To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
+   as in  ``[$]``.
+
+   .. % $
+
+``\A``
+   Matches only at the start of the string.  When not in :const:`MULTILINE` mode,
+   ``\A`` and ``^`` are effectively the same.  In :const:`MULTILINE` mode, they're
+   different: ``\A`` still matches only at the beginning of the string, but ``^``
+   may match at any location inside the string that follows a newline character.
+
+``\Z``
+   Matches only at the end of the string.
+
+``\b``
+   Word boundary.   This is a zero-width assertion that matches only at the
+   beginning or end of a word.  A word is defined as a sequence of alphanumeric
+   characters, so the end of a word is indicated by whitespace or a non-
+   alphanumeric character.
+
+   The following example matches ``class`` only when it's a complete word; it won't
+   match when it's contained inside another word. ::
+
+      >>> p = re.compile(r'\bclass\b')
+      >>> print p.search('no class at all')
+      <re.MatchObject instance at 80c8f28>
+      >>> print p.search('the declassified algorithm')
+      None
+      >>> print p.search('one subclass is')
+      None
+
+   There are two subtleties you should remember when using this special sequence.
+   First, this is the worst collision between Python's string literals and regular
+   expression sequences.  In Python's string literals, ``\b`` is the backspace
+   character, ASCII value 8.  If you're not using raw strings, then Python will
+   convert the ``\b`` to a backspace, and your RE won't match as you expect it to.
+   The following example looks the same as our previous RE, but omits the ``'r'``
+   in front of the RE string. ::
+
+      >>> p = re.compile('\bclass\b')
+      >>> print p.search('no class at all')
+      None
+      >>> print p.search('\b' + 'class' + '\b')  
+      <re.MatchObject instance at 80c3ee0>
+
+   Second, inside a character class, where there's no use for this assertion,
+   ``\b`` represents the backspace character, for compatibility with Python's
+   string literals.
+
+``\B``
+   Another zero-width assertion, this is the opposite of ``\b``, only matching when
+   the current position is not at a word boundary.
+
+
+Grouping
+--------
+
+Frequently you need to obtain more information than just whether the RE matched
+or not.  Regular expressions are often used to dissect strings by writing a RE
+divided into several subgroups which match different components of interest.
+For example, an RFC-822 header line is divided into a header name and a value,
+separated by a ``':'``, like this::
+
+   From: author at example.com
+   User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
+   MIME-Version: 1.0
+   To: editor at example.com
+
+This can be handled by writing a regular expression which matches an entire
+header line, and has one group which matches the header name, and another group
+which matches the header's value.
+
+Groups are marked by the ``'('``, ``')'`` metacharacters. ``'('`` and ``')'``
+have much the same meaning as they do in mathematical expressions; they group
+together the expressions contained inside them, and you can repeat the contents
+of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
+``{m,n}``.  For example, ``(ab)*`` will match zero or more repetitions of
+``ab``. ::
+
+   >>> p = re.compile('(ab)*')
+   >>> print p.match('ababababab').span()
+   (0, 10)
+
+Groups indicated with ``'('``, ``')'`` also capture the starting and ending
+index of the text that they match; this can be retrieved by passing an argument
+to :meth:`group`, :meth:`start`, :meth:`end`, and :meth:`span`.  Groups are
+numbered starting with 0.  Group 0 is always present; it's the whole RE, so
+:class:`MatchObject` methods all have group 0 as their default argument.  Later
+we'll see how to express groups that don't capture the span of text that they
+match. ::
+
+   >>> p = re.compile('(a)b')
+   >>> m = p.match('ab')
+   >>> m.group()
+   'ab'
+   >>> m.group(0)
+   'ab'
+
+Subgroups are numbered from left to right, from 1 upward.  Groups can be nested;
+to determine the number, just count the opening parenthesis characters, going
+from left to right. ::
+
+   >>> p = re.compile('(a(b)c)d')
+   >>> m = p.match('abcd')
+   >>> m.group(0)
+   'abcd'
+   >>> m.group(1)
+   'abc'
+   >>> m.group(2)
+   'b'
+
+:meth:`group` can be passed multiple group numbers at a time, in which case it
+will return a tuple containing the corresponding values for those groups. ::
+
+   >>> m.group(2,1,2)
+   ('b', 'abc', 'b')
+
+The :meth:`groups` method returns a tuple containing the strings for all the
+subgroups, from 1 up to however many there are. ::
+
+   >>> m.groups()
+   ('abc', 'b')
+
+Backreferences in a pattern allow you to specify that the contents of an earlier
+capturing group must also be found at the current location in the string.  For
+example, ``\1`` will succeed if the exact contents of group 1 can be found at
+the current position, and fails otherwise.  Remember that Python's string
+literals also use a backslash followed by numbers to allow including arbitrary
+characters in a string, so be sure to use a raw string when incorporating
+backreferences in a RE.
+
+For example, the following RE detects doubled words in a string. ::
+
+   >>> p = re.compile(r'(\b\w+)\s+\1')
+   >>> p.search('Paris in the the spring').group()
+   'the the'
+
+Backreferences like this aren't often useful for just searching through a string
+--- there are few text formats which repeat data in this way --- but you'll soon
+find out that they're *very* useful when performing string substitutions.
+
+
+Non-capturing and Named Groups
+------------------------------
+
+Elaborate REs may use many groups, both to capture substrings of interest, and
+to group and structure the RE itself.  In complex REs, it becomes difficult to
+keep track of the group numbers.  There are two features which help with this
+problem.  Both of them use a common syntax for regular expression extensions, so
+we'll look at that first.
+
+Perl 5 added several additional features to standard regular expressions, and
+the Python :mod:`re` module supports most of them.   It would have been
+difficult to choose new single-keystroke metacharacters or new special sequences
+beginning with ``\`` to represent the new features without making Perl's regular
+expressions confusingly different from standard REs.  If you chose ``&`` as a
+new metacharacter, for example, old expressions would be assuming that ``&`` was
+a regular character and wouldn't have escaped it by writing ``\&`` or ``[&]``.
+
+The solution chosen by the Perl developers was to use ``(?...)`` as the
+extension syntax.  ``?`` immediately after a parenthesis was a syntax error
+because the ``?`` would have nothing to repeat, so this didn't introduce any
+compatibility problems.  The characters immediately after the ``?``  indicate
+what extension is being used, so ``(?=foo)`` is one thing (a positive lookahead
+assertion) and ``(?:foo)`` is something else (a non-capturing group containing
+the subexpression ``foo``).
+
+Python adds an extension syntax to Perl's extension syntax.  If the first
+character after the question mark is a ``P``, you know that it's an extension
+that's specific to Python.  Currently there are two such extensions:
+``(?P<name>...)`` defines a named group, and ``(?P=name)`` is a backreference to
+a named group.  If future versions of Perl 5 add similar features using a
+different syntax, the :mod:`re` module will be changed to support the new
+syntax, while preserving the Python-specific syntax for compatibility's sake.
+
+Now that we've looked at the general extension syntax, we can return to the
+features that simplify working with groups in complex REs. Since groups are
+numbered from left to right and a complex expression may use many groups, it can
+become difficult to keep track of the correct numbering.  Modifying such a
+complex RE is annoying, too: insert a new group near the beginning and you
+change the numbers of everything that follows it.
+
+Sometimes you'll want to use a group to collect a part of a regular expression,
+but aren't interested in retrieving the group's contents. You can make this fact
+explicit by using a non-capturing group: ``(?:...)``, where you can replace the
+``...`` with any other regular expression. ::
+
+   >>> m = re.match("([abc])+", "abc")
+   >>> m.groups()
+   ('c',)
+   >>> m = re.match("(?:[abc])+", "abc")
+   >>> m.groups()
+   ()
+
+Except for the fact that you can't retrieve the contents of what the group
+matched, a non-capturing group behaves exactly the same as a capturing group;
+you can put anything inside it, repeat it with a repetition metacharacter such
+as ``*``, and nest it within other groups (capturing or non-capturing).
+``(?:...)`` is particularly useful when modifying an existing pattern, since you
+can add new groups without changing how all the other groups are numbered.  It
+should be mentioned that there's no performance difference in searching between
+capturing and non-capturing groups; neither form is any faster than the other.
+
+A more significant feature is named groups: instead of referring to them by
+numbers, groups can be referenced by a name.
+
+The syntax for a named group is one of the Python-specific extensions:
+``(?P<name>...)``.  *name* is, obviously, the name of the group.  Named groups
+also behave exactly like capturing groups, and additionally associate a name
+with a group.  The :class:`MatchObject` methods that deal with capturing groups
+all accept either integers that refer to the group by number or strings that
+contain the desired group's name.  Named groups are still given numbers, so you
+can retrieve information about a group in two ways::
+
+   >>> p = re.compile(r'(?P<word>\b\w+\b)')
+   >>> m = p.search( '(((( Lots of punctuation )))' )
+   >>> m.group('word')
+   'Lots'
+   >>> m.group(1)
+   'Lots'
+
+Named groups are handy because they let you use easily-remembered names, instead
+of having to remember numbers.  Here's an example RE from the :mod:`imaplib`
+module::
+
+   InternalDate = re.compile(r'INTERNALDATE "'
+           r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
+   	r'(?P<year>[0-9][0-9][0-9][0-9])'
+           r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
+           r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
+           r'"')
+
+It's obviously much easier to retrieve ``m.group('zonem')``, instead of having
+to remember to retrieve group 9.
+
+The syntax for backreferences in an expression such as ``(...)\1`` refers to the
+number of the group.  There's naturally a variant that uses the group name
+instead of the number. This is another Python extension: ``(?P=name)`` indicates
+that the contents of the group called *name* should again be matched at the
+current point.  The regular expression for finding doubled words,
+``(\b\w+)\s+\1`` can also be written as ``(?P<word>\b\w+)\s+(?P=word)``::
+
+   >>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
+   >>> p.search('Paris in the the spring').group()
+   'the the'
+
+
+Lookahead Assertions
+--------------------
+
+Another zero-width assertion is the lookahead assertion.  Lookahead assertions
+are available in both positive and negative form, and  look like this:
+
+``(?=...)``
+   Positive lookahead assertion.  This succeeds if the contained regular
+   expression, represented here by ``...``, successfully matches at the current
+   location, and fails otherwise. But, once the contained expression has been
+   tried, the matching engine doesn't advance at all; the rest of the pattern is
+   tried right where the assertion started.
+
+``(?!...)``
+   Negative lookahead assertion.  This is the opposite of the positive assertion;
+   it succeeds if the contained expression *doesn't* match at the current position
+   in the string.
+
+To make this concrete, let's look at a case where a lookahead is useful.
+Consider a simple pattern to match a filename and split it apart into a base
+name and an extension, separated by a ``.``.  For example, in ``news.rc``,
+``news`` is the base name, and ``rc`` is the filename's extension.
+
+The pattern to match this is quite simple:
+
+``.*[.].*$``
+
+Notice that the ``.`` needs to be treated specially because it's a
+metacharacter; I've put it inside a character class.  Also notice the trailing
+``$``; this is added to ensure that all the rest of the string must be included
+in the extension.  This regular expression matches ``foo.bar`` and
+``autoexec.bat`` and ``sendmail.cf`` and ``printers.conf``.
+
+Now, consider complicating the problem a bit; what if you want to match
+filenames where the extension is not ``bat``? Some incorrect attempts:
+
+``.*[.][^b].*$``  The first attempt above tries to exclude ``bat`` by requiring
+that the first character of the extension is not a ``b``.  This is wrong,
+because the pattern also doesn't match ``foo.bar``.
+
+.. % $
+
+``.*[.]([^b]..|.[^a].|..[^t])$``
+
+.. % Messes up the HTML without the curly braces around \^
+
+The expression gets messier when you try to patch up the first solution by
+requiring one of the following cases to match: the first character of the
+extension isn't ``b``; the second character isn't ``a``; or the third character
+isn't ``t``.  This accepts ``foo.bar`` and rejects ``autoexec.bat``, but it
+requires a three-letter extension and won't accept a filename with a two-letter
+extension such as ``sendmail.cf``.  We'll complicate the pattern again in an
+effort to fix it.
+
+``.*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$``
+
+In the third attempt, the second and third letters are all made optional in
+order to allow matching extensions shorter than three characters, such as
+``sendmail.cf``.
+
+The pattern's getting really complicated now, which makes it hard to read and
+understand.  Worse, if the problem changes and you want to exclude both ``bat``
+and ``exe`` as extensions, the pattern would get even more complicated and
+confusing.
+
+A negative lookahead cuts through all this confusion:
+
+``.*[.](?!bat$).*$``  The negative lookahead means: if the expression ``bat``
+doesn't match at this point, try the rest of the pattern; if ``bat$`` does
+match, the whole pattern will fail.  The trailing ``$`` is required to ensure
+that something like ``sample.batch``, where the extension only starts with
+``bat``, will be allowed.
+
+.. % $
+
+Excluding another filename extension is now easy; simply add it as an
+alternative inside the assertion.  The following pattern excludes filenames that
+end in either ``bat`` or ``exe``:
+
+``.*[.](?!bat$|exe$).*$``
+
+.. % $
+
+
+Modifying Strings
+=================
+
+Up to this point, we've simply performed searches against a static string.
+Regular expressions are also commonly used to modify strings in various ways,
+using the following :class:`RegexObject` methods:
+
++------------------+-----------------------------------------------+
+| Method/Attribute | Purpose                                       |
++==================+===============================================+
+| ``split()``      | Split the string into a list, splitting it    |
+|                  | wherever the RE matches                       |
++------------------+-----------------------------------------------+
+| ``sub()``        | Find all substrings where the RE matches, and |
+|                  | replace them with a different string          |
++------------------+-----------------------------------------------+
+| ``subn()``       | Does the same thing as :meth:`sub`,  but      |
+|                  | returns the new string and the number of      |
+|                  | replacements                                  |
++------------------+-----------------------------------------------+
+
+
+Splitting Strings
+-----------------
+
+The :meth:`split` method of a :class:`RegexObject` splits a string apart
+wherever the RE matches, returning a list of the pieces. It's similar to the
+:meth:`split` method of strings but provides much more generality in the
+delimiters that you can split by; :meth:`split` only supports splitting by
+whitespace or by a fixed string.  As you'd expect, there's a module-level
+:func:`re.split` function, too.
+
+
+.. method:: .split(string [, maxsplit\ ``= 0``])
+   :noindex:
+
+   Split *string* by the matches of the regular expression.  If capturing
+   parentheses are used in the RE, then their contents will also be returned as
+   part of the resulting list.  If *maxsplit* is nonzero, at most *maxsplit* splits
+   are performed.
+
+You can limit the number of splits made, by passing a value for *maxsplit*.
+When *maxsplit* is nonzero, at most *maxsplit* splits will be made, and the
+remainder of the string is returned as the final element of the list.  In the
+following example, the delimiter is any sequence of non-alphanumeric characters.
+::
+
+   >>> p = re.compile(r'\W+')
+   >>> p.split('This is a test, short and sweet, of split().')
+   ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
+   >>> p.split('This is a test, short and sweet, of split().', 3)
+   ['This', 'is', 'a', 'test, short and sweet, of split().']
+
+Sometimes you're not only interested in what the text between delimiters is, but
+also need to know what the delimiter was.  If capturing parentheses are used in
+the RE, then their values are also returned as part of the list.  Compare the
+following calls::
+
+   >>> p = re.compile(r'\W+')
+   >>> p2 = re.compile(r'(\W+)')
+   >>> p.split('This... is a test.')
+   ['This', 'is', 'a', 'test', '']
+   >>> p2.split('This... is a test.')
+   ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
+
+The module-level function :func:`re.split` adds the RE to be used as the first
+argument, but is otherwise the same.   ::
+
+   >>> re.split('[\W]+', 'Words, words, words.')
+   ['Words', 'words', 'words', '']
+   >>> re.split('([\W]+)', 'Words, words, words.')
+   ['Words', ', ', 'words', ', ', 'words', '.', '']
+   >>> re.split('[\W]+', 'Words, words, words.', 1)
+   ['Words', 'words, words.']
+
+
+Search and Replace
+------------------
+
+Another common task is to find all the matches for a pattern, and replace them
+with a different string.  The :meth:`sub` method takes a replacement value,
+which can be either a string or a function, and the string to be processed.
+
+
+.. method:: .sub(replacement, string[, count\ ``= 0``])
+   :noindex:
+
+   Returns the string obtained by replacing the leftmost non-overlapping
+   occurrences of the RE in *string* by the replacement *replacement*.  If the
+   pattern isn't found, *string* is returned unchanged.
+
+   The optional argument *count* is the maximum number of pattern occurrences to be
+   replaced; *count* must be a non-negative integer.  The default value of 0 means
+   to replace all occurrences.
+
+Here's a simple example of using the :meth:`sub` method.  It replaces colour
+names with the word ``colour``::
+
+   >>> p = re.compile( '(blue|white|red)')
+   >>> p.sub( 'colour', 'blue socks and red shoes')
+   'colour socks and colour shoes'
+   >>> p.sub( 'colour', 'blue socks and red shoes', count=1)
+   'colour socks and red shoes'
+
+The :meth:`subn` method does the same work, but returns a 2-tuple containing the
+new string value and the number of replacements  that were performed::
+
+   >>> p = re.compile( '(blue|white|red)')
+   >>> p.subn( 'colour', 'blue socks and red shoes')
+   ('colour socks and colour shoes', 2)
+   >>> p.subn( 'colour', 'no colours at all')
+   ('no colours at all', 0)
+
+Empty matches are replaced only when they're not adjacent to a previous match.
+::
+
+   >>> p = re.compile('x*')
+   >>> p.sub('-', 'abxd')
+   '-a-b-d-'
+
+If *replacement* is a string, any backslash escapes in it are processed.  That
+is, ``\n`` is converted to a single newline character, ``\r`` is converted to a
+carriage return, and so forth. Unknown escapes such as ``\j`` are left alone.
+Backreferences, such as ``\6``, are replaced with the substring matched by the
+corresponding group in the RE.  This lets you incorporate portions of the
+original text in the resulting replacement string.
+
+This example matches the word ``section`` followed by a string enclosed in
+``{``, ``}``, and changes ``section`` to ``subsection``::
+
+   >>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
+   >>> p.sub(r'subsection{\1}','section{First} section{second}')
+   'subsection{First} subsection{second}'
+
+There's also a syntax for referring to named groups as defined by the
+``(?P<name>...)`` syntax.  ``\g<name>`` will use the substring matched by the
+group named ``name``, and  ``\g<number>``  uses the corresponding group number.
+``\g<2>`` is therefore equivalent to ``\2``,  but isn't ambiguous in a
+replacement string such as ``\g<2>0``.  (``\20`` would be interpreted as a
+reference to group 20, not a reference to group 2 followed by the literal
+character ``'0'``.)  The following substitutions are all equivalent, but use all
+three variations of the replacement string. ::
+
+   >>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
+   >>> p.sub(r'subsection{\1}','section{First}')
+   'subsection{First}'
+   >>> p.sub(r'subsection{\g<1>}','section{First}')
+   'subsection{First}'
+   >>> p.sub(r'subsection{\g<name>}','section{First}')
+   'subsection{First}'
+
+*replacement* can also be a function, which gives you even more control.  If
+*replacement* is a function, the function is called for every non-overlapping
+occurrence of *pattern*.  On each call, the function is  passed a
+:class:`MatchObject` argument for the match and can use this information to
+compute the desired replacement string and return it.
+
+In the following example, the replacement function translates  decimals into
+hexadecimal::
+
+   >>> def hexrepl( match ):
+   ...     "Return the hex string for a decimal number"
+   ...     value = int( match.group() )
+   ...     return hex(value)
+   ...
+   >>> p = re.compile(r'\d+')
+   >>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
+   'Call 0xffd2 for printing, 0xc000 for user code.'
+
+When using the module-level :func:`re.sub` function, the pattern is passed as
+the first argument.  The pattern may be a string or a :class:`RegexObject`; if
+you need to specify regular expression flags, you must either use a
+:class:`RegexObject` as the first parameter, or use embedded modifiers in the
+pattern, e.g.  ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
+
+
+Common Problems
+===============
+
+Regular expressions are a powerful tool for some applications, but in some ways
+their behaviour isn't intuitive and at times they don't behave the way you may
+expect them to.  This section will point out some of the most common pitfalls.
+
+
+Use String Methods
+------------------
+
+Sometimes using the :mod:`re` module is a mistake.  If you're matching a fixed
+string, or a single character class, and you're not using any :mod:`re` features
+such as the :const:`IGNORECASE` flag, then the full power of regular expressions
+may not be required. Strings have several methods for performing operations with
+fixed strings and they're usually much faster, because the implementation is a
+single small C loop that's been optimized for the purpose, instead of the large,
+more generalized regular expression engine.
+
+One example might be replacing a single fixed string with another one; for
+example, you might replace ``word`` with ``deed``.  ``re.sub()`` seems like the
+function to use for this, but consider the :meth:`replace` method.  Note that
+:func:`replace` will also replace ``word`` inside words, turning ``swordfish``
+into ``sdeedfish``, but the  naive RE ``word`` would have done that, too.  (To
+avoid performing the substitution on parts of words, the pattern would have to
+be ``\bword\b``, in order to require that ``word`` have a word boundary on
+either side.  This takes the job beyond  :meth:`replace`'s abilities.)
+
+Another common task is deleting every occurrence of a single character from a
+string or replacing it with another single character.  You might do this with
+something like ``re.sub('\n', ' ', S)``, but :meth:`translate` is capable of
+doing both tasks and will be faster than any regular expression operation can
+be.
+
+In short, before turning to the :mod:`re` module, consider whether your problem
+can be solved with a faster and simpler string method.
+
+
+match() versus search()
+-----------------------
+
+The :func:`match` function only checks if the RE matches at the beginning of the
+string while :func:`search` will scan forward through the string for a match.
+It's important to keep this distinction in mind.  Remember,  :func:`match` will
+only report a successful match which will start at 0; if the match wouldn't
+start at zero,  :func:`match` will *not* report it. ::
+
+   >>> print re.match('super', 'superstition').span()  
+   (0, 5)
+   >>> print re.match('super', 'insuperable')    
+   None
+
+On the other hand, :func:`search` will scan forward through the string,
+reporting the first match it finds. ::
+
+   >>> print re.search('super', 'superstition').span()
+   (0, 5)
+   >>> print re.search('super', 'insuperable').span()
+   (2, 7)
+
+Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
+to the front of your RE.  Resist this temptation and use :func:`re.search`
+instead.  The regular expression compiler does some analysis of REs in order to
+speed up the process of looking for a match.  One such analysis figures out what
+the first character of a match must be; for example, a pattern starting with
+``Crow`` must match starting with a ``'C'``.  The analysis lets the engine
+quickly scan through the string looking for the starting character, only trying
+the full match if a ``'C'`` is found.
+
+Adding ``.*`` defeats this optimization, requiring scanning to the end of the
+string and then backtracking to find a match for the rest of the RE.  Use
+:func:`re.search` instead.
+
+
+Greedy versus Non-Greedy
+------------------------
+
+When repeating a regular expression, as in ``a*``, the resulting action is to
+consume as much of the pattern as possible.  This fact often bites you when
+you're trying to match a pair of balanced delimiters, such as the angle brackets
+surrounding an HTML tag.  The naive pattern for matching a single HTML tag
+doesn't work because of the greedy nature of ``.*``. ::
+
+   >>> s = '<html><head><title>Title</title>'
+   >>> len(s)
+   32
+   >>> print re.match('<.*>', s).span()
+   (0, 32)
+   >>> print re.match('<.*>', s).group()
+   <html><head><title>Title</title>
+
+The RE matches the ``'<'`` in ``<html>``, and the ``.*`` consumes the rest of
+the string.  There's still more left in the RE, though, and the ``>`` can't
+match at the end of the string, so the regular expression engine has to
+backtrack character by character until it finds a match for the ``>``.   The
+final match extends from the ``'<'`` in ``<html>`` to the ``'>'`` in
+``</title>``, which isn't what you want.
+
+In this case, the solution is to use the non-greedy qualifiers ``*?``, ``+?``,
+``??``, or ``{m,n}?``, which match as *little* text as possible.  In the above
+example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
+when it fails, the engine advances a character at a time, retrying the ``'>'``
+at every step.  This produces just the right result::
+
+   >>> print re.match('<.*?>', s).group()
+   <html>
+
+(Note that parsing HTML or XML with regular expressions is painful. Quick-and-
+dirty patterns will handle common cases, but HTML and XML have special cases
+that will break the obvious regular expression; by the time you've written a
+regular expression that handles all of the possible cases, the patterns will be
+*very* complicated.  Use an HTML or XML parser module for such tasks.)
+
+
+Not Using re.VERBOSE
+--------------------
+
+By now you've probably noticed that regular expressions are a very compact
+notation, but they're not terribly readable.  REs of moderate complexity can
+become lengthy collections of backslashes, parentheses, and metacharacters,
+making them difficult to read and understand.
+
+For such REs, specifying the ``re.VERBOSE`` flag when compiling the regular
+expression can be helpful, because it allows you to format the regular
+expression more clearly.
+
+The ``re.VERBOSE`` flag has several effects.  Whitespace in the regular
+expression that *isn't* inside a character class is ignored.  This means that an
+expression such as ``dog | cat`` is equivalent to the less readable ``dog|cat``,
+but ``[a b]`` will still match the characters ``'a'``, ``'b'``, or a space.  In
+addition, you can also put comments inside a RE; comments extend from a ``#``
+character to the next newline.  When used with triple-quoted strings, this
+enables REs to be formatted more neatly::
+
+   pat = re.compile(r"""
+    \s*                 # Skip leading whitespace
+    (?P<header>[^:]+)   # Header name
+    \s* :               # Whitespace, and a colon
+    (?P<value>.*?)      # The header's value -- *? used to
+                        # lose the following trailing whitespace
+    \s*$                # Trailing whitespace to end-of-line
+   """, re.VERBOSE)
+
+This is far more readable than:
+
+.. % $
+
+::
+
+   pat = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")
+
+.. % $
+
+
+Feedback
+========
+
+Regular expressions are a complicated topic.  Did this document help you
+understand them?  Were there parts that were unclear, or Problems you
+encountered that weren't covered here?  If so, please send suggestions for
+improvements to the author.
+
+The most complete book on regular expressions is almost certainly Jeffrey
+Friedl's Mastering Regular Expressions, published by O'Reilly.  Unfortunately,
+it exclusively concentrates on Perl and Java's flavours of regular expressions,
+and doesn't contain any Python material at all, so it won't be useful as a
+reference for programming in Python.  (The first edition covered Python's now-
+removed :mod:`regex` module, which won't help you much.)  Consider checking it
+out from your library.
+
+
+.. rubric:: Footnotes
+
+.. [#] Introduced in Python 2.2.2.
+

Added: doctools/trunk/Doc-26/howto/sockets.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/sockets.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,421 @@
+****************************
+  Socket Programming HOWTO  
+****************************
+
+:Author: Gordon McMillan
+
+
+.. topic:: Abstract
+
+   Sockets are used nearly everywhere, but are one of the most severely
+   misunderstood technologies around. This is a 10,000 foot overview of sockets.
+   It's not really a tutorial - you'll still have work to do in getting things
+   operational. It doesn't cover the fine points (and there are a lot of them), but
+   I hope it will give you enough background to begin using them decently.
+
+
+Sockets
+=======
+
+Sockets are used nearly everywhere, but are one of the most severely
+misunderstood technologies around. This is a 10,000 foot overview of sockets.
+It's not really a tutorial - you'll still have work to do in getting things
+working. It doesn't cover the fine points (and there are a lot of them), but I
+hope it will give you enough background to begin using them decently.
+
+I'm only going to talk about INET sockets, but they account for at least 99% of
+the sockets in use. And I'll only talk about STREAM sockets - unless you really
+know what you're doing (in which case this HOWTO isn't for you!), you'll get
+better behavior and performance from a STREAM socket than anything else. I will
+try to clear up the mystery of what a socket is, as well as some hints on how to
+work with blocking and non-blocking sockets. But I'll start by talking about
+blocking sockets. You'll need to know how they work before dealing with non-
+blocking sockets.
+
+Part of the trouble with understanding these things is that "socket" can mean a
+number of subtly different things, depending on context. So first, let's make a
+distinction between a "client" socket - an endpoint of a conversation, and a
+"server" socket, which is more like a switchboard operator. The client
+application (your browser, for example) uses "client" sockets exclusively; the
+web server it's talking to uses both "server" sockets and "client" sockets.
+
+
+History
+-------
+
+Of the various forms of IPC (*Inter Process Communication*), sockets are by far
+the most popular.  On any given platform, there are likely to be other forms of
+IPC that are faster, but for cross-platform communication, sockets are about the
+only game in town.
+
+They were invented in Berkeley as part of the BSD flavor of Unix. They spread
+like wildfire with the Internet. With good reason --- the combination of sockets
+with INET makes talking to arbitrary machines around the world unbelievably easy
+(at least compared to other schemes).
+
+
+Creating a Socket
+=================
+
+Roughly speaking, when you clicked on the link that brought you to this page,
+your browser did something like the following::
+
+   #create an INET, STREAMing socket
+   s = socket.socket(
+       socket.AF_INET, socket.SOCK_STREAM)
+   #now connect to the web server on port 80 
+   # - the normal http port
+   s.connect(("www.mcmillan-inc.com", 80))
+
+When the ``connect`` completes, the socket ``s`` can now be used to send in a
+request for the text of this page. The same socket will read the reply, and then
+be destroyed. That's right - destroyed. Client sockets are normally only used
+for one exchange (or a small set of sequential exchanges).
+
+What happens in the web server is a bit more complex. First, the web server
+creates a "server socket". ::
+
+   #create an INET, STREAMing socket
+   serversocket = socket.socket(
+       socket.AF_INET, socket.SOCK_STREAM)
+   #bind the socket to a public host, 
+   # and a well-known port
+   serversocket.bind((socket.gethostname(), 80))
+   #become a server socket
+   serversocket.listen(5)
+
+A couple things to notice: we used ``socket.gethostname()`` so that the socket
+would be visible to the outside world. If we had used ``s.bind(('', 80))`` or
+``s.bind(('localhost', 80))`` or ``s.bind(('127.0.0.1', 80))`` we would still
+have a "server" socket, but one that was only visible within the same machine.
+
+A second thing to note: low number ports are usually reserved for "well known"
+services (HTTP, SNMP etc). If you're playing around, use a nice high number (4
+digits).
+
+Finally, the argument to ``listen`` tells the socket library that we want it to
+queue up as many as 5 connect requests (the normal max) before refusing outside
+connections. If the rest of the code is written properly, that should be plenty.
+
+OK, now we have a "server" socket, listening on port 80. Now we enter the
+mainloop of the web server::
+
+   while 1:
+       #accept connections from outside
+       (clientsocket, address) = serversocket.accept()
+       #now do something with the clientsocket
+       #in this case, we'll pretend this is a threaded server
+       ct = client_thread(clientsocket)
+       ct.run()
+
+There's actually 3 general ways in which this loop could work - dispatching a
+thread to handle ``clientsocket``, create a new process to handle
+``clientsocket``, or restructure this app to use non-blocking sockets, and
+mulitplex between our "server" socket and any active ``clientsocket``\ s using
+``select``. More about that later. The important thing to understand now is
+this: this is *all* a "server" socket does. It doesn't send any data. It doesn't
+receive any data. It just produces "client" sockets. Each ``clientsocket`` is
+created in response to some *other* "client" socket doing a ``connect()`` to the
+host and port we're bound to. As soon as we've created that ``clientsocket``, we
+go back to listening for more connections. The two "clients" are free to chat it
+up - they are using some dynamically allocated port which will be recycled when
+the conversation ends.
+
+
+IPC
+---
+
+If you need fast IPC between two processes on one machine, you should look into
+whatever form of shared memory the platform offers. A simple protocol based
+around shared memory and locks or semaphores is by far the fastest technique.
+
+If you do decide to use sockets, bind the "server" socket to ``'localhost'``. On
+most platforms, this will take a shortcut around a couple of layers of network
+code and be quite a bit faster.
+
+
+Using a Socket
+==============
+
+The first thing to note, is that the web browser's "client" socket and the web
+server's "client" socket are identical beasts. That is, this is a "peer to peer"
+conversation. Or to put it another way, *as the designer, you will have to
+decide what the rules of etiquette are for a conversation*. Normally, the
+``connect``\ ing socket starts the conversation, by sending in a request, or
+perhaps a signon. But that's a design decision - it's not a rule of sockets.
+
+Now there are two sets of verbs to use for communication. You can use ``send``
+and ``recv``, or you can transform your client socket into a file-like beast and
+use ``read`` and ``write``. The latter is the way Java presents their sockets.
+I'm not going to talk about it here, except to warn you that you need to use
+``flush`` on sockets. These are buffered "files", and a common mistake is to
+``write`` something, and then ``read`` for a reply. Without a ``flush`` in
+there, you may wait forever for the reply, because the request may still be in
+your output buffer.
+
+Now we come the major stumbling block of sockets - ``send`` and ``recv`` operate
+on the network buffers. They do not necessarily handle all the bytes you hand
+them (or expect from them), because their major focus is handling the network
+buffers. In general, they return when the associated network buffers have been
+filled (``send``) or emptied (``recv``). They then tell you how many bytes they
+handled. It is *your* responsibility to call them again until your message has
+been completely dealt with.
+
+When a ``recv`` returns 0 bytes, it means the other side has closed (or is in
+the process of closing) the connection.  You will not receive any more data on
+this connection. Ever.  You may be able to send data successfully; I'll talk
+about that some on the next page.
+
+A protocol like HTTP uses a socket for only one transfer. The client sends a
+request, the reads a reply.  That's it. The socket is discarded. This means that
+a client can detect the end of the reply by receiving 0 bytes.
+
+But if you plan to reuse your socket for further transfers, you need to realize
+that *there is no "EOT" (End of Transfer) on a socket.* I repeat: if a socket
+``send`` or ``recv`` returns after handling 0 bytes, the connection has been
+broken.  If the connection has *not* been broken, you may wait on a ``recv``
+forever, because the socket will *not* tell you that there's nothing more to
+read (for now).  Now if you think about that a bit, you'll come to realize a
+fundamental truth of sockets: *messages must either be fixed length* (yuck), *or
+be delimited* (shrug), *or indicate how long they are* (much better), *or end by
+shutting down the connection*. The choice is entirely yours, (but some ways are
+righter than others).
+
+Assuming you don't want to end the connection, the simplest solution is a fixed
+length message::
+
+   class mysocket:
+       '''demonstration class only 
+         - coded for clarity, not efficiency
+       '''
+
+       def __init__(self, sock=None):
+   	if sock is None:
+   	    self.sock = socket.socket(
+   		socket.AF_INET, socket.SOCK_STREAM)
+   	else:
+   	    self.sock = sock
+
+       def connect(self, host, port):
+   	self.sock.connect((host, port))
+
+       def mysend(self, msg):
+   	totalsent = 0
+   	while totalsent < MSGLEN:
+   	    sent = self.sock.send(msg[totalsent:])
+   	    if sent == 0:
+   		raise RuntimeError, \\
+   		    "socket connection broken"
+   	    totalsent = totalsent + sent
+
+       def myreceive(self):
+   	msg = ''
+   	while len(msg) < MSGLEN:
+   	    chunk = self.sock.recv(MSGLEN-len(msg))
+   	    if chunk == '':
+   		raise RuntimeError, \\
+   		    "socket connection broken"
+   	    msg = msg + chunk
+   	return msg
+
+The sending code here is usable for almost any messaging scheme - in Python you
+send strings, and you can use ``len()`` to determine its length (even if it has
+embedded ``\0`` characters). It's mostly the receiving code that gets more
+complex. (And in C, it's not much worse, except you can't use ``strlen`` if the
+message has embedded ``\0``\ s.)
+
+The easiest enhancement is to make the first character of the message an
+indicator of message type, and have the type determine the length. Now you have
+two ``recv``\ s - the first to get (at least) that first character so you can
+look up the length, and the second in a loop to get the rest. If you decide to
+go the delimited route, you'll be receiving in some arbitrary chunk size, (4096
+or 8192 is frequently a good match for network buffer sizes), and scanning what
+you've received for a delimiter.
+
+One complication to be aware of: if your conversational protocol allows multiple
+messages to be sent back to back (without some kind of reply), and you pass
+``recv`` an arbitrary chunk size, you may end up reading the start of a
+following message. You'll need to put that aside and hold onto it, until it's
+needed.
+
+Prefixing the message with it's length (say, as 5 numeric characters) gets more
+complex, because (believe it or not), you may not get all 5 characters in one
+``recv``. In playing around, you'll get away with it; but in high network loads,
+your code will very quickly break unless you use two ``recv`` loops - the first
+to determine the length, the second to get the data part of the message. Nasty.
+This is also when you'll discover that ``send`` does not always manage to get
+rid of everything in one pass. And despite having read this, you will eventually
+get bit by it!
+
+In the interests of space, building your character, (and preserving my
+competitive position), these enhancements are left as an exercise for the
+reader. Lets move on to cleaning up.
+
+
+Binary Data
+-----------
+
+It is perfectly possible to send binary data over a socket. The major problem is
+that not all machines use the same formats for binary data. For example, a
+Motorola chip will represent a 16 bit integer with the value 1 as the two hex
+bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
+Socket libraries have calls for converting 16 and 32 bit integers - ``ntohl,
+htonl, ntohs, htons`` where "n" means *network* and "h" means *host*, "s" means
+*short* and "l" means *long*. Where network order is host order, these do
+nothing, but where the machine is byte-reversed, these swap the bytes around
+appropriately.
+
+In these days of 32 bit machines, the ascii representation of binary data is
+frequently smaller than the binary representation. That's because a surprising
+amount of the time, all those longs have the value 0, or maybe 1. The string "0"
+would be two bytes, while binary is four. Of course, this doesn't fit well with
+fixed-length messages. Decisions, decisions.
+
+
+Disconnecting
+=============
+
+Strictly speaking, you're supposed to use ``shutdown`` on a socket before you
+``close`` it.  The ``shutdown`` is an advisory to the socket at the other end.
+Depending on the argument you pass it, it can mean "I'm not going to send
+anymore, but I'll still listen", or "I'm not listening, good riddance!".  Most
+socket libraries, however, are so used to programmers neglecting to use this
+piece of etiquette that normally a ``close`` is the same as ``shutdown();
+close()``.  So in most situations, an explicit ``shutdown`` is not needed.
+
+One way to use ``shutdown`` effectively is in an HTTP-like exchange. The client
+sends a request and then does a ``shutdown(1)``. This tells the server "This
+client is done sending, but can still receive."  The server can detect "EOF" by
+a receive of 0 bytes. It can assume it has the complete request.  The server
+sends a reply. If the ``send`` completes successfully then, indeed, the client
+was still receiving.
+
+Python takes the automatic shutdown a step further, and says that when a socket
+is garbage collected, it will automatically do a ``close`` if it's needed. But
+relying on this is a very bad habit. If your socket just disappears without
+doing a ``close``, the socket at the other end may hang indefinitely, thinking
+you're just being slow. *Please* ``close`` your sockets when you're done.
+
+
+When Sockets Die
+----------------
+
+Probably the worst thing about using blocking sockets is what happens when the
+other side comes down hard (without doing a ``close``). Your socket is likely to
+hang. SOCKSTREAM is a reliable protocol, and it will wait a long, long time
+before giving up on a connection. If you're using threads, the entire thread is
+essentially dead. There's not much you can do about it. As long as you aren't
+doing something dumb, like holding a lock while doing a blocking read, the
+thread isn't really consuming much in the way of resources. Do *not* try to kill
+the thread - part of the reason that threads are more efficient than processes
+is that they avoid the overhead associated with the automatic recycling of
+resources. In other words, if you do manage to kill the thread, your whole
+process is likely to be screwed up.
+
+
+Non-blocking Sockets
+====================
+
+If you've understood the preceeding, you already know most of what you need to
+know about the mechanics of using sockets. You'll still use the same calls, in
+much the same ways. It's just that, if you do it right, your app will be almost
+inside-out.
+
+In Python, you use ``socket.setblocking(0)`` to make it non-blocking. In C, it's
+more complex, (for one thing, you'll need to choose between the BSD flavor
+``O_NONBLOCK`` and the almost indistinguishable Posix flavor ``O_NDELAY``, which
+is completely different from ``TCP_NODELAY``), but it's the exact same idea. You
+do this after creating the socket, but before using it. (Actually, if you're
+nuts, you can switch back and forth.)
+
+The major mechanical difference is that ``send``, ``recv``, ``connect`` and
+``accept`` can return without having done anything. You have (of course) a
+number of choices. You can check return code and error codes and generally drive
+yourself crazy. If you don't believe me, try it sometime. Your app will grow
+large, buggy and suck CPU. So let's skip the brain-dead solutions and do it
+right.
+
+Use ``select``.
+
+In C, coding ``select`` is fairly complex. In Python, it's a piece of cake, but
+it's close enough to the C version that if you understand ``select`` in Python,
+you'll have little trouble with it in C. ::
+
+   ready_to_read, ready_to_write, in_error = \\
+                  select.select(
+                     potential_readers, 
+                     potential_writers, 
+                     potential_errs, 
+                     timeout)
+
+You pass ``select`` three lists: the first contains all sockets that you might
+want to try reading; the second all the sockets you might want to try writing
+to, and the last (normally left empty) those that you want to check for errors.
+You should note that a socket can go into more than one list. The ``select``
+call is blocking, but you can give it a timeout. This is generally a sensible
+thing to do - give it a nice long timeout (say a minute) unless you have good
+reason to do otherwise.
+
+In return, you will get three lists. They have the sockets that are actually
+readable, writable and in error. Each of these lists is a subset (possbily
+empty) of the corresponding list you passed in. And if you put a socket in more
+than one input list, it will only be (at most) in one output list.
+
+If a socket is in the output readable list, you can be as-close-to-certain-as-
+we-ever-get-in-this-business that a ``recv`` on that socket will return
+*something*. Same idea for the writable list. You'll be able to send
+*something*. Maybe not all you want to, but *something* is better than nothing.
+(Actually, any reasonably healthy socket will return as writable - it just means
+outbound network buffer space is available.)
+
+If you have a "server" socket, put it in the potential_readers list. If it comes
+out in the readable list, your ``accept`` will (almost certainly) work. If you
+have created a new socket to ``connect`` to someone else, put it in the
+ptoential_writers list. If it shows up in the writable list, you have a decent
+chance that it has connected.
+
+One very nasty problem with ``select``: if somewhere in those input lists of
+sockets is one which has died a nasty death, the ``select`` will fail. You then
+need to loop through every single damn socket in all those lists and do a
+``select([sock],[],[],0)`` until you find the bad one. That timeout of 0 means
+it won't take long, but it's ugly.
+
+Actually, ``select`` can be handy even with blocking sockets. It's one way of
+determining whether you will block - the socket returns as readable when there's
+something in the buffers.  However, this still doesn't help with the problem of
+determining whether the other end is done, or just busy with something else.
+
+**Portability alert**: On Unix, ``select`` works both with the sockets and
+files. Don't try this on Windows. On Windows, ``select`` works with sockets
+only. Also note that in C, many of the more advanced socket options are done
+differently on Windows. In fact, on Windows I usually use threads (which work
+very, very well) with my sockets. Face it, if you want any kind of performance,
+your code will look very different on Windows than on Unix. (I haven't the
+foggiest how you do this stuff on a Mac.)
+
+
+Performance
+-----------
+
+There's no question that the fastest sockets code uses non-blocking sockets and
+select to multiplex them. You can put together something that will saturate a
+LAN connection without putting any strain on the CPU. The trouble is that an app
+written this way can't do much of anything else - it needs to be ready to
+shuffle bytes around at all times.
+
+Assuming that your app is actually supposed to do something more than that,
+threading is the optimal solution, (and using non-blocking sockets will be
+faster than using blocking sockets). Unfortunately, threading support in Unixes
+varies both in API and quality. So the normal Unix solution is to fork a
+subprocess to deal with each connection. The overhead for this is significant
+(and don't do this on Windows - the overhead of process creation is enormous
+there). It also means that unless each subprocess is completely independent,
+you'll need to use another form of IPC, say a pipe, or shared memory and
+semaphores, to communicate between the parent and child processes.
+
+Finally, remember that even though blocking sockets are somewhat slower than
+non-blocking, in many cases they are the "right" solution. After all, if your
+app is driven by the data it receives over a socket, there's not much sense in
+complicating the logic just so your app can wait on ``select`` instead of
+``recv``.
+

Added: doctools/trunk/Doc-26/howto/unicode.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/unicode.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,732 @@
+*****************
+  Unicode HOWTO
+*****************
+
+:Release: 1.02
+
+This HOWTO discusses Python's support for Unicode, and explains various problems
+that people commonly encounter when trying to work with Unicode.
+
+Introduction to Unicode
+=======================
+
+History of Character Codes
+--------------------------
+
+In 1968, the American Standard Code for Information Interchange, better known by
+its acronym ASCII, was standardized.  ASCII defined numeric codes for various
+characters, with the numeric values running from 0 to
+127.  For example, the lowercase letter 'a' is assigned 97 as its code
+value.
+
+ASCII was an American-developed standard, so it only defined unaccented
+characters.  There was an 'e', but no 'é' or 'Í'.  This meant that languages
+which required accented characters couldn't be faithfully represented in ASCII.
+(Actually the missing accents matter for English, too, which contains words such
+as 'naïve' and 'café', and some publications have house styles which require
+spellings such as 'coöperate'.)
+
+For a while people just wrote programs that didn't display accents.  I remember
+looking at Apple ][ BASIC programs, published in French-language publications in
+the mid-1980s, that had lines like these::
+
+	PRINT "FICHER EST COMPLETE."
+	PRINT "CARACTERE NON ACCEPTE."
+
+Those messages should contain accents, and they just look wrong to someone who
+can read French.
+
+In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
+hold values ranging from 0 to 255.  ASCII codes only went up to 127, so some
+machines assigned values between 128 and 255 to accented characters.  Different
+machines had different codes, however, which led to problems exchanging files.
+Eventually various commonly used sets of values for the 128-255 range emerged.
+Some were true standards, defined by the International Standards Organization,
+and some were **de facto** conventions that were invented by one company or
+another and managed to catch on.
+
+255 characters aren't very many.  For example, you can't fit both the accented
+characters used in Western Europe and the Cyrillic alphabet used for Russian
+into the 128-255 range because there are more than 127 such characters.
+
+You could write files using different codes (all your Russian files in a coding
+system called KOI8, all your French files in a different coding system called
+Latin1), but what if you wanted to write a French document that quotes some
+Russian text?  In the 1980s people began to want to solve this problem, and the
+Unicode standardization effort began.
+
+Unicode started out using 16-bit characters instead of 8-bit characters.  16
+bits means you have 2^16 = 65,536 distinct values available, making it possible
+to represent many different characters from many different alphabets; an initial
+goal was to have Unicode contain the alphabets for every single human language.
+It turns out that even 16 bits isn't enough to meet that goal, and the modern
+Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
+base-16).
+
+There's a related ISO standard, ISO 10646.  Unicode and ISO 10646 were
+originally separate efforts, but the specifications were merged with the 1.1
+revision of Unicode.
+
+(This discussion of Unicode's history is highly simplified.  I don't think the
+average Python programmer needs to worry about the historical details; consult
+the Unicode consortium site listed in the References for more information.)
+
+
+Definitions
+-----------
+
+A **character** is the smallest possible component of a text.  'A', 'B', 'C',
+etc., are all different characters.  So are 'È' and 'Í'.  Characters are
+abstractions, and vary depending on the language or context you're talking
+about.  For example, the symbol for ohms (Ω) is usually drawn much like the
+capital letter omega (Ω) in the Greek alphabet (they may even be the same in
+some fonts), but these are two different characters that have different
+meanings.
+
+The Unicode standard describes how characters are represented by **code
+points**.  A code point is an integer value, usually denoted in base 16.  In the
+standard, a code point is written using the notation U+12ca to mean the
+character with value 0x12ca (4810 decimal).  The Unicode standard contains a lot
+of tables listing characters and their corresponding code points::
+
+	0061    'a'; LATIN SMALL LETTER A
+	0062    'b'; LATIN SMALL LETTER B
+	0063    'c'; LATIN SMALL LETTER C
+        ...
+	007B	'{'; LEFT CURLY BRACKET
+
+Strictly, these definitions imply that it's meaningless to say 'this is
+character U+12ca'.  U+12ca is a code point, which represents some particular
+character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.  In
+informal contexts, this distinction between code points and characters will
+sometimes be forgotten.
+
+A character is represented on a screen or on paper by a set of graphical
+elements that's called a **glyph**.  The glyph for an uppercase A, for example,
+is two diagonal strokes and a horizontal stroke, though the exact details will
+depend on the font being used.  Most Python code doesn't need to worry about
+glyphs; figuring out the correct glyph to display is generally the job of a GUI
+toolkit or a terminal's font renderer.
+
+
+Encodings
+---------
+
+To summarize the previous section: a Unicode string is a sequence of code
+points, which are numbers from 0 to 0x10ffff.  This sequence needs to be
+represented as a set of bytes (meaning, values from 0-255) in memory.  The rules
+for translating a Unicode string into a sequence of bytes are called an
+**encoding**.
+
+The first encoding you might think of is an array of 32-bit integers.  In this
+representation, the string "Python" would look like this::
+
+       P           y           t           h           o           n
+    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00 
+       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
+
+This representation is straightforward but using it presents a number of
+problems.
+
+1. It's not portable; different processors order the bytes differently.
+
+2. It's very wasteful of space.  In most texts, the majority of the code points
+   are less than 127, or less than 255, so a lot of space is occupied by zero
+   bytes.  The above string takes 24 bytes compared to the 6 bytes needed for an
+   ASCII representation.  Increased RAM usage doesn't matter too much (desktop
+   computers have megabytes of RAM, and strings aren't usually that large), but
+   expanding our usage of disk and network bandwidth by a factor of 4 is
+   intolerable.
+
+3. It's not compatible with existing C functions such as ``strlen()``, so a new
+   family of wide string functions would need to be used.
+
+4. Many Internet standards are defined in terms of textual data, and can't
+   handle content with embedded zero bytes.
+
+Generally people don't use this encoding, instead choosing other encodings that
+are more efficient and convenient.
+
+Encodings don't have to handle every possible Unicode character, and most
+encodings don't.  For example, Python's default encoding is the 'ascii'
+encoding.  The rules for converting a Unicode string into the ASCII encoding are
+simple; for each code point:
+
+1. If the code point is < 128, each byte is the same as the value of the code
+   point.
+
+2. If the code point is 128 or greater, the Unicode string can't be represented
+   in this encoding.  (Python raises a :exc:`UnicodeEncodeError` exception in this
+   case.)
+
+Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode code points
+0-255 are identical to the Latin-1 values, so converting to this encoding simply
+requires converting code points to byte values; if a code point larger than 255
+is encountered, the string can't be encoded into Latin-1.
+
+Encodings don't have to be simple one-to-one mappings like Latin-1.  Consider
+IBM's EBCDIC, which was used on IBM mainframes.  Letter values weren't in one
+block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
+through 153.  If you wanted to use EBCDIC as an encoding, you'd probably use
+some sort of lookup table to perform the conversion, but this is largely an
+internal detail.
+
+UTF-8 is one of the most commonly used encodings.  UTF stands for "Unicode
+Transformation Format", and the '8' means that 8-bit numbers are used in the
+encoding.  (There's also a UTF-16 encoding, but it's less frequently used than
+UTF-8.)  UTF-8 uses the following rules:
+
+1. If the code point is <128, it's represented by the corresponding byte value.
+2. If the code point is between 128 and 0x7ff, it's turned into two byte values
+   between 128 and 255.
+3. Code points >0x7ff are turned into three- or four-byte sequences, where each
+   byte of the sequence is between 128 and 255.
+    
+UTF-8 has several convenient properties:
+
+1. It can handle any Unicode code point.
+2. A Unicode string is turned into a string of bytes containing no embedded zero
+   bytes.  This avoids byte-ordering issues, and means UTF-8 strings can be
+   processed by C functions such as ``strcpy()`` and sent through protocols that
+   can't handle zero bytes.
+3. A string of ASCII text is also valid UTF-8 text.
+4. UTF-8 is fairly compact; the majority of code points are turned into two
+   bytes, and values less than 128 occupy only a single byte.
+5. If bytes are corrupted or lost, it's possible to determine the start of the
+   next UTF-8-encoded code point and resynchronize.  It's also unlikely that
+   random 8-bit data will look like valid UTF-8.
+
+
+
+References
+----------
+
+The Unicode Consortium site at <http://www.unicode.org> has character charts, a
+glossary, and PDF versions of the Unicode specification.  Be prepared for some
+difficult reading.  <http://www.unicode.org/history/> is a chronology of the
+origin and development of Unicode.
+
+To help understand the standard, Jukka Korpela has written an introductory guide
+to reading the Unicode character tables, available at
+<http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+
+Roman Czyborra wrote another explanation of Unicode's basic principles; it's at
+<http://czyborra.com/unicode/characters.html>.  Czyborra has written a number of
+other Unicode-related documentation, available from <http://www.cyzborra.com>.
+
+Two other good introductory articles were written by Joel Spolsky
+<http://www.joelonsoftware.com/articles/Unicode.html> and Jason Orendorff
+<http://www.jorendorff.com/articles/unicode/>.  If this introduction didn't make
+things clear to you, you should try reading one of these alternate articles
+before continuing.
+
+Wikipedia entries are often helpful; see the entries for "character encoding"
+<http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
+<http://en.wikipedia.org/wiki/UTF-8>, for example.
+
+
+Python's Unicode Support
+========================
+
+Now that you've learned the rudiments of Unicode, we can look at Python's
+Unicode features.
+
+
+The Unicode Type
+----------------
+
+Unicode strings are expressed as instances of the :class:`unicode` type, one of
+Python's repertoire of built-in types.  It derives from an abstract type called
+:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
+therefore check if a value is a string type with ``isinstance(value,
+basestring)``.  Under the hood, Python represents Unicode strings as either 16-
+or 32-bit integers, depending on how the Python interpreter was compiled.
+
+The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
+errors])``.  All of its arguments should be 8-bit strings.  The first argument
+is converted to Unicode using the specified encoding; if you leave off the
+``encoding`` argument, the ASCII encoding is used for the conversion, so
+characters greater than 127 will be treated as errors::
+
+    >>> unicode('abcdef')
+    u'abcdef'
+    >>> s = unicode('abcdef')
+    >>> type(s)
+    <type 'unicode'>
+    >>> unicode('abcdef' + chr(255))
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: 
+                        ordinal not in range(128)
+
+The ``errors`` argument specifies the response when the input string can't be
+converted according to the encoding's rules.  Legal values for this argument are
+'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
+'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
+Unicode result).  The following examples show the differences::
+
+    >>> unicode('\x80abc', errors='strict')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: 
+                        ordinal not in range(128)
+    >>> unicode('\x80abc', errors='replace')
+    u'\ufffdabc'
+    >>> unicode('\x80abc', errors='ignore')
+    u'abc'
+
+Encodings are specified as strings containing the encoding's name.  Python 2.4
+comes with roughly 100 different encodings; see the Python Library Reference at
+<http://docs.python.org/lib/standard-encodings.html> for a list.  Some encodings
+have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
+synonyms for the same encoding.
+
+One-character Unicode strings can also be created with the :func:`unichr`
+built-in function, which takes integers and returns a Unicode string of length 1
+that contains the corresponding code point.  The reverse operation is the
+built-in :func:`ord` function that takes a one-character Unicode string and
+returns the code point value::
+
+    >>> unichr(40960)
+    u'\ua000'
+    >>> ord(u'\ua000')
+    40960
+
+Instances of the :class:`unicode` type have many of the same methods as the
+8-bit string type for operations such as searching and formatting::
+
+    >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
+    >>> s.count('e')
+    5
+    >>> s.find('feather')
+    9
+    >>> s.find('bird')
+    -1
+    >>> s.replace('feather', 'sand')
+    u'Was ever sand so lightly blown to and fro as this multitude?'
+    >>> s.upper()
+    u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
+
+Note that the arguments to these methods can be Unicode strings or 8-bit
+strings.  8-bit strings will be converted to Unicode before carrying out the
+operation; Python's default ASCII encoding will be used, so characters greater
+than 127 will cause an exception::
+
+    >>> s.find('Was\x9f')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
+    >>> s.find(u'Was\x9f')
+    -1
+
+Much Python code that operates on strings will therefore work with Unicode
+strings without requiring any changes to the code.  (Input and output code needs
+more updating for Unicode; more on this later.)
+
+Another important method is ``.encode([encoding], [errors='strict'])``, which
+returns an 8-bit string version of the Unicode string, encoded in the requested
+encoding.  The ``errors`` parameter is the same as the parameter of the
+``unicode()`` constructor, with one additional possibility; as well as 'strict',
+'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
+character references.  The following example shows the different results::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)
+    >>> u.encode('utf-8')
+    '\xea\x80\x80abcd\xde\xb4'
+    >>> u.encode('ascii')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
+    >>> u.encode('ascii', 'ignore')
+    'abcd'
+    >>> u.encode('ascii', 'replace')
+    '?abcd?'
+    >>> u.encode('ascii', 'xmlcharrefreplace')
+    '&#40960;abcd&#1972;'
+
+Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
+interprets the string using the given encoding::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
+    >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
+    >>> type(utf8_version), utf8_version
+    (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
+    >>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
+    >>> u == u2                                      # The two strings match
+    True
+ 
+The low-level routines for registering and accessing the available encodings are
+found in the :mod:`codecs` module.  However, the encoding and decoding functions
+returned by this module are usually more low-level than is comfortable, so I'm
+not going to describe the :mod:`codecs` module here.  If you need to implement a
+completely new encoding, you'll need to learn about the :mod:`codecs` module
+interfaces, but implementing encodings is a specialized task that also won't be
+covered here.  Consult the Python documentation to learn more about this module.
+
+The most commonly used part of the :mod:`codecs` module is the
+:func:`codecs.open` function which will be discussed in the section on input and
+output.
+            
+            
+Unicode Literals in Python Source Code
+--------------------------------------
+
+In Python source code, Unicode literals are written as strings prefixed with the
+'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points can be written
+using the ``\u`` escape sequence, which is followed by four hex digits giving
+the code point.  The ``\U`` escape sequence is similar, but expects 8 hex
+digits, not 4.
+
+Unicode literals can also use the same escape sequences as 8-bit strings,
+including ``\x``, but ``\x`` only takes two hex digits so it can't express an
+arbitrary code point.  Octal escapes can go up to U+01ff, which is octal 777.
+
+::
+
+    >>> s = u"a\xac\u1234\u20ac\U00008000"
+               ^^^^ two-digit hex escape
+                   ^^^^^^ four-digit Unicode escape 
+                               ^^^^^^^^^^ eight-digit Unicode escape
+    >>> for c in s:  print ord(c),
+    ... 
+    97 172 4660 8364 32768
+
+Using escape sequences for code points greater than 127 is fine in small doses,
+but becomes an annoyance if you're using many accented characters, as you would
+in a program with messages in French or some other accent-using language.  You
+can also assemble strings using the :func:`unichr` built-in function, but this is
+even more tedious.
+
+Ideally, you'd want to be able to write literals in your language's natural
+encoding.  You could then edit Python source code with your favorite editor
+which would display the accented characters naturally, and have the right
+characters used at runtime.
+
+Python supports writing Unicode literals in any encoding, but you have to
+declare the encoding being used.  This is done by including a special comment as
+either the first or second line of the source file::
+
+    #!/usr/bin/env python
+    # -*- coding: latin-1 -*-
+    
+    u = u'abcdé'
+    print ord(u[-1])
+    
+The syntax is inspired by Emacs's notation for specifying variables local to a
+file.  Emacs supports many different variables, but Python only supports
+'coding'.  The ``-*-`` symbols indicate that the comment is special; within
+them, you must supply the name ``coding`` and the name of your chosen encoding,
+separated by ``':'``.
+
+If you don't include such a comment, the default encoding used will be ASCII.
+Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as a default
+encoding for string literals; in Python 2.4, characters greater than 127 still
+work but result in a warning.  For example, the following program has no
+encoding declaration::
+
+    #!/usr/bin/env python
+    u = u'abcdé'
+    print ord(u[-1])
+
+When you run it with Python 2.4, it will output the following warning::
+
+    amk:~$ python p263.py
+    sys:1: DeprecationWarning: Non-ASCII character '\xe9' 
+         in file p263.py on line 2, but no encoding declared; 
+         see http://www.python.org/peps/pep-0263.html for details
+  
+
+Unicode Properties
+------------------
+
+The Unicode specification includes a database of information about code points.
+For each code point that's defined, the information includes the character's
+name, its category, the numeric value if applicable (Unicode has characters
+representing the Roman numerals and fractions such as one-third and
+four-fifths).  There are also properties related to the code point's use in
+bidirectional text and other display-related properties.
+
+The following program displays some information about several characters, and
+prints the numeric value of one particular character::
+
+    import unicodedata
+    
+    u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
+    
+    for i, c in enumerate(u):
+        print i, '%04x' % ord(c), unicodedata.category(c),
+        print unicodedata.name(c)
+    
+    # Get numeric value of second character
+    print unicodedata.numeric(u[1])
+
+When run, this prints::
+
+    0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
+    1 0bf2 No TAMIL NUMBER ONE THOUSAND
+    2 0f84 Mn TIBETAN MARK HALANTA
+    3 1770 Lo TAGBANWA LETTER SA
+    4 33af So SQUARE RAD OVER S SQUARED
+    1000.0
+
+The category codes are abbreviations describing the nature of the character.
+These are grouped into categories such as "Letter", "Number", "Punctuation", or
+"Symbol", which in turn are broken up into subcategories.  To take the codes
+from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
+"Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
+other".  See
+<http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values> for a
+list of category codes.
+
+References
+----------
+
+The Unicode and 8-bit string types are described in the Python library reference
+at :ref:`typesseq`.
+
+The documentation for the :mod:`unicodedata` module.
+
+The documentation for the :mod:`codecs` module.
+
+Marc-André Lemburg gave a presentation at EuroPython 2002 titled "Python and
+Unicode".  A PDF version of his slides is available at
+<http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf>, and is an
+excellent overview of the design of Python's Unicode features.
+
+
+Reading and Writing Unicode Data
+================================
+
+Once you've written some code that works with Unicode data, the next problem is
+input/output.  How do you get Unicode strings into your program, and how do you
+convert Unicode into a form suitable for storage or transmission?
+
+It's possible that you may not need to do anything depending on your input
+sources and output destinations; you should check whether the libraries used in
+your application support Unicode natively.  XML parsers often return Unicode
+data, for example.  Many relational databases also support Unicode-valued
+columns and can return Unicode values from an SQL query.
+
+Unicode data is usually converted to a particular encoding before it gets
+written to disk or sent over a socket.  It's possible to do all the work
+yourself: open a file, read an 8-bit string from it, and convert the string with
+``unicode(str, encoding)``.  However, the manual approach is not recommended.
+
+One problem is the multi-byte nature of encodings; one Unicode character can be
+represented by several bytes.  If you want to read the file in arbitrary-sized
+chunks (say, 1K or 4K), you need to write error-handling code to catch the case
+where only part of the bytes encoding a single Unicode character are read at the
+end of a chunk.  One solution would be to read the entire file into memory and
+then perform the decoding, but that prevents you from working with files that
+are extremely large; if you need to read a 2Gb file, you need 2Gb of RAM.
+(More, really, since for at least a moment you'd need to have both the encoded
+string and its Unicode version in memory.)
+
+The solution would be to use the low-level decoding interface to catch the case
+of partial coding sequences.  The work of implementing this has already been
+done for you: the :mod:`codecs` module includes a version of the :func:`open`
+function that returns a file-like object that assumes the file's contents are in
+a specified encoding and accepts Unicode parameters for methods such as
+``.read()`` and ``.write()``.
+
+The function's parameters are ``open(filename, mode='rb', encoding=None,
+errors='strict', buffering=1)``.  ``mode`` can be ``'r'``, ``'w'``, or ``'a'``,
+just like the corresponding parameter to the regular built-in ``open()``
+function; add a ``'+'`` to update the file.  ``buffering`` is similarly parallel
+to the standard function's parameter.  ``encoding`` is a string giving the
+encoding to use; if it's left as ``None``, a regular Python file object that
+accepts 8-bit strings is returned.  Otherwise, a wrapper object is returned, and
+data written to or read from the wrapper object will be converted as needed.
+``errors`` specifies the action for encoding errors and can be one of the usual
+values of 'strict', 'ignore', and 'replace'.
+
+Reading Unicode from a file is therefore simple::
+
+    import codecs
+    f = codecs.open('unicode.rst', encoding='utf-8')
+    for line in f:
+        print repr(line)
+
+It's also possible to open files in update mode, allowing both reading and
+writing::
+
+    f = codecs.open('test', encoding='utf-8', mode='w+')
+    f.write(u'\u4500 blah blah blah\n')
+    f.seek(0)
+    print repr(f.readline()[:1])
+    f.close()
+
+Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
+written as the first character of a file in order to assist with autodetection
+of the file's byte ordering.  Some encodings, such as UTF-16, expect a BOM to be
+present at the start of a file; when such an encoding is used, the BOM will be
+automatically written as the first character and will be silently dropped when
+the file is read.  There are variants of these encodings, such as 'utf-16-le'
+and 'utf-16-be' for little-endian and big-endian encodings, that specify one
+particular byte ordering and don't skip the BOM.
+
+
+Unicode filenames
+-----------------
+
+Most of the operating systems in common use today support filenames that contain
+arbitrary Unicode characters.  Usually this is implemented by converting the
+Unicode string into some encoding that varies depending on the system.  For
+example, MacOS X uses UTF-8 while Windows uses a configurable encoding; on
+Windows, Python uses the name "mbcs" to refer to whatever the currently
+configured encoding is.  On Unix systems, there will only be a filesystem
+encoding if you've set the ``LANG`` or ``LC_CTYPE`` environment variables; if
+you haven't, the default encoding is ASCII.
+
+The :func:`sys.getfilesystemencoding` function returns the encoding to use on
+your current system, in case you want to do the encoding manually, but there's
+not much reason to bother.  When opening a file for reading or writing, you can
+usually just provide the Unicode string as the filename, and it will be
+automatically converted to the right encoding for you::
+
+    filename = u'filename\u4500abc'
+    f = open(filename, 'w')
+    f.write('blah\n')
+    f.close()
+
+Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
+filenames.
+
+:func:`os.listdir`, which returns filenames, raises an issue: should it return
+the Unicode version of filenames, or should it return 8-bit strings containing
+the encoded versions?  :func:`os.listdir` will do both, depending on whether you
+provided the directory path as an 8-bit string or a Unicode string.  If you pass
+a Unicode string as the path, filenames will be decoded using the filesystem's
+encoding and a list of Unicode strings will be returned, while passing an 8-bit
+path will return the 8-bit versions of the filenames.  For example, assuming the
+default filesystem encoding is UTF-8, running the following program::
+
+	fn = u'filename\u4500abc'
+	f = open(fn, 'w')
+	f.close()
+
+	import os
+	print os.listdir('.')
+	print os.listdir(u'.')
+
+will produce the following output::
+
+	amk:~$ python t.py
+	['.svn', 'filename\xe4\x94\x80abc', ...]
+	[u'.svn', u'filename\u4500abc', ...]
+
+The first list contains UTF-8-encoded filenames, and the second list contains
+the Unicode versions.
+
+
+	
+Tips for Writing Unicode-aware Programs
+---------------------------------------
+
+This section provides some suggestions on writing software that deals with
+Unicode.
+
+The most important tip is:
+
+    Software should only work with Unicode strings internally, converting to a
+    particular encoding on output.
+
+If you attempt to write processing functions that accept both Unicode and 8-bit
+strings, you will find your program vulnerable to bugs wherever you combine the
+two different kinds of strings.  Python's default encoding is ASCII, so whenever
+a character with an ASCII value > 127 is in the input data, you'll get a
+:exc:`UnicodeDecodeError` because that character can't be handled by the ASCII
+encoding.
+
+It's easy to miss such problems if you only test your software with data that
+doesn't contain any accents; everything will seem to work, but there's actually
+a bug in your program waiting for the first user who attempts to use characters
+> 127.  A second tip, therefore, is:
+
+    Include characters > 127 and, even better, characters > 255 in your test
+    data.
+
+When using data coming from a web browser or some other untrusted source, a
+common technique is to check for illegal characters in a string before using the
+string in a generated command line or storing it in a database.  If you're doing
+this, be careful to check the string once it's in the form that will be used or
+stored; it's possible for encodings to be used to disguise characters.  This is
+especially true if the input data also specifies the encoding; many encodings
+leave the commonly checked-for characters alone, but Python includes some
+encodings such as ``'base64'`` that modify every single character.
+
+For example, let's say you have a content management system that takes a Unicode
+filename, and you want to disallow paths with a '/' character.  You might write
+this code::
+
+    def read_file (filename, encoding):
+        if '/' in filename:
+            raise ValueError("'/' not allowed in filenames")
+        unicode_name = filename.decode(encoding)
+        f = open(unicode_name, 'r')
+        # ... return contents of file ...
+        
+However, if an attacker could specify the ``'base64'`` encoding, they could pass
+``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
+``'/etc/passwd'``, to read a system file.  The above code looks for ``'/'``
+characters in the encoded form and misses the dangerous character in the
+resulting decoded form.
+
+References
+----------
+
+The PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
+Applications in Python" are available at
+<http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
+and discuss questions of character encodings as well as how to internationalize
+and localize an application.
+
+
+Revision History and Acknowledgements
+=====================================
+
+Thanks to the following people who have noted errors or offered suggestions on
+this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
+Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
+
+Version 1.0: posted August 5 2005.
+
+Version 1.01: posted August 7 2005.  Corrects factual and markup errors; adds
+several links.
+
+Version 1.02: posted August 16 2005.  Corrects factual errors.
+
+
+.. comment Additional topic: building Python w/ UCS2 or UCS4 support
+.. comment Describe obscure -U switch somewhere?
+.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
+
+.. comment 
+   Original outline:
+
+   - [ ] Unicode introduction
+       - [ ] ASCII
+       - [ ] Terms
+	   - [ ] Character
+	   - [ ] Code point
+	 - [ ] Encodings
+	    - [ ] Common encodings: ASCII, Latin-1, UTF-8
+       - [ ] Unicode Python type
+	   - [ ] Writing unicode literals
+	       - [ ] Obscurity: -U switch
+	   - [ ] Built-ins
+	       - [ ] unichr()
+	       - [ ] ord()
+	       - [ ] unicode() constructor
+	   - [ ] Unicode type
+	       - [ ] encode(), decode() methods
+       - [ ] Unicodedata module for character properties
+       - [ ] I/O
+	   - [ ] Reading/writing Unicode data into files
+	       - [ ] Byte-order marks
+	   - [ ] Unicode filenames
+       - [ ] Writing Unicode programs
+	   - [ ] Do everything in Unicode
+	   - [ ] Declaring source code encodings (PEP 263)
+       - [ ] Other issues
+	   - [ ] Building Python (UCS2, UCS4)

Added: doctools/trunk/Doc-26/howto/urllib2.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-26/howto/urllib2.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,573 @@
+************************************************
+  HOWTO Fetch Internet Resources Using urllib2
+************************************************
+
+:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
+
+.. note::
+
+    There is an French translation of an earlier revision of this
+    HOWTO, available at `urllib2 - Le Manuel manquant
+    <http://www.voidspace/python/articles/urllib2_francais.shtml>`_.
+
+ 
+
+Introduction
+============
+
+.. sidebar:: Related Articles
+
+    You may also find useful the following article on fetching web resources
+    with Python :
+    
+    * `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
+    
+        A tutorial on *Basic Authentication*, with examples in Python.
+
+**urllib2** is a `Python <http://www.python.org>`_ module for fetching URLs
+(Uniform Resource Locators). It offers a very simple interface, in the form of
+the *urlopen* function. This is capable of fetching URLs using a variety of
+different protocols. It also offers a slightly more complex interface for
+handling common situations - like basic authentication, cookies, proxies and so
+on. These are provided by objects called handlers and openers.
+
+urllib2 supports fetching URLs for many "URL schemes" (identified by the string
+before the ":" in URL - for example "ftp" is the URL scheme of
+"ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
+This tutorial focuses on the most common case, HTTP.
+
+For straightforward situations *urlopen* is very easy to use. But as soon as you
+encounter errors or non-trivial cases when opening HTTP URLs, you will need some
+understanding of the HyperText Transfer Protocol. The most comprehensive and
+authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
+not intended to be easy to read. This HOWTO aims to illustrate using *urllib2*,
+with enough detail about HTTP to help you through. It is not intended to replace
+the :mod:`urllib2` docs, but is supplementary to them.
+
+
+Fetching URLs
+=============
+
+The simplest way to use urllib2 is as follows::
+
+    import urllib2
+    response = urllib2.urlopen('http://python.org/')
+    html = response.read()
+
+Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
+could have used an URL starting with 'ftp:', 'file:', etc.).  However, it's the
+purpose of this tutorial to explain the more complicated cases, concentrating on
+HTTP.
+
+HTTP is based on requests and responses - the client makes requests and servers
+send responses. urllib2 mirrors this with a ``Request`` object which represents
+the HTTP request you are making. In its simplest form you create a Request
+object that specifies the URL you want to fetch. Calling ``urlopen`` with this
+Request object returns a response object for the URL requested. This response is
+a file-like object, which means you can for example call ``.read()`` on the
+response::
+
+    import urllib2
+
+    req = urllib2.Request('http://www.voidspace.org.uk')
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+Note that urllib2 makes use of the same Request interface to handle all URL
+schemes.  For example, you can make an FTP request like so::
+
+    req = urllib2.Request('ftp://example.com/')
+
+In the case of HTTP, there are two extra things that Request objects allow you
+to do: First, you can pass data to be sent to the server.  Second, you can pass
+extra information ("metadata") *about* the data or the about request itself, to
+the server - this information is sent as HTTP "headers".  Let's look at each of
+these in turn.
+
+Data
+----
+
+Sometimes you want to send data to a URL (often the URL will refer to a CGI
+(Common Gateway Interface) script [#]_ or other web application). With HTTP,
+this is often done using what's known as a **POST** request. This is often what
+your browser does when you submit a HTML form that you filled in on the web. Not
+all POSTs have to come from forms: you can use a POST to transmit arbitrary data
+to your own application. In the common case of HTML forms, the data needs to be
+encoded in a standard way, and then passed to the Request object as the ``data``
+argument. The encoding is done using a function from the ``urllib`` library
+*not* from ``urllib2``. ::
+
+    import urllib
+    import urllib2  
+
+    url = 'http://www.someserver.com/cgi-bin/register.cgi'
+    values = {'name' : 'Michael Foord',
+              'location' : 'Northampton',
+              'language' : 'Python' }
+
+    data = urllib.urlencode(values)
+    req = urllib2.Request(url, data)
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+Note that other encodings are sometimes required (e.g. for file upload from HTML
+forms - see `HTML Specification, Form Submission
+<http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
+details).
+
+If you do not pass the ``data`` argument, urllib2 uses a **GET** request. One
+way in which GET and POST requests differ is that POST requests often have
+"side-effects": they change the state of the system in some way (for example by
+placing an order with the website for a hundredweight of tinned spam to be
+delivered to your door).  Though the HTTP standard makes it clear that POSTs are
+intended to *always* cause side-effects, and GET requests *never* to cause
+side-effects, nothing prevents a GET request from having side-effects, nor a
+POST requests from having no side-effects. Data can also be passed in an HTTP
+GET request by encoding it in the URL itself.
+
+This is done as follows::
+
+    >>> import urllib2
+    >>> import urllib
+    >>> data = {}
+    >>> data['name'] = 'Somebody Here'
+    >>> data['location'] = 'Northampton'
+    >>> data['language'] = 'Python'
+    >>> url_values = urllib.urlencode(data)
+    >>> print url_values
+    name=Somebody+Here&language=Python&location=Northampton
+    >>> url = 'http://www.example.com/example.cgi'
+    >>> full_url = url + '?' + url_values
+    >>> data = urllib2.open(full_url)
+
+Notice that the full URL is created by adding a ``?`` to the URL, followed by
+the encoded values.
+
+Headers
+-------
+
+We'll discuss here one particular HTTP header, to illustrate how to add headers
+to your HTTP request.
+
+Some websites [#]_ dislike being browsed by programs, or send different versions
+to different browsers [#]_ . By default urllib2 identifies itself as
+``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
+numbers of the Python release,
+e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
+not work. The way a browser identifies itself is through the
+``User-Agent`` header [#]_. When you create a Request object you can
+pass a dictionary of headers in. The following example makes the same
+request as above, but identifies itself as a version of Internet
+Explorer [#]_. ::
+
+    import urllib
+    import urllib2  
+    
+    url = 'http://www.someserver.com/cgi-bin/register.cgi'
+    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
+    values = {'name' : 'Michael Foord',
+              'location' : 'Northampton',
+              'language' : 'Python' }
+    headers = { 'User-Agent' : user_agent }
+    
+    data = urllib.urlencode(values)
+    req = urllib2.Request(url, data, headers)
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+The response also has two useful methods. See the section on `info and geturl`_
+which comes after we have a look at what happens when things go wrong.
+
+
+Handling Exceptions
+===================
+
+*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
+with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
+be raised).
+
+``HTTPError`` is the subclass of ``URLError`` raised in the specific case of
+HTTP URLs.
+
+URLError
+--------
+
+Often, URLError is raised because there is no network connection (no route to
+the specified server), or the specified server doesn't exist.  In this case, the
+exception raised will have a 'reason' attribute, which is a tuple containing an
+error code and a text error message.
+
+e.g. ::
+
+    >>> req = urllib2.Request('http://www.pretend_server.org')
+    >>> try: urllib2.urlopen(req)
+    >>> except URLError, e:
+    >>>    print e.reason
+    >>>
+    (4, 'getaddrinfo failed')
+
+
+HTTPError
+---------
+
+Every HTTP response from the server contains a numeric "status code". Sometimes
+the status code indicates that the server is unable to fulfil the request. The
+default handlers will handle some of these responses for you (for example, if
+the response is a "redirection" that requests the client fetch the document from
+a different URL, urllib2 will handle that for you). For those it can't handle,
+urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
+found), '403' (request forbidden), and '401' (authentication required).
+
+See section 10 of RFC 2616 for a reference on all the HTTP error codes.
+
+The ``HTTPError`` instance raised will have an integer 'code' attribute, which
+corresponds to the error sent by the server.
+
+Error Codes
+~~~~~~~~~~~
+
+Because the default handlers handle redirects (codes in the 300 range), and
+codes in the 100-299 range indicate success, you will usually only see error
+codes in the 400-599 range.
+
+``BaseHTTPServer.BaseHTTPRequestHandler.responses`` is a useful dictionary of
+response codes in that shows all the response codes used by RFC 2616. The
+dictionary is reproduced here for convenience ::
+
+    # Table mapping response codes to messages; entries have the
+    # form {code: (shortmessage, longmessage)}.
+    responses = {
+        100: ('Continue', 'Request received, please continue'),
+        101: ('Switching Protocols',
+              'Switching to new protocol; obey Upgrade header'),
+
+        200: ('OK', 'Request fulfilled, document follows'),
+        201: ('Created', 'Document created, URL follows'),
+        202: ('Accepted',
+              'Request accepted, processing continues off-line'),
+        203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
+        204: ('No Content', 'Request fulfilled, nothing follows'),
+        205: ('Reset Content', 'Clear input form for further input.'),
+        206: ('Partial Content', 'Partial content follows.'),
+
+        300: ('Multiple Choices',
+              'Object has several resources -- see URI list'),
+        301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
+        302: ('Found', 'Object moved temporarily -- see URI list'),
+        303: ('See Other', 'Object moved -- see Method and URL list'),
+        304: ('Not Modified',
+              'Document has not changed since given time'),
+        305: ('Use Proxy',
+              'You must use proxy specified in Location to access this '
+              'resource.'),
+        307: ('Temporary Redirect',
+              'Object moved temporarily -- see URI list'),
+
+        400: ('Bad Request',
+              'Bad request syntax or unsupported method'),
+        401: ('Unauthorized',
+              'No permission -- see authorization schemes'),
+        402: ('Payment Required',
+              'No payment -- see charging schemes'),
+        403: ('Forbidden',
+              'Request forbidden -- authorization will not help'),
+        404: ('Not Found', 'Nothing matches the given URI'),
+        405: ('Method Not Allowed',
+              'Specified method is invalid for this server.'),
+        406: ('Not Acceptable', 'URI not available in preferred format.'),
+        407: ('Proxy Authentication Required', 'You must authenticate with '
+              'this proxy before proceeding.'),
+        408: ('Request Timeout', 'Request timed out; try again later.'),
+        409: ('Conflict', 'Request conflict.'),
+        410: ('Gone',
+              'URI no longer exists and has been permanently removed.'),
+        411: ('Length Required', 'Client must specify Content-Length.'),
+        412: ('Precondition Failed', 'Precondition in headers is false.'),
+        413: ('Request Entity Too Large', 'Entity is too large.'),
+        414: ('Request-URI Too Long', 'URI is too long.'),
+        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
+        416: ('Requested Range Not Satisfiable',
+              'Cannot satisfy request range.'),
+        417: ('Expectation Failed',
+              'Expect condition could not be satisfied.'),
+
+        500: ('Internal Server Error', 'Server got itself in trouble'),
+        501: ('Not Implemented',
+              'Server does not support this operation'),
+        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
+        503: ('Service Unavailable',
+              'The server cannot process the request due to a high load'),
+        504: ('Gateway Timeout',
+              'The gateway server did not receive a timely response'),
+        505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
+        }
+
+When an error is raised the server responds by returning an HTTP error code
+*and* an error page. You can use the ``HTTPError`` instance as a response on the
+page returned. This means that as well as the code attribute, it also has read,
+geturl, and info, methods. ::
+
+    >>> req = urllib2.Request('http://www.python.org/fish.html')
+    >>> try: 
+    >>>     urllib2.urlopen(req)
+    >>> except URLError, e:
+    >>>     print e.code
+    >>>     print e.read()
+    >>> 
+    404
+    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
+        "http://www.w3.org/TR/html4/loose.dtd">
+    <?xml-stylesheet href="./css/ht2html.css" 
+        type="text/css"?>
+    <html><head><title>Error 404: File Not Found</title> 
+    ...... etc...
+
+Wrapping it Up
+--------------
+
+So if you want to be prepared for ``HTTPError`` *or* ``URLError`` there are two
+basic approaches. I prefer the second approach.
+
+Number 1
+~~~~~~~~
+
+::
+
+
+    from urllib2 import Request, urlopen, URLError, HTTPError
+    req = Request(someurl)
+    try:
+        response = urlopen(req)
+    except HTTPError, e:
+        print 'The server couldn\'t fulfill the request.'
+        print 'Error code: ', e.code
+    except URLError, e:
+        print 'We failed to reach a server.'
+        print 'Reason: ', e.reason
+    else:
+        # everything is fine
+
+
+.. note::
+
+    The ``except HTTPError`` *must* come first, otherwise ``except URLError``
+    will *also* catch an ``HTTPError``.
+
+Number 2
+~~~~~~~~
+
+::
+
+    from urllib2 import Request, urlopen, URLError
+    req = Request(someurl)
+    try:
+        response = urlopen(req)
+    except URLError, e:
+        if hasattr(e, 'reason'):
+            print 'We failed to reach a server.'
+            print 'Reason: ', e.reason
+        elif hasattr(e, 'code'):
+            print 'The server couldn\'t fulfill the request.'
+            print 'Error code: ', e.code
+    else:
+        # everything is fine
+        
+
+info and geturl
+===============
+
+The response returned by urlopen (or the ``HTTPError`` instance) has two useful
+methods ``info`` and ``geturl``.
+
+**geturl** - this returns the real URL of the page fetched. This is useful
+because ``urlopen`` (or the opener object used) may have followed a
+redirect. The URL of the page fetched may not be the same as the URL requested.
+
+**info** - this returns a dictionary-like object that describes the page
+fetched, particularly the headers sent by the server. It is currently an
+``httplib.HTTPMessage`` instance.
+
+Typical headers include 'Content-length', 'Content-type', and so on. See the
+`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
+for a useful listing of HTTP headers with brief explanations of their meaning
+and use.
+
+
+Openers and Handlers
+====================
+
+When you fetch a URL you use an opener (an instance of the perhaps
+confusingly-named :class:`urllib2.OpenerDirector`). Normally we have been using
+the default opener - via ``urlopen`` - but you can create custom
+openers. Openers use handlers. All the "heavy lifting" is done by the
+handlers. Each handler knows how to open URLs for a particular URL scheme (http,
+ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
+redirections or HTTP cookies.
+
+You will want to create openers if you want to fetch URLs with specific handlers
+installed, for example to get an opener that handles cookies, or to get an
+opener that does not handle redirections.
+
+To create an opener, instantiate an ``OpenerDirector``, and then call
+``.add_handler(some_handler_instance)`` repeatedly.
+
+Alternatively, you can use ``build_opener``, which is a convenience function for
+creating opener objects with a single function call.  ``build_opener`` adds
+several handlers by default, but provides a quick way to add more and/or
+override the default handlers.
+
+Other sorts of handlers you might want to can handle proxies, authentication,
+and other common but slightly specialised situations.
+
+``install_opener`` can be used to make an ``opener`` object the (global) default
+opener. This means that calls to ``urlopen`` will use the opener you have
+installed.
+
+Opener objects have an ``open`` method, which can be called directly to fetch
+urls in the same way as the ``urlopen`` function: there's no need to call
+``install_opener``, except as a convenience.
+
+
+Basic Authentication
+====================
+
+To illustrate creating and installing a handler we will use the
+``HTTPBasicAuthHandler``. For a more detailed discussion of this subject -
+including an explanation of how Basic Authentication works - see the `Basic
+Authentication Tutorial
+<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.
+
+When authentication is required, the server sends a header (as well as the 401
+error code) requesting authentication.  This specifies the authentication scheme
+and a 'realm'. The header looks like : ``Www-authenticate: SCHEME
+realm="REALM"``.
+
+e.g. :: 
+
+    Www-authenticate: Basic realm="cPanel Users"
+
+
+The client should then retry the request with the appropriate name and password
+for the realm included as a header in the request. This is 'basic
+authentication'. In order to simplify this process we can create an instance of
+``HTTPBasicAuthHandler`` and an opener to use this handler.
+
+The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
+the mapping of URLs and realms to passwords and usernames. If you know what the
+realm is (from the authentication header sent by the server), then you can use a
+``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
+case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
+you to specify a default username and password for a URL. This will be supplied
+in the absence of you providing an alternative combination for a specific
+realm. We indicate this by providing ``None`` as the realm argument to the
+``add_password`` method.
+
+The top-level URL is the first URL that requires authentication. URLs "deeper"
+than the URL you pass to .add_password() will also match. ::
+
+    # create a password manager
+    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()                        
+
+    # Add the username and password.
+    # If we knew the realm, we could use it instead of ``None``.
+    top_level_url = "http://example.com/foo/"
+    password_mgr.add_password(None, top_level_url, username, password)
+
+    handler = urllib2.HTTPBasicAuthHandler(password_mgr)                            
+
+    # create "opener" (OpenerDirector instance)
+    opener = urllib2.build_opener(handler)                       
+
+    # use the opener to fetch a URL
+    opener.open(a_url)      
+
+    # Install the opener.
+    # Now all calls to urllib2.urlopen use our opener.
+    urllib2.install_opener(opener)                               
+
+.. note::
+
+    In the above example we only supplied our ``HHTPBasicAuthHandler`` to
+    ``build_opener``. By default openers have the handlers for normal situations
+    -- ``ProxyHandler``, ``UnknownHandler``, ``HTTPHandler``,
+    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
+    ``FileHandler``, ``HTTPErrorProcessor``.
+
+``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
+component and the hostname and optionally the port number)
+e.g. "http://example.com/" *or* an "authority" (i.e. the hostname,
+optionally including the port number) e.g. "example.com" or "example.com:8080"
+(the latter example includes a port number).  The authority, if present, must
+NOT contain the "userinfo" component - for example "joe at password:example.com" is
+not correct.
+
+
+Proxies
+=======
+
+**urllib2** will auto-detect your proxy settings and use those. This is through
+the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
+a good thing, but there are occasions when it may not be helpful [#]_. One way
+to do this is to setup our own ``ProxyHandler``, with no proxies defined. This
+is done using similar steps to setting up a `Basic Authentication`_ handler : ::
+
+    >>> proxy_support = urllib2.ProxyHandler({})
+    >>> opener = urllib2.build_opener(proxy_support)
+    >>> urllib2.install_opener(opener)
+
+.. note::
+
+    Currently ``urllib2`` *does not* support fetching of ``https`` locations
+    through a proxy. This can be a problem.
+
+Sockets and Layers
+==================
+
+The Python support for fetching resources from the web is layered. urllib2 uses
+the httplib library, which in turn uses the socket library.
+
+As of Python 2.3 you can specify how long a socket should wait for a response
+before timing out. This can be useful in applications which have to fetch web
+pages. By default the socket module has *no timeout* and can hang. Currently,
+the socket timeout is not exposed at the httplib or urllib2 levels.  However,
+you can set the default timeout globally for all sockets using ::
+
+    import socket
+    import urllib2
+
+    # timeout in seconds
+    timeout = 10
+    socket.setdefaulttimeout(timeout) 
+
+    # this call to urllib2.urlopen now uses the default timeout
+    # we have set in the socket module
+    req = urllib2.Request('http://www.voidspace.org.uk')
+    response = urllib2.urlopen(req)
+
+
+-------
+
+
+Footnotes
+=========
+
+This document was reviewed and revised by John Lee.
+
+.. [#] For an introduction to the CGI protocol see
+       `Writing Web Applications in Python <http://www.pyzine.com/Issue008/Section_Articles/article_CGIOne.html>`_. 
+.. [#] Like Google for example. The *proper* way to use google from a program
+       is to use `PyGoogle <http://pygoogle.sourceforge.net>`_ of course. See
+       `Voidspace Google <http://www.voidspace.org.uk/python/recipebook.shtml#google>`_
+       for some examples of using the Google API.
+.. [#] Browser sniffing is a very bad practise for website design - building
+       sites using web standards is much more sensible. Unfortunately a lot of
+       sites still send different versions to different browsers.
+.. [#] The user agent for MSIE 6 is
+       *'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
+.. [#] For details of more HTTP request headers, see
+       `Quick Reference to HTTP Headers`_.
+.. [#] In my case I have to use a proxy to access the internet at work. If you
+       attempt to fetch *localhost* URLs through this proxy it blocks them. IE
+       is set to use the proxy, which urllib2 picks up on. In order to test
+       scripts with a localhost server, I have to prevent urllib2 from using
+       the proxy.

Modified: doctools/trunk/Doc-3k/ACKS
==============================================================================
--- doctools/trunk/Doc-3k/ACKS	(original)
+++ doctools/trunk/Doc-3k/ACKS	Thu Aug  2 23:31:34 2007
@@ -15,7 +15,7 @@
 * Steve Alexander
 * Jim Ahlstrom
 * Fred Allen
-* A. Amoroso
+* \A. Amoroso
 * Pehr Anderson
 * Oliver Andrich
 * Jesús Cea Avión
@@ -40,7 +40,7 @@
 * Jeremy Craven
 * Andrew Dalke
 * Ben Darnell
-* L. Peter Deutsch
+* \L. Peter Deutsch
 * Robert Donohue
 * Fred L. Drake, Jr.
 * Jeff Epler

Modified: doctools/trunk/Doc-3k/TODO
==============================================================================
--- doctools/trunk/Doc-3k/TODO	(original)
+++ doctools/trunk/Doc-3k/TODO	Thu Aug  2 23:31:34 2007
@@ -2,7 +2,6 @@
 ======================
 
 * split very large files and add toctrees
-* integrate standalone HOWTOs
 * find out which files get "comments disabled" metadata
 * write "About these documents"
 * finish "Documenting Python"

Modified: doctools/trunk/Doc-3k/contents.rst
==============================================================================
--- doctools/trunk/Doc-3k/contents.rst	(original)
+++ doctools/trunk/Doc-3k/contents.rst	Thu Aug  2 23:31:34 2007
@@ -14,6 +14,7 @@
    distutils/index.rst
    install/index.rst
    documenting/index.rst
+   howto/index.rst
 
    bugs.rst
    about.rst

Added: doctools/trunk/Doc-3k/howto/advocacy.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/advocacy.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,356 @@
+*************************
+  Python Advocacy HOWTO
+*************************
+
+:Author: A.M. Kuchling
+:Release: 0.03
+
+
+.. topic:: Abstract
+
+   It's usually difficult to get your management to accept open source software,
+   and Python is no exception to this rule.  This document discusses reasons to use
+   Python, strategies for winning acceptance, facts and arguments you can use, and
+   cases where you *shouldn't* try to use Python.
+
+
+Reasons to Use Python
+=====================
+
+There are several reasons to incorporate a scripting language into your
+development process, and this section will discuss them, and why Python has some
+properties that make it a particularly good choice.
+
+
+Programmability
+---------------
+
+Programs are often organized in a modular fashion.  Lower-level operations are
+grouped together, and called by higher-level functions, which may in turn be
+used as basic operations by still further upper levels.
+
+For example, the lowest level might define a very low-level set of functions for
+accessing a hash table.  The next level might use hash tables to store the
+headers of a mail message, mapping a header name like ``Date`` to a value such
+as ``Tue, 13 May 1997 20:00:54 -0400``.  A yet higher level may operate on
+message objects, without knowing or caring that message headers are stored in a
+hash table, and so forth.
+
+Often, the lowest levels do very simple things; they implement a data structure
+such as a binary tree or hash table, or they perform some simple computation,
+such as converting a date string to a number.  The higher levels then contain
+logic connecting these primitive operations.  Using the approach, the primitives
+can be seen as basic building blocks which are then glued together to produce
+the complete product.
+
+Why is this design approach relevant to Python?  Because Python is well suited
+to functioning as such a glue language.  A common approach is to write a Python
+module that implements the lower level operations; for the sake of speed, the
+implementation might be in C, Java, or even Fortran.  Once the primitives are
+available to Python programs, the logic underlying higher level operations is
+written in the form of Python code.  The high-level logic is then more
+understandable, and easier to modify.
+
+John Ousterhout wrote a paper that explains this idea at greater length,
+entitled "Scripting: Higher Level Programming for the 21st Century".  I
+recommend that you read this paper; see the references for the URL.  Ousterhout
+is the inventor of the Tcl language, and therefore argues that Tcl should be
+used for this purpose; he only briefly refers to other languages such as Python,
+Perl, and Lisp/Scheme, but in reality, Ousterhout's argument applies to
+scripting languages in general, since you could equally write extensions for any
+of the languages mentioned above.
+
+
+Prototyping
+-----------
+
+In *The Mythical Man-Month*, Fredrick Brooks suggests the following rule when
+planning software projects: "Plan to throw one away; you will anyway."  Brooks
+is saying that the first attempt at a software design often turns out to be
+wrong; unless the problem is very simple or you're an extremely good designer,
+you'll find that new requirements and features become apparent once development
+has actually started.  If these new requirements can't be cleanly incorporated
+into the program's structure, you're presented with two unpleasant choices:
+hammer the new features into the program somehow, or scrap everything and write
+a new version of the program, taking the new features into account from the
+beginning.
+
+Python provides you with a good environment for quickly developing an initial
+prototype.  That lets you get the overall program structure and logic right, and
+you can fine-tune small details in the fast development cycle that Python
+provides.  Once you're satisfied with the GUI interface or program output, you
+can translate the Python code into C++, Fortran, Java, or some other compiled
+language.
+
+Prototyping means you have to be careful not to use too many Python features
+that are hard to implement in your other language.  Using ``eval()``, or regular
+expressions, or the :mod:`pickle` module, means that you're going to need C or
+Java libraries for formula evaluation, regular expressions, and serialization,
+for example.  But it's not hard to avoid such tricky code, and in the end the
+translation usually isn't very difficult.  The resulting code can be rapidly
+debugged, because any serious logical errors will have been removed from the
+prototype, leaving only more minor slip-ups in the translation to track down.
+
+This strategy builds on the earlier discussion of programmability. Using Python
+as glue to connect lower-level components has obvious relevance for constructing
+prototype systems.  In this way Python can help you with development, even if
+end users never come in contact with Python code at all.  If the performance of
+the Python version is adequate and corporate politics allow it, you may not need
+to do a translation into C or Java, but it can still be faster to develop a
+prototype and then translate it, instead of attempting to produce the final
+version immediately.
+
+One example of this development strategy is Microsoft Merchant Server. Version
+1.0 was written in pure Python, by a company that subsequently was purchased by
+Microsoft.  Version 2.0 began to translate the code into C++, shipping with some
+C++code and some Python code.  Version 3.0 didn't contain any Python at all; all
+the code had been translated into C++.  Even though the product doesn't contain
+a Python interpreter, the Python language has still served a useful purpose by
+speeding up development.
+
+This is a very common use for Python.  Past conference papers have also
+described this approach for developing high-level numerical algorithms; see
+David M. Beazley and Peter S. Lomdahl's paper "Feeding a Large-scale Physics
+Application to Python" in the references for a good example.  If an algorithm's
+basic operations are things like "Take the inverse of this 4000x4000 matrix",
+and are implemented in some lower-level language, then Python has almost no
+additional performance cost; the extra time required for Python to evaluate an
+expression like ``m.invert()`` is dwarfed by the cost of the actual computation.
+It's particularly good for applications where seemingly endless tweaking is
+required to get things right. GUI interfaces and Web sites are prime examples.
+
+The Python code is also shorter and faster to write (once you're familiar with
+Python), so it's easier to throw it away if you decide your approach was wrong;
+if you'd spent two weeks working on it instead of just two hours, you might
+waste time trying to patch up what you've got out of a natural reluctance to
+admit that those two weeks were wasted.  Truthfully, those two weeks haven't
+been wasted, since you've learnt something about the problem and the technology
+you're using to solve it, but it's human nature to view this as a failure of
+some sort.
+
+
+Simplicity and Ease of Understanding
+------------------------------------
+
+Python is definitely *not* a toy language that's only usable for small tasks.
+The language features are general and powerful enough to enable it to be used
+for many different purposes.  It's useful at the small end, for 10- or 20-line
+scripts, but it also scales up to larger systems that contain thousands of lines
+of code.
+
+However, this expressiveness doesn't come at the cost of an obscure or tricky
+syntax.  While Python has some dark corners that can lead to obscure code, there
+are relatively few such corners, and proper design can isolate their use to only
+a few classes or modules.  It's certainly possible to write confusing code by
+using too many features with too little concern for clarity, but most Python
+code can look a lot like a slightly-formalized version of human-understandable
+pseudocode.
+
+In *The New Hacker's Dictionary*, Eric S. Raymond gives the following definition
+for "compact":
+
+.. epigraph::
+
+   Compact *adj.*  Of a design, describes the valuable property that it can all be
+   apprehended at once in one's head. This generally means the thing created from
+   the design can be used with greater facility and fewer errors than an equivalent
+   tool that is not compact. Compactness does not imply triviality or lack of
+   power; for example, C is compact and FORTRAN is not, but C is more powerful than
+   FORTRAN. Designs become non-compact through accreting features and cruft that
+   don't merge cleanly into the overall design scheme (thus, some fans of Classic C
+   maintain that ANSI C is no longer compact).
+
+   (From http://www.catb.org/ esr/jargon/html/C/compact.html)
+
+In this sense of the word, Python is quite compact, because the language has
+just a few ideas, which are used in lots of places.  Take namespaces, for
+example.  Import a module with ``import math``, and you create a new namespace
+called ``math``.  Classes are also namespaces that share many of the properties
+of modules, and have a few of their own; for example, you can create instances
+of a class. Instances?  They're yet another namespace.  Namespaces are currently
+implemented as Python dictionaries, so they have the same methods as the
+standard dictionary data type: .keys() returns all the keys, and so forth.
+
+This simplicity arises from Python's development history.  The language syntax
+derives from different sources; ABC, a relatively obscure teaching language, is
+one primary influence, and Modula-3 is another.  (For more information about ABC
+and Modula-3, consult their respective Web sites at http://www.cwi.nl/
+steven/abc/ and http://www.m3.org.)  Other features have come from C, Icon,
+Algol-68, and even Perl.  Python hasn't really innovated very much, but instead
+has tried to keep the language small and easy to learn, building on ideas that
+have been tried in other languages and found useful.
+
+Simplicity is a virtue that should not be underestimated.  It lets you learn the
+language more quickly, and then rapidly write code, code that often works the
+first time you run it.
+
+
+Java Integration
+----------------
+
+If you're working with Java, Jython (http://www.jython.org/) is definitely worth
+your attention.  Jython is a re-implementation of Python in Java that compiles
+Python code into Java bytecodes.  The resulting environment has very tight,
+almost seamless, integration with Java.  It's trivial to access Java classes
+from Python, and you can write Python classes that subclass Java classes.
+Jython can be used for prototyping Java applications in much the same way
+CPython is used, and it can also be used for test suites for Java code, or
+embedded in a Java application to add scripting capabilities.
+
+
+Arguments and Rebuttals
+=======================
+
+Let's say that you've decided upon Python as the best choice for your
+application.  How can you convince your management, or your fellow developers,
+to use Python?  This section lists some common arguments against using Python,
+and provides some possible rebuttals.
+
+**Python is freely available software that doesn't cost anything. How good can
+it be?**
+
+Very good, indeed.  These days Linux and Apache, two other pieces of open source
+software, are becoming more respected as alternatives to commercial software,
+but Python hasn't had all the publicity.
+
+Python has been around for several years, with many users and developers.
+Accordingly, the interpreter has been used by many people, and has gotten most
+of the bugs shaken out of it.  While bugs are still discovered at intervals,
+they're usually either quite obscure (they'd have to be, for no one to have run
+into them before) or they involve interfaces to external libraries.  The
+internals of the language itself are quite stable.
+
+Having the source code should be viewed as making the software available for
+peer review; people can examine the code, suggest (and implement) improvements,
+and track down bugs.  To find out more about the idea of open source code, along
+with arguments and case studies supporting it, go to http://www.opensource.org.
+
+**Who's going to support it?**
+
+Python has a sizable community of developers, and the number is still growing.
+The Internet community surrounding the language is an active one, and is worth
+being considered another one of Python's advantages. Most questions posted to
+the comp.lang.python newsgroup are quickly answered by someone.
+
+Should you need to dig into the source code, you'll find it's clear and well-
+organized, so it's not very difficult to write extensions and track down bugs
+yourself.  If you'd prefer to pay for support, there are companies and
+individuals who offer commercial support for Python.
+
+**Who uses Python for serious work?**
+
+Lots of people; one interesting thing about Python is the surprising diversity
+of applications that it's been used for.  People are using Python to:
+
+* Run Web sites
+
+* Write GUI interfaces
+
+* Control number-crunching code on supercomputers
+
+* Make a commercial application scriptable by embedding the Python interpreter
+  inside it
+
+* Process large XML data sets
+
+* Build test suites for C or Java code
+
+Whatever your application domain is, there's probably someone who's used Python
+for something similar.  Yet, despite being useable for such high-end
+applications, Python's still simple enough to use for little jobs.
+
+See http://wiki.python.org/moin/OrganizationsUsingPython for a list of some of
+the  organizations that use Python.
+
+**What are the restrictions on Python's use?**
+
+They're practically nonexistent.  Consult the :file:`Misc/COPYRIGHT` file in the
+source distribution, or http://www.python.org/doc/Copyright.html for the full
+language, but it boils down to three conditions.
+
+* You have to leave the copyright notice on the software; if you don't include
+  the source code in a product, you have to put the copyright notice in the
+  supporting documentation.
+
+* Don't claim that the institutions that have developed Python endorse your
+  product in any way.
+
+* If something goes wrong, you can't sue for damages.  Practically all software
+  licences contain this condition.
+
+Notice that you don't have to provide source code for anything that contains
+Python or is built with it.  Also, the Python interpreter and accompanying
+documentation can be modified and redistributed in any way you like, and you
+don't have to pay anyone any licensing fees at all.
+
+**Why should we use an obscure language like Python instead of well-known
+language X?**
+
+I hope this HOWTO, and the documents listed in the final section, will help
+convince you that Python isn't obscure, and has a healthily growing user base.
+One word of advice: always present Python's positive advantages, instead of
+concentrating on language X's failings.  People want to know why a solution is
+good, rather than why all the other solutions are bad.  So instead of attacking
+a competing solution on various grounds, simply show how Python's virtues can
+help.
+
+
+Useful Resources
+================
+
+http://www.pythonology.com/success
+   The Python Success Stories are a collection of stories from successful users of
+   Python, with the emphasis on business and corporate users.
+
+.. % \term{\url{http://www.fsbassociates.com/books/pythonchpt1.htm}}
+.. % The first chapter of \emph{Internet Programming with Python} also
+.. % examines some of the reasons for using Python.  The book is well worth
+.. % buying, but the publishers have made the first chapter available on
+.. % the Web.
+
+http://home.pacbell.net/ouster/scripting.html
+   John Ousterhout's white paper on scripting is a good argument for the utility of
+   scripting languages, though naturally enough, he emphasizes Tcl, the language he
+   developed.  Most of the arguments would apply to any scripting language.
+
+http://www.python.org/workshops/1997-10/proceedings/beazley.html
+   The authors, David M. Beazley and Peter S. Lomdahl,  describe their use of
+   Python at Los Alamos National Laboratory. It's another good example of how
+   Python can help get real work done. This quotation from the paper has been
+   echoed by many people:
+
+   .. epigraph::
+
+      Originally developed as a large monolithic application for massively parallel
+      processing systems, we have used Python to transform our application into a
+      flexible, highly modular, and extremely powerful system for performing
+      simulation, data analysis, and visualization. In addition, we describe how
+      Python has solved a number of important problems related to the development,
+      debugging, deployment, and maintenance of scientific software.
+
+http://pythonjournal.cognizor.com/pyj1/Everitt-Feit_interview98-V1.html
+   This interview with Andy Feit, discussing Infoseek's use of Python, can be used
+   to show that choosing Python didn't introduce any difficulties into a company's
+   development process, and provided some substantial benefits.
+
+.. % \term{\url{http://www.python.org/psa/Commercial.html}}
+.. % Robin Friedrich wrote this document on how to support Python's use in
+.. % commercial projects.
+
+http://www.python.org/workshops/1997-10/proceedings/stein.ps
+   For the 6th Python conference, Greg Stein presented a paper that traced Python's
+   adoption and usage at a startup called eShop, and later at Microsoft.
+
+http://www.opensource.org
+   Management may be doubtful of the reliability and usefulness of software that
+   wasn't written commercially.  This site presents arguments that show how open
+   source software can have considerable advantages over closed-source software.
+
+http://sunsite.unc.edu/LDP/HOWTO/mini/Advocacy.html
+   The Linux Advocacy mini-HOWTO was the inspiration for this document, and is also
+   well worth reading for general suggestions on winning acceptance for a new
+   technology, such as Linux or Python.  In general, you won't make much progress
+   by simply attacking existing systems and complaining about their inadequacies;
+   this often ends up looking like unfocused whining.  It's much better to point
+   out some of the many areas where Python is an improvement over other systems.
+

Added: doctools/trunk/Doc-3k/howto/curses.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/curses.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,433 @@
+**********************************
+  Curses Programming with Python
+**********************************
+
+:Author: A.M. Kuchling, Eric S. Raymond
+:Release: 2.02
+
+
+.. topic:: Abstract
+
+   This document describes how to write text-mode programs with Python 2.x, using
+   the :mod:`curses` extension module to control the display.
+
+
+What is curses?
+===============
+
+The curses library supplies a terminal-independent screen-painting and keyboard-
+handling facility for text-based terminals; such terminals include VT100s, the
+Linux console, and the simulated terminal provided by X11 programs such as xterm
+and rxvt.  Display terminals support various control codes to perform common
+operations such as moving the cursor, scrolling the screen, and erasing areas.
+Different terminals use widely differing codes, and often have their own minor
+quirks.
+
+In a world of X displays, one might ask "why bother"?  It's true that character-
+cell display terminals are an obsolete technology, but there are niches in which
+being able to do fancy things with them are still valuable.  One is on small-
+footprint or embedded Unixes that  don't carry an X server.  Another is for
+tools like OS installers and kernel configurators that may have to run before X
+is available.
+
+The curses library hides all the details of different terminals, and provides
+the programmer with an abstraction of a display, containing multiple non-
+overlapping windows.  The contents of a window can be changed in various ways--
+adding text, erasing it, changing its appearance--and the curses library will
+automagically figure out what control codes need to be sent to the terminal to
+produce the right output.
+
+The curses library was originally written for BSD Unix; the later System V
+versions of Unix from AT&T added many enhancements and new functions. BSD curses
+is no longer maintained, having been replaced by ncurses, which is an open-
+source implementation of the AT&T interface.  If you're using an open-source
+Unix such as Linux or FreeBSD, your system almost certainly uses ncurses.  Since
+most current commercial Unix versions are based on System V code, all the
+functions described here will probably be available.  The older versions of
+curses carried by some proprietary Unixes may not support everything, though.
+
+No one has made a Windows port of the curses module.  On a Windows platform, try
+the Console module written by Fredrik Lundh.  The Console module provides
+cursor-addressable text output, plus full support for mouse and keyboard input,
+and is available from http://effbot.org/efflib/console.
+
+
+The Python curses module
+------------------------
+
+Thy Python module is a fairly simple wrapper over the C functions provided by
+curses; if you're already familiar with curses programming in C, it's really
+easy to transfer that knowledge to Python.  The biggest difference is that the
+Python interface makes things simpler, by merging different C functions such as
+:func:`addstr`, :func:`mvaddstr`, :func:`mvwaddstr`, into a single
+:meth:`addstr` method.  You'll see this covered in more detail later.
+
+This HOWTO is simply an introduction to writing text-mode programs with curses
+and Python. It doesn't attempt to be a complete guide to the curses API; for
+that, see the Python library guide's section on ncurses, and the C manual pages
+for ncurses.  It will, however, give you the basic ideas.
+
+
+Starting and ending a curses application
+========================================
+
+Before doing anything, curses must be initialized.  This is done by calling the
+:func:`initscr` function, which will determine the terminal type, send any
+required setup codes to the terminal, and create various internal data
+structures.  If successful, :func:`initscr` returns a window object representing
+the entire screen; this is usually called ``stdscr``, after the name of the
+corresponding C variable. ::
+
+   import curses
+   stdscr = curses.initscr()
+
+Usually curses applications turn off automatic echoing of keys to the screen, in
+order to be able to read keys and only display them under certain circumstances.
+This requires calling the :func:`noecho` function. ::
+
+   curses.noecho()
+
+Applications will also commonly need to react to keys instantly, without
+requiring the Enter key to be pressed; this is called cbreak mode, as opposed to
+the usual buffered input mode. ::
+
+   curses.cbreak()
+
+Terminals usually return special keys, such as the cursor keys or navigation
+keys such as Page Up and Home, as a multibyte escape sequence.  While you could
+write your application to expect such sequences and process them accordingly,
+curses can do it for you, returning a special value such as
+:const:`curses.KEY_LEFT`.  To get curses to do the job, you'll have to enable
+keypad mode. ::
+
+   stdscr.keypad(1)
+
+Terminating a curses application is much easier than starting one. You'll need
+to call  ::
+
+   curses.nocbreak(); stdscr.keypad(0); curses.echo()
+
+to reverse the curses-friendly terminal settings. Then call the :func:`endwin`
+function to restore the terminal to its original operating mode. ::
+
+   curses.endwin()
+
+A common problem when debugging a curses application is to get your terminal
+messed up when the application dies without restoring the terminal to its
+previous state.  In Python this commonly happens when your code is buggy and
+raises an uncaught exception.  Keys are no longer be echoed to the screen when
+you type them, for example, which makes using the shell difficult.
+
+In Python you can avoid these complications and make debugging much easier by
+importing the module :mod:`curses.wrapper`.  It supplies a :func:`wrapper`
+function that takes a callable.  It does the initializations described above,
+and also initializes colors if color support is present.  It then runs your
+provided callable and finally deinitializes appropriately.  The callable is
+called inside a try-catch clause which catches exceptions, performs curses
+deinitialization, and then passes the exception upwards.  Thus, your terminal
+won't be left in a funny state on exception.
+
+
+Windows and Pads
+================
+
+Windows are the basic abstraction in curses.  A window object represents a
+rectangular area of the screen, and supports various methods to display text,
+erase it, allow the user to input strings, and so forth.
+
+The ``stdscr`` object returned by the :func:`initscr` function is a window
+object that covers the entire screen.  Many programs may need only this single
+window, but you might wish to divide the screen into smaller windows, in order
+to redraw or clear them separately. The :func:`newwin` function creates a new
+window of a given size, returning the new window object. ::
+
+   begin_x = 20 ; begin_y = 7
+   height = 5 ; width = 40
+   win = curses.newwin(height, width, begin_y, begin_x)
+
+A word about the coordinate system used in curses: coordinates are always passed
+in the order *y,x*, and the top-left corner of a window is coordinate (0,0).
+This breaks a common convention for handling coordinates, where the *x*
+coordinate usually comes first.  This is an unfortunate difference from most
+other computer applications, but it's been part of curses since it was first
+written, and it's too late to change things now.
+
+When you call a method to display or erase text, the effect doesn't immediately
+show up on the display.  This is because curses was originally written with slow
+300-baud terminal connections in mind; with these terminals, minimizing the time
+required to redraw the screen is very important.  This lets curses accumulate
+changes to the screen, and display them in the most efficient manner.  For
+example, if your program displays some characters in a window, and then clears
+the window, there's no need to send the original characters because they'd never
+be visible.
+
+Accordingly, curses requires that you explicitly tell it to redraw windows,
+using the :func:`refresh` method of window objects.  In practice, this doesn't
+really complicate programming with curses much. Most programs go into a flurry
+of activity, and then pause waiting for a keypress or some other action on the
+part of the user.  All you have to do is to be sure that the screen has been
+redrawn before pausing to wait for user input, by simply calling
+``stdscr.refresh()`` or the :func:`refresh` method of some other relevant
+window.
+
+A pad is a special case of a window; it can be larger than the actual display
+screen, and only a portion of it displayed at a time. Creating a pad simply
+requires the pad's height and width, while refreshing a pad requires giving the
+coordinates of the on-screen area where a subsection of the pad will be
+displayed.   ::
+
+   pad = curses.newpad(100, 100)
+   #  These loops fill the pad with letters; this is
+   # explained in the next section
+   for y in range(0, 100):
+       for x in range(0, 100):
+           try: pad.addch(y,x, ord('a') + (x*x+y*y) % 26 )
+           except curses.error: pass
+
+   #  Displays a section of the pad in the middle of the screen
+   pad.refresh( 0,0, 5,5, 20,75)
+
+The :func:`refresh` call displays a section of the pad in the rectangle
+extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
+left corner of the displayed section is coordinate (0,0) on the pad.  Beyond
+that difference, pads are exactly like ordinary windows and support the same
+methods.
+
+If you have multiple windows and pads on screen there is a more efficient way to
+go, which will prevent annoying screen flicker at refresh time.  Use the
+:meth:`noutrefresh` method of each window to update the data structure
+representing the desired state of the screen; then change the physical screen to
+match the desired state in one go with the function :func:`doupdate`.  The
+normal :meth:`refresh` method calls :func:`doupdate` as its last act.
+
+
+Displaying Text
+===============
+
+From a C programmer's point of view, curses may sometimes look like a twisty
+maze of functions, all subtly different.  For example, :func:`addstr` displays a
+string at the current cursor location in the ``stdscr`` window, while
+:func:`mvaddstr` moves to a given y,x coordinate first before displaying the
+string. :func:`waddstr` is just like :func:`addstr`, but allows specifying a
+window to use, instead of using ``stdscr`` by default. :func:`mvwaddstr` follows
+similarly.
+
+Fortunately the Python interface hides all these details; ``stdscr`` is a window
+object like any other, and methods like :func:`addstr` accept multiple argument
+forms.  Usually there are four different forms.
+
++---------------------------------+-----------------------------------------------+
+| Form                            | Description                                   |
++=================================+===============================================+
+| *str* or *ch*                   | Display the string *str* or character *ch* at |
+|                                 | the current position                          |
++---------------------------------+-----------------------------------------------+
+| *str* or *ch*, *attr*           | Display the string *str* or character *ch*,   |
+|                                 | using attribute *attr* at the current         |
+|                                 | position                                      |
++---------------------------------+-----------------------------------------------+
+| *y*, *x*, *str* or *ch*         | Move to position *y,x* within the window, and |
+|                                 | display *str* or *ch*                         |
++---------------------------------+-----------------------------------------------+
+| *y*, *x*, *str* or *ch*, *attr* | Move to position *y,x* within the window, and |
+|                                 | display *str* or *ch*, using attribute *attr* |
++---------------------------------+-----------------------------------------------+
+
+Attributes allow displaying text in highlighted forms, such as in boldface,
+underline, reverse code, or in color.  They'll be explained in more detail in
+the next subsection.
+
+The :func:`addstr` function takes a Python string as the value to be displayed,
+while the :func:`addch` functions take a character, which can be either a Python
+string of length 1 or an integer.  If it's a string, you're limited to
+displaying characters between 0 and 255.  SVr4 curses provides constants for
+extension characters; these constants are integers greater than 255.  For
+example, :const:`ACS_PLMINUS` is a +/- symbol, and :const:`ACS_ULCORNER` is the
+upper left corner of a box (handy for drawing borders).
+
+Windows remember where the cursor was left after the last operation, so if you
+leave out the *y,x* coordinates, the string or character will be displayed
+wherever the last operation left off.  You can also move the cursor with the
+:func:`move(y,x)` method.  Because some terminals always display a flashing
+cursor, you may want to ensure that the cursor is positioned in some location
+where it won't be distracting; it can be confusing to have the cursor blinking
+at some apparently random location.
+
+If your application doesn't need a blinking cursor at all, you can call
+:func:`curs_set(0)` to make it invisible.  Equivalently, and for compatibility
+with older curses versions, there's a :func:`leaveok(bool)` function.  When
+*bool* is true, the curses library will attempt to suppress the flashing cursor,
+and you won't need to worry about leaving it in odd locations.
+
+
+Attributes and Color
+--------------------
+
+Characters can be displayed in different ways.  Status lines in a text-based
+application are commonly shown in reverse video; a text viewer may need to
+highlight certain words.  curses supports this by allowing you to specify an
+attribute for each cell on the screen.
+
+An attribute is a integer, each bit representing a different attribute.  You can
+try to display text with multiple attribute bits set, but curses doesn't
+guarantee that all the possible combinations are available, or that they're all
+visually distinct.  That depends on the ability of the terminal being used, so
+it's safest to stick to the most commonly available attributes, listed here.
+
++----------------------+--------------------------------------+
+| Attribute            | Description                          |
++======================+======================================+
+| :const:`A_BLINK`     | Blinking text                        |
++----------------------+--------------------------------------+
+| :const:`A_BOLD`      | Extra bright or bold text            |
++----------------------+--------------------------------------+
+| :const:`A_DIM`       | Half bright text                     |
++----------------------+--------------------------------------+
+| :const:`A_REVERSE`   | Reverse-video text                   |
++----------------------+--------------------------------------+
+| :const:`A_STANDOUT`  | The best highlighting mode available |
++----------------------+--------------------------------------+
+| :const:`A_UNDERLINE` | Underlined text                      |
++----------------------+--------------------------------------+
+
+So, to display a reverse-video status line on the top line of the screen, you
+could code::
+
+   stdscr.addstr(0, 0, "Current mode: Typing mode",
+   	      curses.A_REVERSE)
+   stdscr.refresh()
+
+The curses library also supports color on those terminals that provide it, The
+most common such terminal is probably the Linux console, followed by color
+xterms.
+
+To use color, you must call the :func:`start_color` function soon after calling
+:func:`initscr`, to initialize the default color set (the
+:func:`curses.wrapper.wrapper` function does this automatically).  Once that's
+done, the :func:`has_colors` function returns TRUE if the terminal in use can
+actually display color.  (Note: curses uses the American spelling 'color',
+instead of the Canadian/British spelling 'colour'.  If you're used to the
+British spelling, you'll have to resign yourself to misspelling it for the sake
+of these functions.)
+
+The curses library maintains a finite number of color pairs, containing a
+foreground (or text) color and a background color.  You can get the attribute
+value corresponding to a color pair with the :func:`color_pair` function; this
+can be bitwise-OR'ed with other attributes such as :const:`A_REVERSE`, but
+again, such combinations are not guaranteed to work on all terminals.
+
+An example, which displays a line of text using color pair 1::
+
+   stdscr.addstr( "Pretty text", curses.color_pair(1) )
+   stdscr.refresh()
+
+As I said before, a color pair consists of a foreground and background color.
+:func:`start_color` initializes 8 basic colors when it activates color mode.
+They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and
+7:white.  The curses module defines named constants for each of these colors:
+:const:`curses.COLOR_BLACK`, :const:`curses.COLOR_RED`, and so forth.
+
+The :func:`init_pair(n, f, b)` function changes the definition of color pair
+*n*, to foreground color f and background color b.  Color pair 0 is hard-wired
+to white on black, and cannot be changed.
+
+Let's put all this together. To change color 1 to red text on a white
+background, you would call::
+
+   curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
+
+When you change a color pair, any text already displayed using that color pair
+will change to the new colors.  You can also display new text in this color
+with::
+
+   stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1) )
+
+Very fancy terminals can change the definitions of the actual colors to a given
+RGB value.  This lets you change color 1, which is usually red, to purple or
+blue or any other color you like.  Unfortunately, the Linux console doesn't
+support this, so I'm unable to try it out, and can't provide any examples.  You
+can check if your terminal can do this by calling :func:`can_change_color`,
+which returns TRUE if the capability is there.  If you're lucky enough to have
+such a talented terminal, consult your system's man pages for more information.
+
+
+User Input
+==========
+
+The curses library itself offers only very simple input mechanisms. Python's
+support adds a text-input widget that makes up some of the lack.
+
+The most common way to get input to a window is to use its :meth:`getch` method.
+:meth:`getch` pauses and waits for the user to hit a key, displaying it if
+:func:`echo` has been called earlier.  You can optionally specify a coordinate
+to which the cursor should be moved before pausing.
+
+It's possible to change this behavior with the method :meth:`nodelay`. After
+:meth:`nodelay(1)`, :meth:`getch` for the window becomes non-blocking and
+returns ``curses.ERR`` (a value of -1) when no input is ready.  There's also a
+:func:`halfdelay` function, which can be used to (in effect) set a timer on each
+:meth:`getch`; if no input becomes available within the number of milliseconds
+specified as the argument to :func:`halfdelay`, curses raises an exception.
+
+The :meth:`getch` method returns an integer; if it's between 0 and 255, it
+represents the ASCII code of the key pressed.  Values greater than 255 are
+special keys such as Page Up, Home, or the cursor keys. You can compare the
+value returned to constants such as :const:`curses.KEY_PPAGE`,
+:const:`curses.KEY_HOME`, or :const:`curses.KEY_LEFT`.  Usually the main loop of
+your program will look something like this::
+
+   while 1:
+       c = stdscr.getch()
+       if c == ord('p'): PrintDocument()
+       elif c == ord('q'): break  # Exit the while()
+       elif c == curses.KEY_HOME: x = y = 0
+
+The :mod:`curses.ascii` module supplies ASCII class membership functions that
+take either integer or 1-character-string arguments; these may be useful in
+writing more readable tests for your command interpreters.  It also supplies
+conversion functions  that take either integer or 1-character-string arguments
+and return the same type.  For example, :func:`curses.ascii.ctrl` returns the
+control character corresponding to its argument.
+
+There's also a method to retrieve an entire string, :const:`getstr()`.  It isn't
+used very often, because its functionality is quite limited; the only editing
+keys available are the backspace key and the Enter key, which terminates the
+string.  It can optionally be limited to a fixed number of characters. ::
+
+   curses.echo()            # Enable echoing of characters
+
+   # Get a 15-character string, with the cursor on the top line 
+   s = stdscr.getstr(0,0, 15)  
+
+The Python :mod:`curses.textpad` module supplies something better. With it, you
+can turn a window into a text box that supports an Emacs-like set of
+keybindings.  Various methods of :class:`Textbox` class support editing with
+input validation and gathering the edit results either with or without trailing
+spaces.   See the library documentation on :mod:`curses.textpad` for the
+details.
+
+
+For More Information
+====================
+
+This HOWTO didn't cover some advanced topics, such as screen-scraping or
+capturing mouse events from an xterm instance.  But the Python library page for
+the curses modules is now pretty complete.  You should browse it next.
+
+If you're in doubt about the detailed behavior of any of the ncurses entry
+points, consult the manual pages for your curses implementation, whether it's
+ncurses or a proprietary Unix vendor's.  The manual pages will document any
+quirks, and provide complete lists of all the functions, attributes, and
+:const:`ACS_\*` characters available to you.
+
+Because the curses API is so large, some functions aren't supported in the
+Python interface, not because they're difficult to implement, but because no one
+has needed them yet.  Feel free to add them and then submit a patch.  Also, we
+don't yet have support for the menus or panels libraries associated with
+ncurses; feel free to add that.
+
+If you write an interesting little program, feel free to contribute it as
+another demo.  We can always use more of them!
+
+The ncurses FAQ: http://dickey.his.com/ncurses/ncurses.faq.html
+

Added: doctools/trunk/Doc-3k/howto/doanddont.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/doanddont.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,308 @@
+************************************
+  Idioms and Anti-Idioms in Python  
+************************************
+
+:Author: Moshe Zadka
+
+This document is placed in the public doman.
+
+
+.. topic:: Abstract
+
+   This document can be considered a companion to the tutorial. It shows how to use
+   Python, and even more importantly, how *not* to use Python.
+
+
+Language Constructs You Should Not Use
+======================================
+
+While Python has relatively few gotchas compared to other languages, it still
+has some constructs which are only useful in corner cases, or are plain
+dangerous.
+
+
+from module import \*
+---------------------
+
+
+Inside Function Definitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``from module import *`` is *invalid* inside function definitions. While many
+versions of Python do not check for the invalidity, it does not make it more
+valid, no more then having a smart lawyer makes a man innocent. Do not use it
+like that ever. Even in versions where it was accepted, it made the function
+execution slower, because the compiler could not be certain which names are
+local and which are global. In Python 2.1 this construct causes warnings, and
+sometimes even errors.
+
+
+At Module Level
+^^^^^^^^^^^^^^^
+
+While it is valid to use ``from module import *`` at module level it is usually
+a bad idea. For one, this loses an important property Python otherwise has ---
+you can know where each toplevel name is defined by a simple "search" function
+in your favourite editor. You also open yourself to trouble in the future, if
+some module grows additional functions or classes.
+
+One of the most awful question asked on the newsgroup is why this code::
+
+   f = open("www")
+   f.read()
+
+does not work. Of course, it works just fine (assuming you have a file called
+"www".) But it does not work if somewhere in the module, the statement ``from os
+import *`` is present. The :mod:`os` module has a function called :func:`open`
+which returns an integer. While it is very useful, shadowing builtins is one of
+its least useful properties.
+
+Remember, you can never know for sure what names a module exports, so either
+take what you need --- ``from module import name1, name2``, or keep them in the
+module and access on a per-need basis ---  ``import module;print module.name``.
+
+
+When It Is Just Fine
+^^^^^^^^^^^^^^^^^^^^
+
+There are situations in which ``from module import *`` is just fine:
+
+* The interactive prompt. For example, ``from math import *`` makes Python an
+  amazing scientific calculator.
+
+* When extending a module in C with a module in Python.
+
+* When the module advertises itself as ``from import *`` safe.
+
+
+Unadorned :keyword:`exec`, :func:`execfile` and friends
+-------------------------------------------------------
+
+The word "unadorned" refers to the use without an explicit dictionary, in which
+case those constructs evaluate code in the *current* environment. This is
+dangerous for the same reasons ``from import *`` is dangerous --- it might step
+over variables you are counting on and mess up things for the rest of your code.
+Simply do not do that.
+
+Bad examples::
+
+   >>> for name in sys.argv[1:]:
+   >>>     exec "%s=1" % name
+   >>> def func(s, **kw):
+   >>>     for var, val in kw.items():
+   >>>         exec "s.%s=val" % var  # invalid!
+   >>> execfile("handler.py")
+   >>> handle()
+
+Good examples::
+
+   >>> d = {}
+   >>> for name in sys.argv[1:]:
+   >>>     d[name] = 1
+   >>> def func(s, **kw):
+   >>>     for var, val in kw.items():
+   >>>         setattr(s, var, val)
+   >>> d={}
+   >>> execfile("handle.py", d, d)
+   >>> handle = d['handle']
+   >>> handle()
+
+
+from module import name1, name2
+-------------------------------
+
+This is a "don't" which is much weaker then the previous "don't"s but is still
+something you should not do if you don't have good reasons to do that. The
+reason it is usually bad idea is because you suddenly have an object which lives
+in two seperate namespaces. When the binding in one namespace changes, the
+binding in the other will not, so there will be a discrepancy between them. This
+happens when, for example, one module is reloaded, or changes the definition of
+a function at runtime.
+
+Bad example::
+
+   # foo.py
+   a = 1
+
+   # bar.py
+   from foo import a
+   if something():
+       a = 2 # danger: foo.a != a 
+
+Good example::
+
+   # foo.py
+   a = 1
+
+   # bar.py
+   import foo
+   if something():
+       foo.a = 2
+
+
+except:
+-------
+
+Python has the ``except:`` clause, which catches all exceptions. Since *every*
+error in Python raises an exception, this makes many programming errors look
+like runtime problems, and hinders the debugging process.
+
+The following code shows a great example::
+
+   try:
+       foo = opne("file") # misspelled "open"
+   except:
+       sys.exit("could not open file!")
+
+The second line triggers a :exc:`NameError` which is caught by the except
+clause. The program will exit, and you will have no idea that this has nothing
+to do with the readability of ``"file"``.
+
+The example above is better written ::
+
+   try:
+       foo = opne("file") # will be changed to "open" as soon as we run it
+   except IOError:
+       sys.exit("could not open file")
+
+There are some situations in which the ``except:`` clause is useful: for
+example, in a framework when running callbacks, it is good not to let any
+callback disturb the framework.
+
+
+Exceptions
+==========
+
+Exceptions are a useful feature of Python. You should learn to raise them
+whenever something unexpected occurs, and catch them only where you can do
+something about them.
+
+The following is a very popular anti-idiom ::
+
+   def get_status(file):
+       if not os.path.exists(file):
+           print "file not found"
+           sys.exit(1)
+       return open(file).readline()
+
+Consider the case the file gets deleted between the time the call to
+:func:`os.path.exists` is made and the time :func:`open` is called. That means
+the last line will throw an :exc:`IOError`. The same would happen if *file*
+exists but has no read permission. Since testing this on a normal machine on
+existing and non-existing files make it seem bugless, that means in testing the
+results will seem fine, and the code will get shipped. Then an unhandled
+:exc:`IOError` escapes to the user, who has to watch the ugly traceback.
+
+Here is a better way to do it. ::
+
+   def get_status(file):
+       try:
+           return open(file).readline()
+       except (IOError, OSError):
+           print "file not found"
+           sys.exit(1)
+
+In this version, \*either\* the file gets opened and the line is read (so it
+works even on flaky NFS or SMB connections), or the message is printed and the
+application aborted.
+
+Still, :func:`get_status` makes too many assumptions --- that it will only be
+used in a short running script, and not, say, in a long running server. Sure,
+the caller could do something like ::
+
+   try:
+       status = get_status(log)
+   except SystemExit:
+       status = None
+
+So, try to make as few ``except`` clauses in your code --- those will usually be
+a catch-all in the :func:`main`, or inside calls which should always succeed.
+
+So, the best version is probably ::
+
+   def get_status(file):
+       return open(file).readline()
+
+The caller can deal with the exception if it wants (for example, if it  tries
+several files in a loop), or just let the exception filter upwards to *its*
+caller.
+
+The last version is not very good either --- due to implementation details, the
+file would not be closed when an exception is raised until the handler finishes,
+and perhaps not at all in non-C implementations (e.g., Jython). ::
+
+   def get_status(file):
+       fp = open(file)
+       try:
+           return fp.readline()
+       finally:
+           fp.close()
+
+
+Using the Batteries
+===================
+
+Every so often, people seem to be writing stuff in the Python library again,
+usually poorly. While the occasional module has a poor interface, it is usually
+much better to use the rich standard library and data types that come with
+Python then inventing your own.
+
+A useful module very few people know about is :mod:`os.path`. It  always has the
+correct path arithmetic for your operating system, and will usually be much
+better then whatever you come up with yourself.
+
+Compare::
+
+   # ugh!
+   return dir+"/"+file
+   # better
+   return os.path.join(dir, file)
+
+More useful functions in :mod:`os.path`: :func:`basename`,  :func:`dirname` and
+:func:`splitext`.
+
+There are also many useful builtin functions people seem not to be aware of for
+some reason: :func:`min` and :func:`max` can find the minimum/maximum of any
+sequence with comparable semantics, for example, yet many people write their own
+:func:`max`/:func:`min`. Another highly useful function is :func:`reduce`. A
+classical use of :func:`reduce` is something like ::
+
+   import sys, operator
+   nums = map(float, sys.argv[1:])
+   print reduce(operator.add, nums)/len(nums)
+
+This cute little script prints the average of all numbers given on the command
+line. The :func:`reduce` adds up all the numbers, and the rest is just some pre-
+and postprocessing.
+
+On the same note, note that :func:`float`, :func:`int` and :func:`long` all
+accept arguments of type string, and so are suited to parsing --- assuming you
+are ready to deal with the :exc:`ValueError` they raise.
+
+
+Using Backslash to Continue Statements
+======================================
+
+Since Python treats a newline as a statement terminator, and since statements
+are often more then is comfortable to put in one line, many people do::
+
+   if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
+      calculate_number(10, 20) != forbulate(500, 360):
+         pass
+
+You should realize that this is dangerous: a stray space after the ``XXX`` would
+make this line wrong, and stray spaces are notoriously hard to see in editors.
+In this case, at least it would be a syntax error, but if the code was::
+
+   value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
+           + calculate_number(10, 20)*forbulate(500, 360)
+
+then it would just be subtly wrong.
+
+It is usually much better to use the implicit continuation inside parenthesis:
+
+This version is bulletproof::
+
+   value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9] 
+           + calculate_number(10, 20)*forbulate(500, 360))
+

Added: doctools/trunk/Doc-3k/howto/functional.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/functional.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,1400 @@
+********************************
+  Functional Programming HOWTO
+********************************
+
+:Author: \A. M. Kuchling
+:Release: 0.30
+
+(This is a first draft.  Please send comments/error reports/suggestions to
+amk at amk.ca.  This URL is probably not going to be the final location of the
+document, so be careful about linking to it -- you may want to add a
+disclaimer.)
+
+In this document, we'll take a tour of Python's features suitable for
+implementing programs in a functional style.  After an introduction to the
+concepts of functional programming, we'll look at language features such as
+iterators and generators and relevant library modules such as :mod:`itertools`
+and :mod:`functools`.
+
+
+Introduction
+============
+
+This section explains the basic concept of functional programming; if you're
+just interested in learning about Python language features, skip to the next
+section.
+
+Programming languages support decomposing problems in several different ways:
+
+* Most programming languages are **procedural**: programs are lists of
+  instructions that tell the computer what to do with the program's input.  C,
+  Pascal, and even Unix shells are procedural languages.
+
+* In **declarative** languages, you write a specification that describes the
+  problem to be solved, and the language implementation figures out how to
+  perform the computation efficiently.  SQL is the declarative language you're
+  most likely to be familiar with; a SQL query describes the data set you want
+  to retrieve, and the SQL engine decides whether to scan tables or use indexes,
+  which subclauses should be performed first, etc.
+
+* **Object-oriented** programs manipulate collections of objects.  Objects have
+  internal state and support methods that query or modify this internal state in
+  some way. Smalltalk and Java are object-oriented languages.  C++ and Python
+  are languages that support object-oriented programming, but don't force the
+  use of object-oriented features.
+
+* **Functional** programming decomposes a problem into a set of functions.
+  Ideally, functions only take inputs and produce outputs, and don't have any
+  internal state that affects the output produced for a given input.  Well-known
+  functional languages include the ML family (Standard ML, OCaml, and other
+  variants) and Haskell.
+
+The designers of some computer languages have chosen one approach to programming
+that's emphasized.  This often makes it difficult to write programs that use a
+different approach.  Other languages are multi-paradigm languages that support
+several different approaches.  Lisp, C++, and Python are multi-paradigm; you can
+write programs or libraries that are largely procedural, object-oriented, or
+functional in all of these languages.  In a large program, different sections
+might be written using different approaches; the GUI might be object-oriented
+while the processing logic is procedural or functional, for example.
+
+In a functional program, input flows through a set of functions. Each function
+operates on its input and produces some output.  Functional style frowns upon
+functions with side effects that modify internal state or make other changes
+that aren't visible in the function's return value.  Functions that have no side
+effects at all are called **purely functional**.  Avoiding side effects means
+not using data structures that get updated as a program runs; every function's
+output must only depend on its input.
+
+Some languages are very strict about purity and don't even have assignment
+statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all
+side effects.  Printing to the screen or writing to a disk file are side
+effects, for example.  For example, in Python a ``print`` statement or a
+``time.sleep(1)`` both return no useful value; they're only called for their
+side effects of sending some text to the screen or pausing execution for a
+second.
+
+Python programs written in functional style usually won't go to the extreme of
+avoiding all I/O or all assignments; instead, they'll provide a
+functional-appearing interface but will use non-functional features internally.
+For example, the implementation of a function will still use assignments to
+local variables, but won't modify global variables or have other side effects.
+
+Functional programming can be considered the opposite of object-oriented
+programming.  Objects are little capsules containing some internal state along
+with a collection of method calls that let you modify this state, and programs
+consist of making the right set of state changes.  Functional programming wants
+to avoid state changes as much as possible and works with data flowing between
+functions.  In Python you might combine the two approaches by writing functions
+that take and return instances representing objects in your application (e-mail
+messages, transactions, etc.).
+
+Functional design may seem like an odd constraint to work under.  Why should you
+avoid objects and side effects?  There are theoretical and practical advantages
+to the functional style:
+
+* Formal provability.
+* Modularity.
+* Composability.
+* Ease of debugging and testing.
+
+Formal provability
+------------------
+
+A theoretical benefit is that it's easier to construct a mathematical proof that
+a functional program is correct.
+
+For a long time researchers have been interested in finding ways to
+mathematically prove programs correct.  This is different from testing a program
+on numerous inputs and concluding that its output is usually correct, or reading
+a program's source code and concluding that the code looks right; the goal is
+instead a rigorous proof that a program produces the right result for all
+possible inputs.
+
+The technique used to prove programs correct is to write down **invariants**,
+properties of the input data and of the program's variables that are always
+true.  For each line of code, you then show that if invariants X and Y are true
+**before** the line is executed, the slightly different invariants X' and Y' are
+true **after** the line is executed.  This continues until you reach the end of
+the program, at which point the invariants should match the desired conditions
+on the program's output.
+
+Functional programming's avoidance of assignments arose because assignments are
+difficult to handle with this technique; assignments can break invariants that
+were true before the assignment without producing any new invariants that can be
+propagated onward.
+
+Unfortunately, proving programs correct is largely impractical and not relevant
+to Python software. Even trivial programs require proofs that are several pages
+long; the proof of correctness for a moderately complicated program would be
+enormous, and few or none of the programs you use daily (the Python interpreter,
+your XML parser, your web browser) could be proven correct.  Even if you wrote
+down or generated a proof, there would then be the question of verifying the
+proof; maybe there's an error in it, and you wrongly believe you've proved the
+program correct.
+
+Modularity
+----------
+
+A more practical benefit of functional programming is that it forces you to
+break apart your problem into small pieces.  Programs are more modular as a
+result.  It's easier to specify and write a small function that does one thing
+than a large function that performs a complicated transformation.  Small
+functions are also easier to read and to check for errors.
+
+
+Ease of debugging and testing 
+-----------------------------
+
+Testing and debugging a functional-style program is easier.
+
+Debugging is simplified because functions are generally small and clearly
+specified.  When a program doesn't work, each function is an interface point
+where you can check that the data are correct.  You can look at the intermediate
+inputs and outputs to quickly isolate the function that's responsible for a bug.
+
+Testing is easier because each function is a potential subject for a unit test.
+Functions don't depend on system state that needs to be replicated before
+running a test; instead you only have to synthesize the right input and then
+check that the output matches expectations.
+
+
+
+Composability
+-------------
+
+As you work on a functional-style program, you'll write a number of functions
+with varying inputs and outputs.  Some of these functions will be unavoidably
+specialized to a particular application, but others will be useful in a wide
+variety of programs.  For example, a function that takes a directory path and
+returns all the XML files in the directory, or a function that takes a filename
+and returns its contents, can be applied to many different situations.
+
+Over time you'll form a personal library of utilities.  Often you'll assemble
+new programs by arranging existing functions in a new configuration and writing
+a few functions specialized for the current task.
+
+
+
+Iterators
+=========
+
+I'll start by looking at a Python language feature that's an important
+foundation for writing functional-style programs: iterators.
+
+An iterator is an object representing a stream of data; this object returns the
+data one element at a time.  A Python iterator must support a method called
+``next()`` that takes no arguments and always returns the next element of the
+stream.  If there are no more elements in the stream, ``next()`` must raise the
+``StopIteration`` exception.  Iterators don't have to be finite, though; it's
+perfectly reasonable to write an iterator that produces an infinite stream of
+data.
+
+The built-in :func:`iter` function takes an arbitrary object and tries to return
+an iterator that will return the object's contents or elements, raising
+:exc:`TypeError` if the object doesn't support iteration.  Several of Python's
+built-in data types support iteration, the most common being lists and
+dictionaries.  An object is called an **iterable** object if you can get an
+iterator for it.
+
+You can experiment with the iteration interface manually::
+
+    >>> L = [1,2,3]
+    >>> it = iter(L)
+    >>> print it
+    <iterator object at 0x8116870>
+    >>> it.next()
+    1
+    >>> it.next()
+    2
+    >>> it.next()
+    3
+    >>> it.next()
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    StopIteration
+    >>>      
+
+Python expects iterable objects in several different contexts, the most
+important being the ``for`` statement.  In the statement ``for X in Y``, Y must
+be an iterator or some object for which ``iter()`` can create an iterator.
+These two statements are equivalent::
+
+        for i in iter(obj):
+            print i
+
+        for i in obj:
+            print i
+
+Iterators can be materialized as lists or tuples by using the :func:`list` or
+:func:`tuple` constructor functions::
+
+    >>> L = [1,2,3]
+    >>> iterator = iter(L)
+    >>> t = tuple(iterator)
+    >>> t
+    (1, 2, 3)
+
+Sequence unpacking also supports iterators: if you know an iterator will return
+N elements, you can unpack them into an N-tuple::
+
+    >>> L = [1,2,3]
+    >>> iterator = iter(L)
+    >>> a,b,c = iterator
+    >>> a,b,c
+    (1, 2, 3)
+
+Built-in functions such as :func:`max` and :func:`min` can take a single
+iterator argument and will return the largest or smallest element.  The ``"in"``
+and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
+X is found in the stream returned by the iterator.  You'll run into obvious
+problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"``
+will never return, and if the element X never appears in the stream, the
+``"in"`` operator won't return either.
+
+Note that you can only go forward in an iterator; there's no way to get the
+previous element, reset the iterator, or make a copy of it.  Iterator objects
+can optionally provide these additional capabilities, but the iterator protocol
+only specifies the ``next()`` method.  Functions may therefore consume all of
+the iterator's output, and if you need to do something different with the same
+stream, you'll have to create a new iterator.
+
+
+
+Data Types That Support Iterators
+---------------------------------
+
+We've already seen how lists and tuples support iterators.  In fact, any Python
+sequence type, such as strings, will automatically support creation of an
+iterator.
+
+Calling :func:`iter` on a dictionary returns an iterator that will loop over the
+dictionary's keys::
+
+    >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
+    ...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
+    >>> for key in m:
+    ...     print key, m[key]
+    Mar 3
+    Feb 2
+    Aug 8
+    Sep 9
+    May 5
+    Jun 6
+    Jul 7
+    Jan 1
+    Apr 4
+    Nov 11
+    Dec 12
+    Oct 10
+
+Note that the order is essentially random, because it's based on the hash
+ordering of the objects in the dictionary.
+
+Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries
+have methods that return other iterators.  If you want to iterate over keys,
+values, or key/value pairs, you can explicitly call the ``iterkeys()``,
+``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator.
+
+The :func:`dict` constructor can accept an iterator that returns a finite stream
+of ``(key, value)`` tuples::
+
+    >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
+    >>> dict(iter(L))
+    {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}
+
+Files also support iteration by calling the ``readline()`` method until there
+are no more lines in the file.  This means you can read each line of a file like
+this::
+
+    for line in file:
+        # do something for each line
+        ...
+
+Sets can take their contents from an iterable and let you iterate over the set's
+elements::
+
+    S = set((2, 3, 5, 7, 11, 13))
+    for i in S:
+        print i
+
+
+
+Generator expressions and list comprehensions
+=============================================
+
+Two common operations on an iterator's output are 1) performing some operation
+for every element, 2) selecting a subset of elements that meet some condition.
+For example, given a list of strings, you might want to strip off trailing
+whitespace from each line or extract all the strings containing a given
+substring.
+
+List comprehensions and generator expressions (short form: "listcomps" and
+"genexps") are a concise notation for such operations, borrowed from the
+functional programming language Haskell (http://www.haskell.org).  You can strip
+all the whitespace from a stream of strings with the following code::
+
+        line_list = ['  line 1\n', 'line 2  \n', ...]
+
+        # Generator expression -- returns iterator
+        stripped_iter = (line.strip() for line in line_list)
+
+        # List comprehension -- returns list
+        stripped_list = [line.strip() for line in line_list]
+
+You can select only certain elements by adding an ``"if"`` condition::
+
+        stripped_list = [line.strip() for line in line_list
+                         if line != ""]
+
+With a list comprehension, you get back a Python list; ``stripped_list`` is a
+list containing the resulting lines, not an iterator.  Generator expressions
+return an iterator that computes the values as necessary, not needing to
+materialize all the values at once.  This means that list comprehensions aren't
+useful if you're working with iterators that return an infinite stream or a very
+large amount of data.  Generator expressions are preferable in these situations.
+
+Generator expressions are surrounded by parentheses ("()") and list
+comprehensions are surrounded by square brackets ("[]").  Generator expressions
+have the form::
+
+    ( expression for expr in sequence1 
+                 if condition1
+                 for expr2 in sequence2
+                 if condition2
+                 for expr3 in sequence3 ...
+                 if condition3
+                 for exprN in sequenceN
+                 if conditionN )
+
+Again, for a list comprehension only the outside brackets are different (square
+brackets instead of parentheses).
+
+The elements of the generated output will be the successive values of
+``expression``.  The ``if`` clauses are all optional; if present, ``expression``
+is only evaluated and added to the result when ``condition`` is true.
+
+Generator expressions always have to be written inside parentheses, but the
+parentheses signalling a function call also count.  If you want to create an
+iterator that will be immediately passed to a function you can write::
+
+        obj_total = sum(obj.count for obj in list_all_objects())
+
+The ``for...in`` clauses contain the sequences to be iterated over.  The
+sequences do not have to be the same length, because they are iterated over from
+left to right, **not** in parallel.  For each element in ``sequence1``,
+``sequence2`` is looped over from the beginning.  ``sequence3`` is then looped
+over for each resulting pair of elements from ``sequence1`` and ``sequence2``.
+
+To put it another way, a list comprehension or generator expression is
+equivalent to the following Python code::
+
+    for expr1 in sequence1:
+        if not (condition1):
+            continue   # Skip this element
+        for expr2 in sequence2:
+            if not (condition2):
+                continue    # Skip this element
+            ...
+            for exprN in sequenceN:
+                 if not (conditionN):
+                     continue   # Skip this element
+
+                 # Output the value of 
+                 # the expression.
+
+This means that when there are multiple ``for...in`` clauses but no ``if``
+clauses, the length of the resulting output will be equal to the product of the
+lengths of all the sequences.  If you have two lists of length 3, the output
+list is 9 elements long::
+
+    seq1 = 'abc'
+    seq2 = (1,2,3)
+    >>> [ (x,y) for x in seq1 for y in seq2]
+    [('a', 1), ('a', 2), ('a', 3), 
+     ('b', 1), ('b', 2), ('b', 3), 
+     ('c', 1), ('c', 2), ('c', 3)]
+
+To avoid introducing an ambiguity into Python's grammar, if ``expression`` is
+creating a tuple, it must be surrounded with parentheses.  The first list
+comprehension below is a syntax error, while the second one is correct::
+
+    # Syntax error
+    [ x,y for x in seq1 for y in seq2]
+    # Correct
+    [ (x,y) for x in seq1 for y in seq2]
+
+
+Generators
+==========
+
+Generators are a special class of functions that simplify the task of writing
+iterators.  Regular functions compute a value and return it, but generators
+return an iterator that returns a stream of values.
+
+You're doubtless familiar with how regular function calls work in Python or C.
+When you call a function, it gets a private namespace where its local variables
+are created.  When the function reaches a ``return`` statement, the local
+variables are destroyed and the value is returned to the caller.  A later call
+to the same function creates a new private namespace and a fresh set of local
+variables. But, what if the local variables weren't thrown away on exiting a
+function?  What if you could later resume the function where it left off?  This
+is what generators provide; they can be thought of as resumable functions.
+
+Here's the simplest example of a generator function::
+
+    def generate_ints(N):
+        for i in range(N):
+            yield i
+
+Any function containing a ``yield`` keyword is a generator function; this is
+detected by Python's bytecode compiler which compiles the function specially as
+a result.
+
+When you call a generator function, it doesn't return a single value; instead it
+returns a generator object that supports the iterator protocol.  On executing
+the ``yield`` expression, the generator outputs the value of ``i``, similar to a
+``return`` statement.  The big difference between ``yield`` and a ``return``
+statement is that on reaching a ``yield`` the generator's state of execution is
+suspended and local variables are preserved.  On the next call to the
+generator's ``.next()`` method, the function will resume executing.
+
+Here's a sample usage of the ``generate_ints()`` generator::
+
+    >>> gen = generate_ints(3)
+    >>> gen
+    <generator object at 0x8117f90>
+    >>> gen.next()
+    0
+    >>> gen.next()
+    1
+    >>> gen.next()
+    2
+    >>> gen.next()
+    Traceback (most recent call last):
+      File "stdin", line 1, in ?
+      File "stdin", line 2, in generate_ints
+    StopIteration
+
+You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
+generate_ints(3)``.
+
+Inside a generator function, the ``return`` statement can only be used without a
+value, and signals the end of the procession of values; after executing a
+``return`` the generator cannot return any further values.  ``return`` with a
+value, such as ``return 5``, is a syntax error inside a generator function.  The
+end of the generator's results can also be indicated by raising
+``StopIteration`` manually, or by just letting the flow of execution fall off
+the bottom of the function.
+
+You could achieve the effect of generators manually by writing your own class
+and storing all the local variables of the generator as instance variables.  For
+example, returning a list of integers could be done by setting ``self.count`` to
+0, and having the ``next()`` method increment ``self.count`` and return it.
+However, for a moderately complicated generator, writing a corresponding class
+can be much messier.
+
+The test suite included with Python's library, ``test_generators.py``, contains
+a number of more interesting examples.  Here's one generator that implements an
+in-order traversal of a tree using generators recursively.
+
+::
+
+    # A recursive generator that generates Tree leaves in in-order.
+    def inorder(t):
+        if t:
+            for x in inorder(t.left):
+                yield x
+
+            yield t.label
+
+            for x in inorder(t.right):
+                yield x
+
+Two other examples in ``test_generators.py`` produce solutions for the N-Queens
+problem (placing N queens on an NxN chess board so that no queen threatens
+another) and the Knight's Tour (finding a route that takes a knight to every
+square of an NxN chessboard without visiting any square twice).
+
+
+
+Passing values into a generator
+-------------------------------
+
+In Python 2.4 and earlier, generators only produced output.  Once a generator's
+code was invoked to create an iterator, there was no way to pass any new
+information into the function when its execution is resumed.  You could hack
+together this ability by making the generator look at a global variable or by
+passing in some mutable object that callers then modify, but these approaches
+are messy.
+
+In Python 2.5 there's a simple way to pass values into a generator.
+:keyword:`yield` became an expression, returning a value that can be assigned to
+a variable or otherwise operated on::
+
+    val = (yield i)
+
+I recommend that you **always** put parentheses around a ``yield`` expression
+when you're doing something with the returned value, as in the above example.
+The parentheses aren't always necessary, but it's easier to always add them
+instead of having to remember when they're needed.
+
+(PEP 342 explains the exact rules, which are that a ``yield``-expression must
+always be parenthesized except when it occurs at the top-level expression on the
+right-hand side of an assignment.  This means you can write ``val = yield i``
+but have to use parentheses when there's an operation, as in ``val = (yield i)
++ 12``.)
+
+Values are sent into a generator by calling its ``send(value)`` method.  This
+method resumes the generator's code and the ``yield`` expression returns the
+specified value.  If the regular ``next()`` method is called, the ``yield``
+returns ``None``.
+
+Here's a simple counter that increments by 1 and allows changing the value of
+the internal counter.
+
+::
+
+    def counter (maximum):
+        i = 0
+        while i < maximum:
+            val = (yield i)
+            # If value provided, change counter
+            if val is not None:
+                i = val
+            else:
+                i += 1
+
+And here's an example of changing the counter:
+
+    >>> it = counter(10)
+    >>> print it.next()
+    0
+    >>> print it.next()
+    1
+    >>> print it.send(8)
+    8
+    >>> print it.next()
+    9
+    >>> print it.next()
+    Traceback (most recent call last):
+      File ``t.py'', line 15, in ?
+        print it.next()
+    StopIteration
+
+Because ``yield`` will often be returning ``None``, you should always check for
+this case.  Don't just use its value in expressions unless you're sure that the
+``send()`` method will be the only method used resume your generator function.
+
+In addition to ``send()``, there are two other new methods on generators:
+
+* ``throw(type, value=None, traceback=None)`` is used to raise an exception
+  inside the generator; the exception is raised by the ``yield`` expression
+  where the generator's execution is paused.
+
+* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
+  terminate the iteration.  On receiving this exception, the generator's code
+  must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
+  exception and doing anything else is illegal and will trigger a
+  :exc:`RuntimeError`.  ``close()`` will also be called by Python's garbage
+  collector when the generator is garbage-collected.
+
+  If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
+  using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
+
+The cumulative effect of these changes is to turn generators from one-way
+producers of information into both producers and consumers.
+
+Generators also become **coroutines**, a more generalized form of subroutines.
+Subroutines are entered at one point and exited at another point (the top of the
+function, and a ``return`` statement), but coroutines can be entered, exited,
+and resumed at many different points (the ``yield`` statements).
+
+
+Built-in functions
+==================
+
+Let's look in more detail at built-in functions often used with iterators.
+
+Two Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
+obsolete; they duplicate the features of list comprehensions but return actual
+lists instead of iterators.
+
+``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
+f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
+
+::
+
+    def upper(s):
+        return s.upper()
+    map(upper, ['sentence', 'fragment']) =>
+      ['SENTENCE', 'FRAGMENT']
+
+    [upper(s) for s in ['sentence', 'fragment']] =>
+      ['SENTENCE', 'FRAGMENT']
+
+As shown above, you can achieve the same effect with a list comprehension.  The
+:func:`itertools.imap` function does the same thing but can handle infinite
+iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
+
+``filter(predicate, iter)`` returns a list that contains all the sequence
+elements that meet a certain condition, and is similarly duplicated by list
+comprehensions.  A **predicate** is a function that returns the truth value of
+some condition; for use with :func:`filter`, the predicate must take a single
+value.
+
+::
+
+    def is_even(x):
+        return (x % 2) == 0
+
+    filter(is_even, range(10)) =>
+      [0, 2, 4, 6, 8]
+
+This can also be written as a list comprehension::
+
+    >>> [x for x in range(10) if is_even(x)]
+    [0, 2, 4, 6, 8]
+
+:func:`filter` also has a counterpart in the :mod:`itertools` module,
+:func:`itertools.ifilter`, that returns an iterator and can therefore handle
+infinite sequences just as :func:`itertools.imap` can.
+
+``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
+:mod:`itertools` module because it cumulatively performs an operation on all the
+iterable's elements and therefore can't be applied to infinite iterables.
+``func`` must be a function that takes two elements and returns a single value.
+:func:`reduce` takes the first two elements A and B returned by the iterator and
+calculates ``func(A, B)``.  It then requests the third element, C, calculates
+``func(func(A, B), C)``, combines this result with the fourth element returned,
+and continues until the iterable is exhausted.  If the iterable returns no
+values at all, a :exc:`TypeError` exception is raised.  If the initial value is
+supplied, it's used as a starting point and ``func(initial_value, A)`` is the
+first calculation.
+
+::
+
+    import operator
+    reduce(operator.concat, ['A', 'BB', 'C']) =>
+      'ABBC'
+    reduce(operator.concat, []) =>
+      TypeError: reduce() of empty sequence with no initial value
+    reduce(operator.mul, [1,2,3], 1) =>
+      6
+    reduce(operator.mul, [], 1) =>
+      1
+
+If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
+elements of the iterable.  This case is so common that there's a special
+built-in called :func:`sum` to compute it::
+
+    reduce(operator.add, [1,2,3,4], 0) =>
+      10
+    sum([1,2,3,4]) =>
+      10
+    sum([]) =>
+      0
+
+For many uses of :func:`reduce`, though, it can be clearer to just write the
+obvious :keyword:`for` loop::
+
+    # Instead of:
+    product = reduce(operator.mul, [1,2,3], 1)
+
+    # You can write:
+    product = 1
+    for i in [1,2,3]:
+        product *= i
+
+
+``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
+containing the count and each element.
+
+::
+
+    enumerate(['subject', 'verb', 'object']) =>
+      (0, 'subject'), (1, 'verb'), (2, 'object')
+
+:func:`enumerate` is often used when looping through a list and recording the
+indexes at which certain conditions are met::
+
+    f = open('data.txt', 'r')
+    for i, line in enumerate(f):
+        if line.strip() == '':
+            print 'Blank line at line #%i' % i
+
+``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the
+elements of the iterable into a list, sorts the list, and returns the sorted
+result.  The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
+the constructed list's ``.sort()`` method.
+
+::
+
+    import random
+    # Generate 8 random numbers between [0, 10000)
+    rand_list = random.sample(range(10000), 8)
+    rand_list =>
+      [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
+    sorted(rand_list) =>
+      [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
+    sorted(rand_list, reverse=True) =>
+      [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
+
+(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
+Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
+
+The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
+iterable's contents.  :func:`any` returns True if any element in the iterable is
+a true value, and :func:`all` returns True if all of the elements are true
+values::
+
+    any([0,1,0]) =>
+      True
+    any([0,0,0]) =>
+      False
+    any([1,1,1]) =>
+      True
+    all([0,1,0]) =>
+      False
+    all([0,0,0]) => 
+      False
+    all([1,1,1]) =>
+      True
+
+
+Small functions and the lambda expression
+=========================================
+
+When writing functional-style programs, you'll often need little functions that
+act as predicates or that combine elements in some way.
+
+If there's a Python built-in or a module function that's suitable, you don't
+need to define a new function at all::
+
+        stripped_lines = [line.strip() for line in lines]
+        existing_files = filter(os.path.exists, file_list)
+
+If the function you need doesn't exist, you need to write it.  One way to write
+small functions is to use the ``lambda`` statement.  ``lambda`` takes a number
+of parameters and an expression combining these parameters, and creates a small
+function that returns the value of the expression::
+
+        lowercase = lambda x: x.lower()
+
+        print_assign = lambda name, value: name + '=' + str(value)
+
+        adder = lambda x, y: x+y
+
+An alternative is to just use the ``def`` statement and define a function in the
+usual way::
+
+        def lowercase(x):
+            return x.lower()
+
+        def print_assign(name, value):
+            return name + '=' + str(value)
+
+        def adder(x,y):
+            return x + y
+
+Which alternative is preferable?  That's a style question; my usual course is to
+avoid using ``lambda``.
+
+One reason for my preference is that ``lambda`` is quite limited in the
+functions it can define.  The result has to be computable as a single
+expression, which means you can't have multiway ``if... elif... else``
+comparisons or ``try... except`` statements.  If you try to do too much in a
+``lambda`` statement, you'll end up with an overly complicated expression that's
+hard to read.  Quick, what's the following code doing?
+
+::
+
+    total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+
+You can figure it out, but it takes time to disentangle the expression to figure
+out what's going on.  Using a short nested ``def`` statements makes things a
+little bit better::
+
+    def combine (a, b):
+        return 0, a[1] + b[1]
+
+    total = reduce(combine, items)[1]
+
+But it would be best of all if I had simply used a ``for`` loop::
+
+     total = 0
+     for a, b in items:
+         total += b
+
+Or the :func:`sum` built-in and a generator expression::
+
+     total = sum(b for a,b in items)
+
+Many uses of :func:`reduce` are clearer when written as ``for`` loops.
+
+Fredrik Lundh once suggested the following set of rules for refactoring uses of
+``lambda``:
+
+1) Write a lambda function.
+2) Write a comment explaining what the heck that lambda does.
+3) Study the comment for a while, and think of a name that captures the essence
+   of the comment.
+4) Convert the lambda to a def statement, using that name.
+5) Remove the comment.
+
+I really like these rules, but you're free to disagree that this lambda-free
+style is better.
+
+
+The itertools module
+====================
+
+The :mod:`itertools` module contains a number of commonly-used iterators as well
+as functions for combining several iterators.  This section will introduce the
+module's contents by showing small examples.
+
+The module's functions fall into a few broad classes:
+
+* Functions that create a new iterator based on an existing iterator.
+* Functions for treating an iterator's elements as function arguments.
+* Functions for selecting portions of an iterator's output.
+* A function for grouping an iterator's output.
+
+Creating new iterators
+----------------------
+
+``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
+each time.  You can optionally supply the starting number, which defaults to 0::
+
+        itertools.count() =>
+          0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
+        itertools.count(10) =>
+          10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
+
+``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
+and returns a new iterator that returns its elements from first to last.  The
+new iterator will repeat these elements infinitely.
+
+::
+
+        itertools.cycle([1,2,3,4,5]) =>
+          1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
+
+``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
+returns the element endlessly if ``n`` is not provided.
+
+::
+
+    itertools.repeat('abc') =>
+      abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
+    itertools.repeat('abc', 5) =>
+      abc, abc, abc, abc, abc
+
+``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
+input, and returns all the elements of the first iterator, then all the elements
+of the second, and so on, until all of the iterables have been exhausted.
+
+::
+
+    itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
+      a, b, c, 1, 2, 3
+
+``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and
+returns them in a tuple::
+
+    itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
+      ('a', 1), ('b', 2), ('c', 3)
+
+It's similiar to the built-in :func:`zip` function, but doesn't construct an
+in-memory list and exhaust all the input iterators before returning; instead
+tuples are constructed and returned only if they're requested.  (The technical
+term for this behaviour is `lazy evaluation
+<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.)
+
+This iterator is intended to be used with iterables that are all of the same
+length.  If the iterables are of different lengths, the resulting stream will be
+the same length as the shortest iterable.
+
+::
+
+    itertools.izip(['a', 'b'], (1, 2, 3)) =>
+      ('a', 1), ('b', 2)
+
+You should avoid doing this, though, because an element may be taken from the
+longer iterators and discarded.  This means you can't go on to use the iterators
+further because you risk skipping a discarded element.
+
+``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
+slice of the iterator.  With a single ``stop`` argument, it will return the
+first ``stop`` elements.  If you supply a starting index, you'll get
+``stop-start`` elements, and if you supply a value for ``step``, elements will
+be skipped accordingly.  Unlike Python's string and list slicing, you can't use
+negative values for ``start``, ``stop``, or ``step``.
+
+::
+
+    itertools.islice(range(10), 8) =>
+      0, 1, 2, 3, 4, 5, 6, 7
+    itertools.islice(range(10), 2, 8) =>
+      2, 3, 4, 5, 6, 7
+    itertools.islice(range(10), 2, 8, 2) =>
+      2, 4, 6
+
+``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
+independent iterators that will all return the contents of the source iterator.
+If you don't supply a value for ``n``, the default is 2.  Replicating iterators
+requires saving some of the contents of the source iterator, so this can consume
+significant memory if the iterator is large and one of the new iterators is
+consumed more than the others.
+
+::
+
+        itertools.tee( itertools.count() ) =>
+           iterA, iterB
+
+        where iterA ->
+           0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
+
+        and   iterB ->
+           0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
+
+
+Calling functions on elements
+-----------------------------
+
+Two functions are used for calling other functions on the contents of an
+iterable.
+
+``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
+``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
+
+    itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
+      6, 8, 8
+
+The ``operator`` module contains a set of functions corresponding to Python's
+operators.  Some examples are ``operator.add(a, b)`` (adds two values),
+``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
+(returns a callable that fetches the ``"id"`` attribute).
+
+``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
+of tuples, and calls ``f()`` using these tuples as the arguments::
+
+    itertools.starmap(os.path.join, 
+                      [('/usr', 'bin', 'java'), ('/bin', 'python'),
+                       ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
+    =>
+      /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
+
+
+Selecting elements
+------------------
+
+Another group of functions chooses a subset of an iterator's elements based on a
+predicate.
+
+``itertools.ifilter(predicate, iter)`` returns all the elements for which the
+predicate returns true::
+
+    def is_even(x):
+        return (x % 2) == 0
+
+    itertools.ifilter(is_even, itertools.count()) =>
+      0, 2, 4, 6, 8, 10, 12, 14, ...
+
+``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
+elements for which the predicate returns false::
+
+    itertools.ifilterfalse(is_even, itertools.count()) =>
+      1, 3, 5, 7, 9, 11, 13, 15, ...
+
+``itertools.takewhile(predicate, iter)`` returns elements for as long as the
+predicate returns true.  Once the predicate returns false, the iterator will
+signal the end of its results.
+
+::
+
+    def less_than_10(x):
+        return (x < 10)
+
+    itertools.takewhile(less_than_10, itertools.count()) =>
+      0, 1, 2, 3, 4, 5, 6, 7, 8, 9
+
+    itertools.takewhile(is_even, itertools.count()) =>
+      0
+
+``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
+returns true, and then returns the rest of the iterable's results.
+
+::
+
+    itertools.dropwhile(less_than_10, itertools.count()) =>
+      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
+
+    itertools.dropwhile(is_even, itertools.count()) =>
+      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
+
+
+Grouping elements
+'''''''''''''''''
+
+The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
+the most complicated.  ``key_func(elem)`` is a function that can compute a key
+value for each element returned by the iterable.  If you don't supply a key
+function, the key is simply each element itself.
+
+``groupby()`` collects all the consecutive elements from the underlying iterable
+that have the same key value, and returns a stream of 2-tuples containing a key
+value and an iterator for the elements with that key.
+
+::
+
+    city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), 
+                 ('Anchorage', 'AK'), ('Nome', 'AK'),
+                 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), 
+                 ...
+                ]
+
+    def get_state ((city, state)):
+        return state
+
+    itertools.groupby(city_list, get_state) =>
+      ('AL', iterator-1),
+      ('AK', iterator-2),
+      ('AZ', iterator-3), ...
+
+    where
+    iterator-1 =>
+      ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
+    iterator-2 => 
+      ('Anchorage', 'AK'), ('Nome', 'AK')
+    iterator-3 =>
+      ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
+
+``groupby()`` assumes that the underlying iterable's contents will already be
+sorted based on the key.  Note that the returned iterators also use the
+underlying iterable, so you have to consume the results of iterator-1 before
+requesting iterator-2 and its corresponding key.
+
+
+The functools module
+====================
+
+The :mod:`functools` module in Python 2.5 contains some higher-order functions.
+A **higher-order function** takes one or more functions as input and returns a
+new function.  The most useful tool in this module is the
+:func:`functools.partial` function.
+
+For programs written in a functional style, you'll sometimes want to construct
+variants of existing functions that have some of the parameters filled in.
+Consider a Python function ``f(a, b, c)``; you may wish to create a new function
+``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
+one of ``f()``'s parameters.  This is called "partial function application".
+
+The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
+... kwarg1=value1, kwarg2=value2)``.  The resulting object is callable, so you
+can just call it to invoke ``function`` with the filled-in arguments.
+
+Here's a small but realistic example::
+
+    import functools
+
+    def log (message, subsystem):
+        "Write the contents of 'message' to the specified subsystem."
+        print '%s: %s' % (subsystem, message)
+        ...
+
+    server_log = functools.partial(log, subsystem='server')
+    server_log('Unable to open socket')
+
+
+The operator module
+-------------------
+
+The :mod:`operator` module was mentioned earlier.  It contains a set of
+functions corresponding to Python's operators.  These functions are often useful
+in functional-style code because they save you from writing trivial functions
+that perform a single operation.
+
+Some of the functions in this module are:
+
+* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
+  ``abs()``, ...
+* Logical operations: ``not_()``, ``truth()``.
+* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
+* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
+* Object identity: ``is_()``, ``is_not()``.
+
+Consult the operator module's documentation for a complete list.
+
+
+
+The functional module
+---------------------
+
+Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__
+provides a number of more advanced tools for functional programming. It also
+reimplements several Python built-ins, trying to make them more intuitive to
+those used to functional programming in other languages.
+
+This section contains an introduction to some of the most important functions in
+``functional``; full documentation can be found at `the project's website
+<http://oakwinter.com/code/functional/documentation/>`__.
+
+``compose(outer, inner, unpack=False)``
+
+The ``compose()`` function implements function composition.  In other words, it
+returns a wrapper around the ``outer`` and ``inner`` callables, such that the
+return value from ``inner`` is fed directly to ``outer``.  That is,
+
+::
+
+        >>> def add(a, b):
+        ...     return a + b
+        ...
+        >>> def double(a):
+        ...     return 2 * a
+        ...
+        >>> compose(double, add)(5, 6)
+        22
+
+is equivalent to
+
+::
+
+        >>> double(add(5, 6))
+        22
+                    
+The ``unpack`` keyword is provided to work around the fact that Python functions
+are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__.  By
+default, it is expected that the ``inner`` function will return a single object
+and that the ``outer`` function will take a single argument. Setting the
+``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which
+will be expanded before being passed to ``outer``. Put simply,
+
+::
+
+        compose(f, g)(5, 6)
+                    
+is equivalent to::
+
+        f(g(5, 6))
+                    
+while
+
+::
+
+        compose(f, g, unpack=True)(5, 6)
+                    
+is equivalent to::
+
+        f(*g(5, 6))
+
+Even though ``compose()`` only accepts two functions, it's trivial to build up a
+version that will compose any number of functions. We'll use ``reduce()``,
+``compose()`` and ``partial()`` (the last of which is provided by both
+``functional`` and ``functools``).
+
+::
+
+        from functional import compose, partial
+        
+        multi_compose = partial(reduce, compose)
+        
+    
+We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of
+``"".join(...)`` that converts its arguments to string::
+
+        from functional import compose, partial
+        
+        join = compose("".join, partial(map, str))
+
+
+``flip(func)``
+                    
+``flip()`` wraps the callable in ``func`` and causes it to receive its
+non-keyword arguments in reverse order.
+
+::
+
+        >>> def triple(a, b, c):
+        ...     return (a, b, c)
+        ...
+        >>> triple(5, 6, 7)
+        (5, 6, 7)
+        >>>
+        >>> flipped_triple = flip(triple)
+        >>> flipped_triple(5, 6, 7)
+        (7, 6, 5)
+
+``foldl(func, start, iterable)``
+                    
+``foldl()`` takes a binary function, a starting value (usually some kind of
+'zero'), and an iterable.  The function is applied to the starting value and the
+first element of the list, then the result of that and the second element of the
+list, then the result of that and the third element of the list, and so on.
+
+This means that a call such as::
+
+        foldl(f, 0, [1, 2, 3])
+
+is equivalent to::
+
+        f(f(f(0, 1), 2), 3)
+
+    
+``foldl()`` is roughly equivalent to the following recursive function::
+
+        def foldl(func, start, seq):
+            if len(seq) == 0:
+                return start
+
+            return foldl(func, func(start, seq[0]), seq[1:])
+
+Speaking of equivalence, the above ``foldl`` call can be expressed in terms of
+the built-in ``reduce`` like so::
+
+        reduce(f, [1, 2, 3], 0)
+
+
+We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a
+cleaner, more aesthetically-pleasing version of Python's ``"".join(...)``
+idiom::
+
+        from functional import foldl, partial
+        from operator import concat
+        
+        join = partial(foldl, concat, "")
+
+
+Revision History and Acknowledgements
+=====================================
+
+The author would like to thank the following people for offering suggestions,
+corrections and assistance with various drafts of this article: Ian Bicking,
+Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
+Lameiro, Jussi Salmela, Collin Winter, Blake Winton.
+
+Version 0.1: posted June 30 2006.
+
+Version 0.11: posted July 1 2006.  Typo fixes.
+
+Version 0.2: posted July 10 2006.  Merged genexp and listcomp sections into one.
+Typo fixes.
+
+Version 0.21: Added more references suggested on the tutor mailing list.
+
+Version 0.30: Adds a section on the ``functional`` module written by Collin
+Winter; adds short section on the operator module; a few other edits.
+
+
+References
+==========
+
+General
+-------
+
+**Structure and Interpretation of Computer Programs**, by Harold Abelson and
+Gerald Jay Sussman with Julie Sussman.  Full text at
+http://mitpress.mit.edu/sicp/.  In this classic textbook of computer science,
+chapters 2 and 3 discuss the use of sequences and streams to organize the data
+flow inside a program.  The book uses Scheme for its examples, but many of the
+design approaches described in these chapters are applicable to functional-style
+Python code.
+
+http://www.defmacro.org/ramblings/fp.html: A general introduction to functional
+programming that uses Java examples and has a lengthy historical introduction.
+
+http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry
+describing functional programming.
+
+http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines.
+
+http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying.
+
+Python-specific
+---------------
+
+http://gnosis.cx/TPiP/: The first chapter of David Mertz's book
+:title-reference:`Text Processing in Python` discusses functional programming
+for text processing, in the section titled "Utilizing Higher-Order Functions in
+Text Processing".
+
+Mertz also wrote a 3-part series of articles on functional programming
+for IBM's DeveloperWorks site; see 
+`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__,
+`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and
+`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__,
+
+
+Python documentation
+--------------------
+
+Documentation for the :mod:`itertools` module.
+
+Documentation for the :mod:`operator` module.
+
+:pep:`289`: "Generator Expressions"
+
+:pep:`342`: "Coroutines via Enhanced Generators" describes the new generator
+features in Python 2.5.
+
+.. comment
+
+    Topics to place
+    -----------------------------
+
+    XXX os.walk()
+
+    XXX Need a large example.
+
+    But will an example add much?  I'll post a first draft and see
+    what the comments say.
+
+.. comment
+
+    Original outline:
+    Introduction
+            Idea of FP
+                    Programs built out of functions
+                    Functions are strictly input-output, no internal state
+            Opposed to OO programming, where objects have state
+
+            Why FP?
+                    Formal provability
+                            Assignment is difficult to reason about
+                            Not very relevant to Python
+                    Modularity
+                            Small functions that do one thing
+                    Debuggability:
+                            Easy to test due to lack of state
+                            Easy to verify output from intermediate steps
+                    Composability
+                            You assemble a toolbox of functions that can be mixed
+
+    Tackling a problem
+            Need a significant example
+
+    Iterators
+    Generators
+    The itertools module
+    List comprehensions
+    Small functions and the lambda statement
+    Built-in functions
+            map
+            filter
+            reduce
+
+.. comment
+
+    Handy little function for printing part of an iterator -- used
+    while writing this document.
+
+    import itertools
+    def print_iter(it):
+         slice = itertools.islice(it, 10)
+         for elem in slice[:-1]:
+             sys.stdout.write(str(elem))
+             sys.stdout.write(', ')
+        print elem[-1]
+
+

Added: doctools/trunk/Doc-3k/howto/index.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/index.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,24 @@
+***************
+ Python HOWTOs
+***************
+
+Python HOWTOs are documents that cover a single, specific topic,
+and attempt to cover it fairly completely. Modelled on the Linux
+Documentation Project's HOWTO collection, this collection is an
+effort to foster documentation that's more detailed than the
+Python Library Reference.
+
+Currently, the HOWTOs are:
+
+.. toctree::
+   :maxdepth: 1
+
+   advocacy.rst
+   curses.rst
+   doanddont.rst
+   functional.rst
+   regex.rst
+   sockets.rst
+   unicode.rst
+   urllib2.rst
+

Added: doctools/trunk/Doc-3k/howto/regex.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/regex.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,1377 @@
+****************************
+  Regular Expression HOWTO  
+****************************
+
+:Author: A.M. Kuchling
+:Release: 0.05
+
+.. % TODO:
+.. % Document lookbehind assertions
+.. % Better way of displaying a RE, a string, and what it matches
+.. % Mention optional argument to match.groups()
+.. % Unicode (at least a reference)
+
+
+.. topic:: Abstract
+
+   This document is an introductory tutorial to using regular expressions in Python
+   with the :mod:`re` module.  It provides a gentler introduction than the
+   corresponding section in the Library Reference.
+
+
+Introduction
+============
+
+The :mod:`re` module was added in Python 1.5, and provides Perl-style regular
+expression patterns.  Earlier versions of Python came with the :mod:`regex`
+module, which provided Emacs-style patterns.  The :mod:`regex` module was
+removed completely in Python 2.5.
+
+Regular expressions (called REs, or regexes, or regex patterns) are essentially
+a tiny, highly specialized programming language embedded inside Python and made
+available through the :mod:`re` module. Using this little language, you specify
+the rules for the set of possible strings that you want to match; this set might
+contain English sentences, or e-mail addresses, or TeX commands, or anything you
+like.  You can then ask questions such as "Does this string match the pattern?",
+or "Is there a match for the pattern anywhere in this string?".  You can also
+use REs to modify a string or to split it apart in various ways.
+
+Regular expression patterns are compiled into a series of bytecodes which are
+then executed by a matching engine written in C.  For advanced use, it may be
+necessary to pay careful attention to how the engine will execute a given RE,
+and write the RE in a certain way in order to produce bytecode that runs faster.
+Optimization isn't covered in this document, because it requires that you have a
+good understanding of the matching engine's internals.
+
+The regular expression language is relatively small and restricted, so not all
+possible string processing tasks can be done using regular expressions.  There
+are also tasks that *can* be done with regular expressions, but the expressions
+turn out to be very complicated.  In these cases, you may be better off writing
+Python code to do the processing; while Python code will be slower than an
+elaborate regular expression, it will also probably be more understandable.
+
+
+Simple Patterns
+===============
+
+We'll start by learning about the simplest possible regular expressions.  Since
+regular expressions are used to operate on strings, we'll begin with the most
+common task: matching characters.
+
+For a detailed explanation of the computer science underlying regular
+expressions (deterministic and non-deterministic finite automata), you can refer
+to almost any textbook on writing compilers.
+
+
+Matching Characters
+-------------------
+
+Most letters and characters will simply match themselves.  For example, the
+regular expression ``test`` will match the string ``test`` exactly.  (You can
+enable a case-insensitive mode that would let this RE match ``Test`` or ``TEST``
+as well; more about this later.)
+
+There are exceptions to this rule; some characters are special
+:dfn:`metacharacters`, and don't match themselves.  Instead, they signal that
+some out-of-the-ordinary thing should be matched, or they affect other portions
+of the RE by repeating them or changing their meaning.  Much of this document is
+devoted to discussing various metacharacters and what they do.
+
+Here's a complete list of the metacharacters; their meanings will be discussed
+in the rest of this HOWTO. ::
+
+   . ^ $ * + ? { [ ] \ | ( )
+
+The first metacharacters we'll look at are ``[`` and ``]``. They're used for
+specifying a character class, which is a set of characters that you wish to
+match.  Characters can be listed individually, or a range of characters can be
+indicated by giving two characters and separating them by a ``'-'``.  For
+example, ``[abc]`` will match any of the characters ``a``, ``b``, or ``c``; this
+is the same as ``[a-c]``, which uses a range to express the same set of
+characters.  If you wanted to match only lowercase letters, your RE would be
+``[a-z]``.
+
+.. % $
+
+Metacharacters are not active inside classes.  For example, ``[akm$]`` will
+match any of the characters ``'a'``, ``'k'``, ``'m'``, or ``'$'``; ``'$'`` is
+usually a metacharacter, but inside a character class it's stripped of its
+special nature.
+
+You can match the characters not listed within the class by :dfn:`complementing`
+the set.  This is indicated by including a ``'^'`` as the first character of the
+class; ``'^'`` outside a character class will simply match the ``'^'``
+character.  For example, ``[^5]`` will match any character except ``'5'``.
+
+Perhaps the most important metacharacter is the backslash, ``\``.   As in Python
+string literals, the backslash can be followed by various characters to signal
+various special sequences.  It's also used to escape all the metacharacters so
+you can still match them in patterns; for example, if you need to match a ``[``
+or  ``\``, you can precede them with a backslash to remove their special
+meaning: ``\[`` or ``\\``.
+
+Some of the special sequences beginning with ``'\'`` represent predefined sets
+of characters that are often useful, such as the set of digits, the set of
+letters, or the set of anything that isn't whitespace.  The following predefined
+special sequences are available:
+
+``\d``
+   Matches any decimal digit; this is equivalent to the class ``[0-9]``.
+
+``\D``
+   Matches any non-digit character; this is equivalent to the class ``[^0-9]``.
+
+``\s``
+   Matches any whitespace character; this is equivalent to the class ``[
+   \t\n\r\f\v]``.
+
+``\S``
+   Matches any non-whitespace character; this is equivalent to the class ``[^
+   \t\n\r\f\v]``.
+
+``\w``
+   Matches any alphanumeric character; this is equivalent to the class
+   ``[a-zA-Z0-9_]``.
+
+``\W``
+   Matches any non-alphanumeric character; this is equivalent to the class
+   ``[^a-zA-Z0-9_]``.
+
+These sequences can be included inside a character class.  For example,
+``[\s,.]`` is a character class that will match any whitespace character, or
+``','`` or ``'.'``.
+
+The final metacharacter in this section is ``.``.  It matches anything except a
+newline character, and there's an alternate mode (``re.DOTALL``) where it will
+match even a newline.  ``'.'`` is often used where you want to match "any
+character".
+
+
+Repeating Things
+----------------
+
+Being able to match varying sets of characters is the first thing regular
+expressions can do that isn't already possible with the methods available on
+strings.  However, if that was the only additional capability of regexes, they
+wouldn't be much of an advance. Another capability is that you can specify that
+portions of the RE must be repeated a certain number of times.
+
+The first metacharacter for repeating things that we'll look at is ``*``.  ``*``
+doesn't match the literal character ``*``; instead, it specifies that the
+previous character can be matched zero or more times, instead of exactly once.
+
+For example, ``ca*t`` will match ``ct`` (0 ``a`` characters), ``cat`` (1 ``a``),
+``caaat`` (3 ``a`` characters), and so forth.  The RE engine has various
+internal limitations stemming from the size of C's ``int`` type that will
+prevent it from matching over 2 billion ``a`` characters; you probably don't
+have enough memory to construct a string that large, so you shouldn't run into
+that limit.
+
+Repetitions such as ``*`` are :dfn:`greedy`; when repeating a RE, the matching
+engine will try to repeat it as many times as possible. If later portions of the
+pattern don't match, the matching engine will then back up and try again with
+few repetitions.
+
+A step-by-step example will make this more obvious.  Let's consider the
+expression ``a[bcd]*b``.  This matches the letter ``'a'``, zero or more letters
+from the class ``[bcd]``, and finally ends with a ``'b'``.  Now imagine matching
+this RE against the string ``abcbd``.
+
++------+-----------+---------------------------------+
+| Step | Matched   | Explanation                     |
++======+===========+=================================+
+| 1    | ``a``     | The ``a`` in the RE matches.    |
++------+-----------+---------------------------------+
+| 2    | ``abcbd`` | The engine matches ``[bcd]*``,  |
+|      |           | going as far as it can, which   |
+|      |           | is to the end of the string.    |
++------+-----------+---------------------------------+
+| 3    | *Failure* | The engine tries to match       |
+|      |           | ``b``, but the current position |
+|      |           | is at the end of the string, so |
+|      |           | it fails.                       |
++------+-----------+---------------------------------+
+| 4    | ``abcb``  | Back up, so that  ``[bcd]*``    |
+|      |           | matches one less character.     |
++------+-----------+---------------------------------+
+| 5    | *Failure* | Try ``b`` again, but the        |
+|      |           | current position is at the last |
+|      |           | character, which is a ``'d'``.  |
++------+-----------+---------------------------------+
+| 6    | ``abc``   | Back up again, so that          |
+|      |           | ``[bcd]*`` is only matching     |
+|      |           | ``bc``.                         |
++------+-----------+---------------------------------+
+| 6    | ``abcb``  | Try ``b`` again.  This time     |
+|      |           | but the character at the        |
+|      |           | current position is ``'b'``, so |
+|      |           | it succeeds.                    |
++------+-----------+---------------------------------+
+
+The end of the RE has now been reached, and it has matched ``abcb``.  This
+demonstrates how the matching engine goes as far as it can at first, and if no
+match is found it will then progressively back up and retry the rest of the RE
+again and again.  It will back up until it has tried zero matches for
+``[bcd]*``, and if that subsequently fails, the engine will conclude that the
+string doesn't match the RE at all.
+
+Another repeating metacharacter is ``+``, which matches one or more times.  Pay
+careful attention to the difference between ``*`` and ``+``; ``*`` matches
+*zero* or more times, so whatever's being repeated may not be present at all,
+while ``+`` requires at least *one* occurrence.  To use a similar example,
+``ca+t`` will match ``cat`` (1 ``a``), ``caaat`` (3 ``a``'s), but won't match
+``ct``.
+
+There are two more repeating qualifiers.  The question mark character, ``?``,
+matches either once or zero times; you can think of it as marking something as
+being optional.  For example, ``home-?brew`` matches either ``homebrew`` or
+``home-brew``.
+
+The most complicated repeated qualifier is ``{m,n}``, where *m* and *n* are
+decimal integers.  This qualifier means there must be at least *m* repetitions,
+and at most *n*.  For example, ``a/{1,3}b`` will match ``a/b``, ``a//b``, and
+``a///b``.  It won't match ``ab``, which has no slashes, or ``a////b``, which
+has four.
+
+You can omit either *m* or *n*; in that case, a reasonable value is assumed for
+the missing value.  Omitting *m* is interpreted as a lower limit of 0, while
+omitting *n* results in an upper bound of infinity --- actually, the upper bound
+is the 2-billion limit mentioned earlier, but that might as well be infinity.
+
+Readers of a reductionist bent may notice that the three other qualifiers can
+all be expressed using this notation.  ``{0,}`` is the same as ``*``, ``{1,}``
+is equivalent to ``+``, and ``{0,1}`` is the same as ``?``.  It's better to use
+``*``, ``+``, or ``?`` when you can, simply because they're shorter and easier
+to read.
+
+
+Using Regular Expressions
+=========================
+
+Now that we've looked at some simple regular expressions, how do we actually use
+them in Python?  The :mod:`re` module provides an interface to the regular
+expression engine, allowing you to compile REs into objects and then perform
+matches with them.
+
+
+Compiling Regular Expressions
+-----------------------------
+
+Regular expressions are compiled into :class:`RegexObject` instances, which have
+methods for various operations such as searching for pattern matches or
+performing string substitutions. ::
+
+   >>> import re
+   >>> p = re.compile('ab*')
+   >>> print p
+   <re.RegexObject instance at 80b4150>
+
+:func:`re.compile` also accepts an optional *flags* argument, used to enable
+various special features and syntax variations.  We'll go over the available
+settings later, but for now a single example will do::
+
+   >>> p = re.compile('ab*', re.IGNORECASE)
+
+The RE is passed to :func:`re.compile` as a string.  REs are handled as strings
+because regular expressions aren't part of the core Python language, and no
+special syntax was created for expressing them.  (There are applications that
+don't need REs at all, so there's no need to bloat the language specification by
+including them.) Instead, the :mod:`re` module is simply a C extension module
+included with Python, just like the :mod:`socket` or :mod:`zlib` modules.
+
+Putting REs in strings keeps the Python language simpler, but has one
+disadvantage which is the topic of the next section.
+
+
+The Backslash Plague
+--------------------
+
+As stated earlier, regular expressions use the backslash character (``'\'``) to
+indicate special forms or to allow special characters to be used without
+invoking their special meaning. This conflicts with Python's usage of the same
+character for the same purpose in string literals.
+
+Let's say you want to write a RE that matches the string ``\section``, which
+might be found in a LaTeX file.  To figure out what to write in the program
+code, start with the desired string to be matched.  Next, you must escape any
+backslashes and other metacharacters by preceding them with a backslash,
+resulting in the string ``\\section``.  The resulting string that must be passed
+to :func:`re.compile` must be ``\\section``.  However, to express this as a
+Python string literal, both backslashes must be escaped *again*.
+
++-------------------+------------------------------------------+
+| Characters        | Stage                                    |
++===================+==========================================+
+| ``\section``      | Text string to be matched                |
++-------------------+------------------------------------------+
+| ``\\section``     | Escaped backslash for :func:`re.compile` |
++-------------------+------------------------------------------+
+| ``"\\\\section"`` | Escaped backslashes for a string literal |
++-------------------+------------------------------------------+
+
+In short, to match a literal backslash, one has to write ``'\\\\'`` as the RE
+string, because the regular expression must be ``\\``, and each backslash must
+be expressed as ``\\`` inside a regular Python string literal.  In REs that
+feature backslashes repeatedly, this leads to lots of repeated backslashes and
+makes the resulting strings difficult to understand.
+
+The solution is to use Python's raw string notation for regular expressions;
+backslashes are not handled in any special way in a string literal prefixed with
+``'r'``, so ``r"\n"`` is a two-character string containing ``'\'`` and ``'n'``,
+while ``"\n"`` is a one-character string containing a newline. Regular
+expressions will often be written in Python code using this raw string notation.
+
++-------------------+------------------+
+| Regular String    | Raw string       |
++===================+==================+
+| ``"ab*"``         | ``r"ab*"``       |
++-------------------+------------------+
+| ``"\\\\section"`` | ``r"\\section"`` |
++-------------------+------------------+
+| ``"\\w+\\s+\\1"`` | ``r"\w+\s+\1"``  |
++-------------------+------------------+
+
+
+Performing Matches
+------------------
+
+Once you have an object representing a compiled regular expression, what do you
+do with it?  :class:`RegexObject` instances have several methods and attributes.
+Only the most significant ones will be covered here; consult `the Library
+Reference <http://www.python.org/doc/lib/module-re.html>`_ for a complete
+listing.
+
++------------------+-----------------------------------------------+
+| Method/Attribute | Purpose                                       |
++==================+===============================================+
+| ``match()``      | Determine if the RE matches at the beginning  |
+|                  | of the string.                                |
++------------------+-----------------------------------------------+
+| ``search()``     | Scan through a string, looking for any        |
+|                  | location where this RE matches.               |
++------------------+-----------------------------------------------+
+| ``findall()``    | Find all substrings where the RE matches, and |
+|                  | returns them as a list.                       |
++------------------+-----------------------------------------------+
+| ``finditer()``   | Find all substrings where the RE matches, and |
+|                  | returns them as an iterator.                  |
++------------------+-----------------------------------------------+
+
+:meth:`match` and :meth:`search` return ``None`` if no match can be found.  If
+they're successful, a ``MatchObject`` instance is returned, containing
+information about the match: where it starts and ends, the substring it matched,
+and more.
+
+You can learn about this by interactively experimenting with the :mod:`re`
+module.  If you have Tkinter available, you may also want to look at
+:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+Python distribution.  It allows you to enter REs and strings, and displays
+whether the RE matches or fails. :file:`redemo.py` can be quite useful when
+trying to debug a complicated RE.  Phil Schwartz's `Kodos <http://www.phil-
+schwartz.com/kodos.spy>`_ is also an interactive tool for developing and testing
+RE patterns.
+
+This HOWTO uses the standard Python interpreter for its examples. First, run the
+Python interpreter, import the :mod:`re` module, and compile a RE::
+
+   Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
+   >>> import re
+   >>> p = re.compile('[a-z]+')
+   >>> p
+   <_sre.SRE_Pattern object at 80c3c28>
+
+Now, you can try matching various strings against the RE ``[a-z]+``.  An empty
+string shouldn't match at all, since ``+`` means 'one or more repetitions'.
+:meth:`match` should return ``None`` in this case, which will cause the
+interpreter to print no output.  You can explicitly print the result of
+:meth:`match` to make this clear. ::
+
+   >>> p.match("")
+   >>> print p.match("")
+   None
+
+Now, let's try it on a string that it should match, such as ``tempo``.  In this
+case, :meth:`match` will return a :class:`MatchObject`, so you should store the
+result in a variable for later use. ::
+
+   >>> m = p.match('tempo')
+   >>> print m
+   <_sre.SRE_Match object at 80c4f68>
+
+Now you can query the :class:`MatchObject` for information about the matching
+string.   :class:`MatchObject` instances also have several methods and
+attributes; the most important ones are:
+
++------------------+--------------------------------------------+
+| Method/Attribute | Purpose                                    |
++==================+============================================+
+| ``group()``      | Return the string matched by the RE        |
++------------------+--------------------------------------------+
+| ``start()``      | Return the starting position of the match  |
++------------------+--------------------------------------------+
+| ``end()``        | Return the ending position of the match    |
++------------------+--------------------------------------------+
+| ``span()``       | Return a tuple containing the (start, end) |
+|                  | positions  of the match                    |
++------------------+--------------------------------------------+
+
+Trying these methods will soon clarify their meaning::
+
+   >>> m.group()
+   'tempo'
+   >>> m.start(), m.end()
+   (0, 5)
+   >>> m.span()
+   (0, 5)
+
+:meth:`group` returns the substring that was matched by the RE.  :meth:`start`
+and :meth:`end` return the starting and ending index of the match. :meth:`span`
+returns both start and end indexes in a single tuple.  Since the :meth:`match`
+method only checks if the RE matches at the start of a string, :meth:`start`
+will always be zero.  However, the :meth:`search` method of :class:`RegexObject`
+instances scans through the string, so  the match may not start at zero in that
+case. ::
+
+   >>> print p.match('::: message')
+   None
+   >>> m = p.search('::: message') ; print m
+   <re.MatchObject instance at 80c9650>
+   >>> m.group()
+   'message'
+   >>> m.span()
+   (4, 11)
+
+In actual programs, the most common style is to store the :class:`MatchObject`
+in a variable, and then check if it was ``None``.  This usually looks like::
+
+   p = re.compile( ... )
+   m = p.match( 'string goes here' )
+   if m:
+       print 'Match found: ', m.group()
+   else:
+       print 'No match'
+
+Two :class:`RegexObject` methods return all of the matches for a pattern.
+:meth:`findall` returns a list of matching strings::
+
+   >>> p = re.compile('\d+')
+   >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
+   ['12', '11', '10']
+
+:meth:`findall` has to create the entire list before it can be returned as the
+result.  The :meth:`finditer` method returns a sequence of :class:`MatchObject`
+instances as an iterator. [#]_ ::
+
+   >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
+   >>> iterator
+   <callable-iterator object at 0x401833ac>
+   >>> for match in iterator:
+   ...     print match.span()
+   ...
+   (0, 2)
+   (22, 24)
+   (29, 31)
+
+
+Module-Level Functions
+----------------------
+
+You don't have to create a :class:`RegexObject` and call its methods; the
+:mod:`re` module also provides top-level functions called :func:`match`,
+:func:`search`, :func:`findall`, :func:`sub`, and so forth.  These functions
+take the same arguments as the corresponding :class:`RegexObject` method, with
+the RE string added as the first argument, and still return either ``None`` or a
+:class:`MatchObject` instance. ::
+
+   >>> print re.match(r'From\s+', 'Fromage amk')
+   None
+   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
+   <re.MatchObject instance at 80c5978>
+
+Under the hood, these functions simply produce a :class:`RegexObject` for you
+and call the appropriate method on it.  They also store the compiled object in a
+cache, so future calls using the same RE are faster.
+
+Should you use these module-level functions, or should you get the
+:class:`RegexObject` and call its methods yourself?  That choice depends on how
+frequently the RE will be used, and on your personal coding style.  If the RE is
+being used at only one point in the code, then the module functions are probably
+more convenient.  If a program contains a lot of regular expressions, or re-uses
+the same ones in several locations, then it might be worthwhile to collect all
+the definitions in one place, in a section of code that compiles all the REs
+ahead of time.  To take an example from the standard library, here's an extract
+from :file:`xmllib.py`::
+
+   ref = re.compile( ... )
+   entityref = re.compile( ... )
+   charref = re.compile( ... )
+   starttagopen = re.compile( ... )
+
+I generally prefer to work with the compiled object, even for one-time uses, but
+few people will be as much of a purist about this as I am.
+
+
+Compilation Flags
+-----------------
+
+Compilation flags let you modify some aspects of how regular expressions work.
+Flags are available in the :mod:`re` module under two names, a long name such as
+:const:`IGNORECASE` and a short, one-letter form such as :const:`I`.  (If you're
+familiar with Perl's pattern modifiers, the one-letter forms use the same
+letters; the short form of :const:`re.VERBOSE` is :const:`re.X`, for example.)
+Multiple flags can be specified by bitwise OR-ing them; ``re.I | re.M`` sets
+both the :const:`I` and :const:`M` flags, for example.
+
+Here's a table of the available flags, followed by a more detailed explanation
+of each one.
+
++---------------------------------+--------------------------------------------+
+| Flag                            | Meaning                                    |
++=================================+============================================+
+| :const:`DOTALL`, :const:`S`     | Make ``.`` match any character, including  |
+|                                 | newlines                                   |
++---------------------------------+--------------------------------------------+
+| :const:`IGNORECASE`, :const:`I` | Do case-insensitive matches                |
++---------------------------------+--------------------------------------------+
+| :const:`LOCALE`, :const:`L`     | Do a locale-aware match                    |
++---------------------------------+--------------------------------------------+
+| :const:`MULTILINE`, :const:`M`  | Multi-line matching, affecting ``^`` and   |
+|                                 | ``$``                                      |
++---------------------------------+--------------------------------------------+
+| :const:`VERBOSE`, :const:`X`    | Enable verbose REs, which can be organized |
+|                                 | more cleanly and understandably.           |
++---------------------------------+--------------------------------------------+
+
+
+.. data:: I
+          IGNORECASE
+   :noindex:
+
+   Perform case-insensitive matching; character class and literal strings will
+   match letters by ignoring case.  For example, ``[A-Z]`` will match lowercase
+   letters, too, and ``Spam`` will match ``Spam``, ``spam``, or ``spAM``. This
+   lowercasing doesn't take the current locale into account; it will if you also
+   set the :const:`LOCALE` flag.
+
+
+.. data:: L
+          LOCALE
+   :noindex:
+
+   Make ``\w``, ``\W``, ``\b``, and ``\B``, dependent on the current locale.
+
+   Locales are a feature of the C library intended to help in writing programs that
+   take account of language differences.  For example, if you're processing French
+   text, you'd want to be able to write ``\w+`` to match words, but ``\w`` only
+   matches the character class ``[A-Za-z]``; it won't match ``'é'`` or ``'ç'``.  If
+   your system is configured properly and a French locale is selected, certain C
+   functions will tell the program that ``'é'`` should also be considered a letter.
+   Setting the :const:`LOCALE` flag when compiling a regular expression will cause
+   the resulting compiled object to use these C functions for ``\w``; this is
+   slower, but also enables ``\w+`` to match French words as you'd expect.
+
+
+.. data:: M
+          MULTILINE
+   :noindex:
+
+   (``^`` and ``$`` haven't been explained yet;  they'll be introduced in section
+   :ref:`more-metacharacters`.)
+
+   Usually ``^`` matches only at the beginning of the string, and ``$`` matches
+   only at the end of the string and immediately before the newline (if any) at the
+   end of the string. When this flag is specified, ``^`` matches at the beginning
+   of the string and at the beginning of each line within the string, immediately
+   following each newline.  Similarly, the ``$`` metacharacter matches either at
+   the end of the string and at the end of each line (immediately preceding each
+   newline).
+
+
+.. data:: S
+          DOTALL
+   :noindex:
+
+   Makes the ``'.'`` special character match any character at all, including a
+   newline; without this flag, ``'.'`` will match anything *except* a newline.
+
+
+.. data:: X
+          VERBOSE
+   :noindex:
+
+   This flag allows you to write regular expressions that are more readable by
+   granting you more flexibility in how you can format them.  When this flag has
+   been specified, whitespace within the RE string is ignored, except when the
+   whitespace is in a character class or preceded by an unescaped backslash; this
+   lets you organize and indent the RE more clearly.  This flag also lets you put
+   comments within a RE that will be ignored by the engine; comments are marked by
+   a ``'#'`` that's neither in a character class or preceded by an unescaped
+   backslash.
+
+   For example, here's a RE that uses :const:`re.VERBOSE`; see how much easier it
+   is to read? ::
+
+      charref = re.compile(r"""
+       &[#]		     # Start of a numeric entity reference
+       (
+           0[0-7]+         # Octal form
+         | [0-9]+          # Decimal form
+         | x[0-9a-fA-F]+   # Hexadecimal form
+       )
+       ;                   # Trailing semicolon
+      """, re.VERBOSE)
+
+   Without the verbose setting, the RE would look like this::
+
+      charref = re.compile("&#(0[0-7]+"
+                           "|[0-9]+"
+                           "|x[0-9a-fA-F]+);")
+
+   In the above example, Python's automatic concatenation of string literals has
+   been used to break up the RE into smaller pieces, but it's still more difficult
+   to understand than the version using :const:`re.VERBOSE`.
+
+
+More Pattern Power
+==================
+
+So far we've only covered a part of the features of regular expressions.  In
+this section, we'll cover some new metacharacters, and how to use groups to
+retrieve portions of the text that was matched.
+
+
+.. _more-metacharacters:
+
+More Metacharacters
+-------------------
+
+There are some metacharacters that we haven't covered yet.  Most of them will be
+covered in this section.
+
+Some of the remaining metacharacters to be discussed are :dfn:`zero-width
+assertions`.  They don't cause the engine to advance through the string;
+instead, they consume no characters at all, and simply succeed or fail.  For
+example, ``\b`` is an assertion that the current position is located at a word
+boundary; the position isn't changed by the ``\b`` at all.  This means that
+zero-width assertions should never be repeated, because if they match once at a
+given location, they can obviously be matched an infinite number of times.
+
+``|``
+   Alternation, or the "or" operator.   If A and B are regular expressions,
+   ``A|B`` will match any string that matches either ``A`` or ``B``. ``|`` has very
+   low precedence in order to make it work reasonably when you're alternating
+   multi-character strings. ``Crow|Servo`` will match either ``Crow`` or ``Servo``,
+   not ``Cro``, a ``'w'`` or an ``'S'``, and ``ervo``.
+
+   To match a literal ``'|'``, use ``\|``, or enclose it inside a character class,
+   as in ``[|]``.
+
+``^``
+   Matches at the beginning of lines.  Unless the :const:`MULTILINE` flag has been
+   set, this will only match at the beginning of the string.  In :const:`MULTILINE`
+   mode, this also matches immediately after each newline within the string.
+
+   For example, if you wish to match the word ``From`` only at the beginning of a
+   line, the RE to use is ``^From``. ::
+
+      >>> print re.search('^From', 'From Here to Eternity')
+      <re.MatchObject instance at 80c1520>
+      >>> print re.search('^From', 'Reciting From Memory')
+      None
+
+   .. % To match a literal \character{\^}, use \regexp{\e\^} or enclose it
+   .. % inside a character class, as in \regexp{[{\e}\^]}.
+
+``$``
+   Matches at the end of a line, which is defined as either the end of the string,
+   or any location followed by a newline character.     ::
+
+      >>> print re.search('}$', '{block}')
+      <re.MatchObject instance at 80adfa8>
+      >>> print re.search('}$', '{block} ')
+      None
+      >>> print re.search('}$', '{block}\n')
+      <re.MatchObject instance at 80adfa8>
+
+   To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
+   as in  ``[$]``.
+
+   .. % $
+
+``\A``
+   Matches only at the start of the string.  When not in :const:`MULTILINE` mode,
+   ``\A`` and ``^`` are effectively the same.  In :const:`MULTILINE` mode, they're
+   different: ``\A`` still matches only at the beginning of the string, but ``^``
+   may match at any location inside the string that follows a newline character.
+
+``\Z``
+   Matches only at the end of the string.
+
+``\b``
+   Word boundary.   This is a zero-width assertion that matches only at the
+   beginning or end of a word.  A word is defined as a sequence of alphanumeric
+   characters, so the end of a word is indicated by whitespace or a non-
+   alphanumeric character.
+
+   The following example matches ``class`` only when it's a complete word; it won't
+   match when it's contained inside another word. ::
+
+      >>> p = re.compile(r'\bclass\b')
+      >>> print p.search('no class at all')
+      <re.MatchObject instance at 80c8f28>
+      >>> print p.search('the declassified algorithm')
+      None
+      >>> print p.search('one subclass is')
+      None
+
+   There are two subtleties you should remember when using this special sequence.
+   First, this is the worst collision between Python's string literals and regular
+   expression sequences.  In Python's string literals, ``\b`` is the backspace
+   character, ASCII value 8.  If you're not using raw strings, then Python will
+   convert the ``\b`` to a backspace, and your RE won't match as you expect it to.
+   The following example looks the same as our previous RE, but omits the ``'r'``
+   in front of the RE string. ::
+
+      >>> p = re.compile('\bclass\b')
+      >>> print p.search('no class at all')
+      None
+      >>> print p.search('\b' + 'class' + '\b')  
+      <re.MatchObject instance at 80c3ee0>
+
+   Second, inside a character class, where there's no use for this assertion,
+   ``\b`` represents the backspace character, for compatibility with Python's
+   string literals.
+
+``\B``
+   Another zero-width assertion, this is the opposite of ``\b``, only matching when
+   the current position is not at a word boundary.
+
+
+Grouping
+--------
+
+Frequently you need to obtain more information than just whether the RE matched
+or not.  Regular expressions are often used to dissect strings by writing a RE
+divided into several subgroups which match different components of interest.
+For example, an RFC-822 header line is divided into a header name and a value,
+separated by a ``':'``, like this::
+
+   From: author at example.com
+   User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
+   MIME-Version: 1.0
+   To: editor at example.com
+
+This can be handled by writing a regular expression which matches an entire
+header line, and has one group which matches the header name, and another group
+which matches the header's value.
+
+Groups are marked by the ``'('``, ``')'`` metacharacters. ``'('`` and ``')'``
+have much the same meaning as they do in mathematical expressions; they group
+together the expressions contained inside them, and you can repeat the contents
+of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
+``{m,n}``.  For example, ``(ab)*`` will match zero or more repetitions of
+``ab``. ::
+
+   >>> p = re.compile('(ab)*')
+   >>> print p.match('ababababab').span()
+   (0, 10)
+
+Groups indicated with ``'('``, ``')'`` also capture the starting and ending
+index of the text that they match; this can be retrieved by passing an argument
+to :meth:`group`, :meth:`start`, :meth:`end`, and :meth:`span`.  Groups are
+numbered starting with 0.  Group 0 is always present; it's the whole RE, so
+:class:`MatchObject` methods all have group 0 as their default argument.  Later
+we'll see how to express groups that don't capture the span of text that they
+match. ::
+
+   >>> p = re.compile('(a)b')
+   >>> m = p.match('ab')
+   >>> m.group()
+   'ab'
+   >>> m.group(0)
+   'ab'
+
+Subgroups are numbered from left to right, from 1 upward.  Groups can be nested;
+to determine the number, just count the opening parenthesis characters, going
+from left to right. ::
+
+   >>> p = re.compile('(a(b)c)d')
+   >>> m = p.match('abcd')
+   >>> m.group(0)
+   'abcd'
+   >>> m.group(1)
+   'abc'
+   >>> m.group(2)
+   'b'
+
+:meth:`group` can be passed multiple group numbers at a time, in which case it
+will return a tuple containing the corresponding values for those groups. ::
+
+   >>> m.group(2,1,2)
+   ('b', 'abc', 'b')
+
+The :meth:`groups` method returns a tuple containing the strings for all the
+subgroups, from 1 up to however many there are. ::
+
+   >>> m.groups()
+   ('abc', 'b')
+
+Backreferences in a pattern allow you to specify that the contents of an earlier
+capturing group must also be found at the current location in the string.  For
+example, ``\1`` will succeed if the exact contents of group 1 can be found at
+the current position, and fails otherwise.  Remember that Python's string
+literals also use a backslash followed by numbers to allow including arbitrary
+characters in a string, so be sure to use a raw string when incorporating
+backreferences in a RE.
+
+For example, the following RE detects doubled words in a string. ::
+
+   >>> p = re.compile(r'(\b\w+)\s+\1')
+   >>> p.search('Paris in the the spring').group()
+   'the the'
+
+Backreferences like this aren't often useful for just searching through a string
+--- there are few text formats which repeat data in this way --- but you'll soon
+find out that they're *very* useful when performing string substitutions.
+
+
+Non-capturing and Named Groups
+------------------------------
+
+Elaborate REs may use many groups, both to capture substrings of interest, and
+to group and structure the RE itself.  In complex REs, it becomes difficult to
+keep track of the group numbers.  There are two features which help with this
+problem.  Both of them use a common syntax for regular expression extensions, so
+we'll look at that first.
+
+Perl 5 added several additional features to standard regular expressions, and
+the Python :mod:`re` module supports most of them.   It would have been
+difficult to choose new single-keystroke metacharacters or new special sequences
+beginning with ``\`` to represent the new features without making Perl's regular
+expressions confusingly different from standard REs.  If you chose ``&`` as a
+new metacharacter, for example, old expressions would be assuming that ``&`` was
+a regular character and wouldn't have escaped it by writing ``\&`` or ``[&]``.
+
+The solution chosen by the Perl developers was to use ``(?...)`` as the
+extension syntax.  ``?`` immediately after a parenthesis was a syntax error
+because the ``?`` would have nothing to repeat, so this didn't introduce any
+compatibility problems.  The characters immediately after the ``?``  indicate
+what extension is being used, so ``(?=foo)`` is one thing (a positive lookahead
+assertion) and ``(?:foo)`` is something else (a non-capturing group containing
+the subexpression ``foo``).
+
+Python adds an extension syntax to Perl's extension syntax.  If the first
+character after the question mark is a ``P``, you know that it's an extension
+that's specific to Python.  Currently there are two such extensions:
+``(?P<name>...)`` defines a named group, and ``(?P=name)`` is a backreference to
+a named group.  If future versions of Perl 5 add similar features using a
+different syntax, the :mod:`re` module will be changed to support the new
+syntax, while preserving the Python-specific syntax for compatibility's sake.
+
+Now that we've looked at the general extension syntax, we can return to the
+features that simplify working with groups in complex REs. Since groups are
+numbered from left to right and a complex expression may use many groups, it can
+become difficult to keep track of the correct numbering.  Modifying such a
+complex RE is annoying, too: insert a new group near the beginning and you
+change the numbers of everything that follows it.
+
+Sometimes you'll want to use a group to collect a part of a regular expression,
+but aren't interested in retrieving the group's contents. You can make this fact
+explicit by using a non-capturing group: ``(?:...)``, where you can replace the
+``...`` with any other regular expression. ::
+
+   >>> m = re.match("([abc])+", "abc")
+   >>> m.groups()
+   ('c',)
+   >>> m = re.match("(?:[abc])+", "abc")
+   >>> m.groups()
+   ()
+
+Except for the fact that you can't retrieve the contents of what the group
+matched, a non-capturing group behaves exactly the same as a capturing group;
+you can put anything inside it, repeat it with a repetition metacharacter such
+as ``*``, and nest it within other groups (capturing or non-capturing).
+``(?:...)`` is particularly useful when modifying an existing pattern, since you
+can add new groups without changing how all the other groups are numbered.  It
+should be mentioned that there's no performance difference in searching between
+capturing and non-capturing groups; neither form is any faster than the other.
+
+A more significant feature is named groups: instead of referring to them by
+numbers, groups can be referenced by a name.
+
+The syntax for a named group is one of the Python-specific extensions:
+``(?P<name>...)``.  *name* is, obviously, the name of the group.  Named groups
+also behave exactly like capturing groups, and additionally associate a name
+with a group.  The :class:`MatchObject` methods that deal with capturing groups
+all accept either integers that refer to the group by number or strings that
+contain the desired group's name.  Named groups are still given numbers, so you
+can retrieve information about a group in two ways::
+
+   >>> p = re.compile(r'(?P<word>\b\w+\b)')
+   >>> m = p.search( '(((( Lots of punctuation )))' )
+   >>> m.group('word')
+   'Lots'
+   >>> m.group(1)
+   'Lots'
+
+Named groups are handy because they let you use easily-remembered names, instead
+of having to remember numbers.  Here's an example RE from the :mod:`imaplib`
+module::
+
+   InternalDate = re.compile(r'INTERNALDATE "'
+           r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
+   	r'(?P<year>[0-9][0-9][0-9][0-9])'
+           r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
+           r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
+           r'"')
+
+It's obviously much easier to retrieve ``m.group('zonem')``, instead of having
+to remember to retrieve group 9.
+
+The syntax for backreferences in an expression such as ``(...)\1`` refers to the
+number of the group.  There's naturally a variant that uses the group name
+instead of the number. This is another Python extension: ``(?P=name)`` indicates
+that the contents of the group called *name* should again be matched at the
+current point.  The regular expression for finding doubled words,
+``(\b\w+)\s+\1`` can also be written as ``(?P<word>\b\w+)\s+(?P=word)``::
+
+   >>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
+   >>> p.search('Paris in the the spring').group()
+   'the the'
+
+
+Lookahead Assertions
+--------------------
+
+Another zero-width assertion is the lookahead assertion.  Lookahead assertions
+are available in both positive and negative form, and  look like this:
+
+``(?=...)``
+   Positive lookahead assertion.  This succeeds if the contained regular
+   expression, represented here by ``...``, successfully matches at the current
+   location, and fails otherwise. But, once the contained expression has been
+   tried, the matching engine doesn't advance at all; the rest of the pattern is
+   tried right where the assertion started.
+
+``(?!...)``
+   Negative lookahead assertion.  This is the opposite of the positive assertion;
+   it succeeds if the contained expression *doesn't* match at the current position
+   in the string.
+
+To make this concrete, let's look at a case where a lookahead is useful.
+Consider a simple pattern to match a filename and split it apart into a base
+name and an extension, separated by a ``.``.  For example, in ``news.rc``,
+``news`` is the base name, and ``rc`` is the filename's extension.
+
+The pattern to match this is quite simple:
+
+``.*[.].*$``
+
+Notice that the ``.`` needs to be treated specially because it's a
+metacharacter; I've put it inside a character class.  Also notice the trailing
+``$``; this is added to ensure that all the rest of the string must be included
+in the extension.  This regular expression matches ``foo.bar`` and
+``autoexec.bat`` and ``sendmail.cf`` and ``printers.conf``.
+
+Now, consider complicating the problem a bit; what if you want to match
+filenames where the extension is not ``bat``? Some incorrect attempts:
+
+``.*[.][^b].*$``  The first attempt above tries to exclude ``bat`` by requiring
+that the first character of the extension is not a ``b``.  This is wrong,
+because the pattern also doesn't match ``foo.bar``.
+
+.. % $
+
+``.*[.]([^b]..|.[^a].|..[^t])$``
+
+.. % Messes up the HTML without the curly braces around \^
+
+The expression gets messier when you try to patch up the first solution by
+requiring one of the following cases to match: the first character of the
+extension isn't ``b``; the second character isn't ``a``; or the third character
+isn't ``t``.  This accepts ``foo.bar`` and rejects ``autoexec.bat``, but it
+requires a three-letter extension and won't accept a filename with a two-letter
+extension such as ``sendmail.cf``.  We'll complicate the pattern again in an
+effort to fix it.
+
+``.*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$``
+
+In the third attempt, the second and third letters are all made optional in
+order to allow matching extensions shorter than three characters, such as
+``sendmail.cf``.
+
+The pattern's getting really complicated now, which makes it hard to read and
+understand.  Worse, if the problem changes and you want to exclude both ``bat``
+and ``exe`` as extensions, the pattern would get even more complicated and
+confusing.
+
+A negative lookahead cuts through all this confusion:
+
+``.*[.](?!bat$).*$``  The negative lookahead means: if the expression ``bat``
+doesn't match at this point, try the rest of the pattern; if ``bat$`` does
+match, the whole pattern will fail.  The trailing ``$`` is required to ensure
+that something like ``sample.batch``, where the extension only starts with
+``bat``, will be allowed.
+
+.. % $
+
+Excluding another filename extension is now easy; simply add it as an
+alternative inside the assertion.  The following pattern excludes filenames that
+end in either ``bat`` or ``exe``:
+
+``.*[.](?!bat$|exe$).*$``
+
+.. % $
+
+
+Modifying Strings
+=================
+
+Up to this point, we've simply performed searches against a static string.
+Regular expressions are also commonly used to modify strings in various ways,
+using the following :class:`RegexObject` methods:
+
++------------------+-----------------------------------------------+
+| Method/Attribute | Purpose                                       |
++==================+===============================================+
+| ``split()``      | Split the string into a list, splitting it    |
+|                  | wherever the RE matches                       |
++------------------+-----------------------------------------------+
+| ``sub()``        | Find all substrings where the RE matches, and |
+|                  | replace them with a different string          |
++------------------+-----------------------------------------------+
+| ``subn()``       | Does the same thing as :meth:`sub`,  but      |
+|                  | returns the new string and the number of      |
+|                  | replacements                                  |
++------------------+-----------------------------------------------+
+
+
+Splitting Strings
+-----------------
+
+The :meth:`split` method of a :class:`RegexObject` splits a string apart
+wherever the RE matches, returning a list of the pieces. It's similar to the
+:meth:`split` method of strings but provides much more generality in the
+delimiters that you can split by; :meth:`split` only supports splitting by
+whitespace or by a fixed string.  As you'd expect, there's a module-level
+:func:`re.split` function, too.
+
+
+.. method:: .split(string [, maxsplit\ ``= 0``])
+   :noindex:
+
+   Split *string* by the matches of the regular expression.  If capturing
+   parentheses are used in the RE, then their contents will also be returned as
+   part of the resulting list.  If *maxsplit* is nonzero, at most *maxsplit* splits
+   are performed.
+
+You can limit the number of splits made, by passing a value for *maxsplit*.
+When *maxsplit* is nonzero, at most *maxsplit* splits will be made, and the
+remainder of the string is returned as the final element of the list.  In the
+following example, the delimiter is any sequence of non-alphanumeric characters.
+::
+
+   >>> p = re.compile(r'\W+')
+   >>> p.split('This is a test, short and sweet, of split().')
+   ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
+   >>> p.split('This is a test, short and sweet, of split().', 3)
+   ['This', 'is', 'a', 'test, short and sweet, of split().']
+
+Sometimes you're not only interested in what the text between delimiters is, but
+also need to know what the delimiter was.  If capturing parentheses are used in
+the RE, then their values are also returned as part of the list.  Compare the
+following calls::
+
+   >>> p = re.compile(r'\W+')
+   >>> p2 = re.compile(r'(\W+)')
+   >>> p.split('This... is a test.')
+   ['This', 'is', 'a', 'test', '']
+   >>> p2.split('This... is a test.')
+   ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
+
+The module-level function :func:`re.split` adds the RE to be used as the first
+argument, but is otherwise the same.   ::
+
+   >>> re.split('[\W]+', 'Words, words, words.')
+   ['Words', 'words', 'words', '']
+   >>> re.split('([\W]+)', 'Words, words, words.')
+   ['Words', ', ', 'words', ', ', 'words', '.', '']
+   >>> re.split('[\W]+', 'Words, words, words.', 1)
+   ['Words', 'words, words.']
+
+
+Search and Replace
+------------------
+
+Another common task is to find all the matches for a pattern, and replace them
+with a different string.  The :meth:`sub` method takes a replacement value,
+which can be either a string or a function, and the string to be processed.
+
+
+.. method:: .sub(replacement, string[, count\ ``= 0``])
+   :noindex:
+
+   Returns the string obtained by replacing the leftmost non-overlapping
+   occurrences of the RE in *string* by the replacement *replacement*.  If the
+   pattern isn't found, *string* is returned unchanged.
+
+   The optional argument *count* is the maximum number of pattern occurrences to be
+   replaced; *count* must be a non-negative integer.  The default value of 0 means
+   to replace all occurrences.
+
+Here's a simple example of using the :meth:`sub` method.  It replaces colour
+names with the word ``colour``::
+
+   >>> p = re.compile( '(blue|white|red)')
+   >>> p.sub( 'colour', 'blue socks and red shoes')
+   'colour socks and colour shoes'
+   >>> p.sub( 'colour', 'blue socks and red shoes', count=1)
+   'colour socks and red shoes'
+
+The :meth:`subn` method does the same work, but returns a 2-tuple containing the
+new string value and the number of replacements  that were performed::
+
+   >>> p = re.compile( '(blue|white|red)')
+   >>> p.subn( 'colour', 'blue socks and red shoes')
+   ('colour socks and colour shoes', 2)
+   >>> p.subn( 'colour', 'no colours at all')
+   ('no colours at all', 0)
+
+Empty matches are replaced only when they're not adjacent to a previous match.
+::
+
+   >>> p = re.compile('x*')
+   >>> p.sub('-', 'abxd')
+   '-a-b-d-'
+
+If *replacement* is a string, any backslash escapes in it are processed.  That
+is, ``\n`` is converted to a single newline character, ``\r`` is converted to a
+carriage return, and so forth. Unknown escapes such as ``\j`` are left alone.
+Backreferences, such as ``\6``, are replaced with the substring matched by the
+corresponding group in the RE.  This lets you incorporate portions of the
+original text in the resulting replacement string.
+
+This example matches the word ``section`` followed by a string enclosed in
+``{``, ``}``, and changes ``section`` to ``subsection``::
+
+   >>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
+   >>> p.sub(r'subsection{\1}','section{First} section{second}')
+   'subsection{First} subsection{second}'
+
+There's also a syntax for referring to named groups as defined by the
+``(?P<name>...)`` syntax.  ``\g<name>`` will use the substring matched by the
+group named ``name``, and  ``\g<number>``  uses the corresponding group number.
+``\g<2>`` is therefore equivalent to ``\2``,  but isn't ambiguous in a
+replacement string such as ``\g<2>0``.  (``\20`` would be interpreted as a
+reference to group 20, not a reference to group 2 followed by the literal
+character ``'0'``.)  The following substitutions are all equivalent, but use all
+three variations of the replacement string. ::
+
+   >>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
+   >>> p.sub(r'subsection{\1}','section{First}')
+   'subsection{First}'
+   >>> p.sub(r'subsection{\g<1>}','section{First}')
+   'subsection{First}'
+   >>> p.sub(r'subsection{\g<name>}','section{First}')
+   'subsection{First}'
+
+*replacement* can also be a function, which gives you even more control.  If
+*replacement* is a function, the function is called for every non-overlapping
+occurrence of *pattern*.  On each call, the function is  passed a
+:class:`MatchObject` argument for the match and can use this information to
+compute the desired replacement string and return it.
+
+In the following example, the replacement function translates  decimals into
+hexadecimal::
+
+   >>> def hexrepl( match ):
+   ...     "Return the hex string for a decimal number"
+   ...     value = int( match.group() )
+   ...     return hex(value)
+   ...
+   >>> p = re.compile(r'\d+')
+   >>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
+   'Call 0xffd2 for printing, 0xc000 for user code.'
+
+When using the module-level :func:`re.sub` function, the pattern is passed as
+the first argument.  The pattern may be a string or a :class:`RegexObject`; if
+you need to specify regular expression flags, you must either use a
+:class:`RegexObject` as the first parameter, or use embedded modifiers in the
+pattern, e.g.  ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
+
+
+Common Problems
+===============
+
+Regular expressions are a powerful tool for some applications, but in some ways
+their behaviour isn't intuitive and at times they don't behave the way you may
+expect them to.  This section will point out some of the most common pitfalls.
+
+
+Use String Methods
+------------------
+
+Sometimes using the :mod:`re` module is a mistake.  If you're matching a fixed
+string, or a single character class, and you're not using any :mod:`re` features
+such as the :const:`IGNORECASE` flag, then the full power of regular expressions
+may not be required. Strings have several methods for performing operations with
+fixed strings and they're usually much faster, because the implementation is a
+single small C loop that's been optimized for the purpose, instead of the large,
+more generalized regular expression engine.
+
+One example might be replacing a single fixed string with another one; for
+example, you might replace ``word`` with ``deed``.  ``re.sub()`` seems like the
+function to use for this, but consider the :meth:`replace` method.  Note that
+:func:`replace` will also replace ``word`` inside words, turning ``swordfish``
+into ``sdeedfish``, but the  naive RE ``word`` would have done that, too.  (To
+avoid performing the substitution on parts of words, the pattern would have to
+be ``\bword\b``, in order to require that ``word`` have a word boundary on
+either side.  This takes the job beyond  :meth:`replace`'s abilities.)
+
+Another common task is deleting every occurrence of a single character from a
+string or replacing it with another single character.  You might do this with
+something like ``re.sub('\n', ' ', S)``, but :meth:`translate` is capable of
+doing both tasks and will be faster than any regular expression operation can
+be.
+
+In short, before turning to the :mod:`re` module, consider whether your problem
+can be solved with a faster and simpler string method.
+
+
+match() versus search()
+-----------------------
+
+The :func:`match` function only checks if the RE matches at the beginning of the
+string while :func:`search` will scan forward through the string for a match.
+It's important to keep this distinction in mind.  Remember,  :func:`match` will
+only report a successful match which will start at 0; if the match wouldn't
+start at zero,  :func:`match` will *not* report it. ::
+
+   >>> print re.match('super', 'superstition').span()  
+   (0, 5)
+   >>> print re.match('super', 'insuperable')    
+   None
+
+On the other hand, :func:`search` will scan forward through the string,
+reporting the first match it finds. ::
+
+   >>> print re.search('super', 'superstition').span()
+   (0, 5)
+   >>> print re.search('super', 'insuperable').span()
+   (2, 7)
+
+Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
+to the front of your RE.  Resist this temptation and use :func:`re.search`
+instead.  The regular expression compiler does some analysis of REs in order to
+speed up the process of looking for a match.  One such analysis figures out what
+the first character of a match must be; for example, a pattern starting with
+``Crow`` must match starting with a ``'C'``.  The analysis lets the engine
+quickly scan through the string looking for the starting character, only trying
+the full match if a ``'C'`` is found.
+
+Adding ``.*`` defeats this optimization, requiring scanning to the end of the
+string and then backtracking to find a match for the rest of the RE.  Use
+:func:`re.search` instead.
+
+
+Greedy versus Non-Greedy
+------------------------
+
+When repeating a regular expression, as in ``a*``, the resulting action is to
+consume as much of the pattern as possible.  This fact often bites you when
+you're trying to match a pair of balanced delimiters, such as the angle brackets
+surrounding an HTML tag.  The naive pattern for matching a single HTML tag
+doesn't work because of the greedy nature of ``.*``. ::
+
+   >>> s = '<html><head><title>Title</title>'
+   >>> len(s)
+   32
+   >>> print re.match('<.*>', s).span()
+   (0, 32)
+   >>> print re.match('<.*>', s).group()
+   <html><head><title>Title</title>
+
+The RE matches the ``'<'`` in ``<html>``, and the ``.*`` consumes the rest of
+the string.  There's still more left in the RE, though, and the ``>`` can't
+match at the end of the string, so the regular expression engine has to
+backtrack character by character until it finds a match for the ``>``.   The
+final match extends from the ``'<'`` in ``<html>`` to the ``'>'`` in
+``</title>``, which isn't what you want.
+
+In this case, the solution is to use the non-greedy qualifiers ``*?``, ``+?``,
+``??``, or ``{m,n}?``, which match as *little* text as possible.  In the above
+example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
+when it fails, the engine advances a character at a time, retrying the ``'>'``
+at every step.  This produces just the right result::
+
+   >>> print re.match('<.*?>', s).group()
+   <html>
+
+(Note that parsing HTML or XML with regular expressions is painful. Quick-and-
+dirty patterns will handle common cases, but HTML and XML have special cases
+that will break the obvious regular expression; by the time you've written a
+regular expression that handles all of the possible cases, the patterns will be
+*very* complicated.  Use an HTML or XML parser module for such tasks.)
+
+
+Not Using re.VERBOSE
+--------------------
+
+By now you've probably noticed that regular expressions are a very compact
+notation, but they're not terribly readable.  REs of moderate complexity can
+become lengthy collections of backslashes, parentheses, and metacharacters,
+making them difficult to read and understand.
+
+For such REs, specifying the ``re.VERBOSE`` flag when compiling the regular
+expression can be helpful, because it allows you to format the regular
+expression more clearly.
+
+The ``re.VERBOSE`` flag has several effects.  Whitespace in the regular
+expression that *isn't* inside a character class is ignored.  This means that an
+expression such as ``dog | cat`` is equivalent to the less readable ``dog|cat``,
+but ``[a b]`` will still match the characters ``'a'``, ``'b'``, or a space.  In
+addition, you can also put comments inside a RE; comments extend from a ``#``
+character to the next newline.  When used with triple-quoted strings, this
+enables REs to be formatted more neatly::
+
+   pat = re.compile(r"""
+    \s*                 # Skip leading whitespace
+    (?P<header>[^:]+)   # Header name
+    \s* :               # Whitespace, and a colon
+    (?P<value>.*?)      # The header's value -- *? used to
+                        # lose the following trailing whitespace
+    \s*$                # Trailing whitespace to end-of-line
+   """, re.VERBOSE)
+
+This is far more readable than:
+
+.. % $
+
+::
+
+   pat = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")
+
+.. % $
+
+
+Feedback
+========
+
+Regular expressions are a complicated topic.  Did this document help you
+understand them?  Were there parts that were unclear, or Problems you
+encountered that weren't covered here?  If so, please send suggestions for
+improvements to the author.
+
+The most complete book on regular expressions is almost certainly Jeffrey
+Friedl's Mastering Regular Expressions, published by O'Reilly.  Unfortunately,
+it exclusively concentrates on Perl and Java's flavours of regular expressions,
+and doesn't contain any Python material at all, so it won't be useful as a
+reference for programming in Python.  (The first edition covered Python's now-
+removed :mod:`regex` module, which won't help you much.)  Consider checking it
+out from your library.
+
+
+.. rubric:: Footnotes
+
+.. [#] Introduced in Python 2.2.2.
+

Added: doctools/trunk/Doc-3k/howto/sockets.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/sockets.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,421 @@
+****************************
+  Socket Programming HOWTO  
+****************************
+
+:Author: Gordon McMillan
+
+
+.. topic:: Abstract
+
+   Sockets are used nearly everywhere, but are one of the most severely
+   misunderstood technologies around. This is a 10,000 foot overview of sockets.
+   It's not really a tutorial - you'll still have work to do in getting things
+   operational. It doesn't cover the fine points (and there are a lot of them), but
+   I hope it will give you enough background to begin using them decently.
+
+
+Sockets
+=======
+
+Sockets are used nearly everywhere, but are one of the most severely
+misunderstood technologies around. This is a 10,000 foot overview of sockets.
+It's not really a tutorial - you'll still have work to do in getting things
+working. It doesn't cover the fine points (and there are a lot of them), but I
+hope it will give you enough background to begin using them decently.
+
+I'm only going to talk about INET sockets, but they account for at least 99% of
+the sockets in use. And I'll only talk about STREAM sockets - unless you really
+know what you're doing (in which case this HOWTO isn't for you!), you'll get
+better behavior and performance from a STREAM socket than anything else. I will
+try to clear up the mystery of what a socket is, as well as some hints on how to
+work with blocking and non-blocking sockets. But I'll start by talking about
+blocking sockets. You'll need to know how they work before dealing with non-
+blocking sockets.
+
+Part of the trouble with understanding these things is that "socket" can mean a
+number of subtly different things, depending on context. So first, let's make a
+distinction between a "client" socket - an endpoint of a conversation, and a
+"server" socket, which is more like a switchboard operator. The client
+application (your browser, for example) uses "client" sockets exclusively; the
+web server it's talking to uses both "server" sockets and "client" sockets.
+
+
+History
+-------
+
+Of the various forms of IPC (*Inter Process Communication*), sockets are by far
+the most popular.  On any given platform, there are likely to be other forms of
+IPC that are faster, but for cross-platform communication, sockets are about the
+only game in town.
+
+They were invented in Berkeley as part of the BSD flavor of Unix. They spread
+like wildfire with the Internet. With good reason --- the combination of sockets
+with INET makes talking to arbitrary machines around the world unbelievably easy
+(at least compared to other schemes).
+
+
+Creating a Socket
+=================
+
+Roughly speaking, when you clicked on the link that brought you to this page,
+your browser did something like the following::
+
+   #create an INET, STREAMing socket
+   s = socket.socket(
+       socket.AF_INET, socket.SOCK_STREAM)
+   #now connect to the web server on port 80 
+   # - the normal http port
+   s.connect(("www.mcmillan-inc.com", 80))
+
+When the ``connect`` completes, the socket ``s`` can now be used to send in a
+request for the text of this page. The same socket will read the reply, and then
+be destroyed. That's right - destroyed. Client sockets are normally only used
+for one exchange (or a small set of sequential exchanges).
+
+What happens in the web server is a bit more complex. First, the web server
+creates a "server socket". ::
+
+   #create an INET, STREAMing socket
+   serversocket = socket.socket(
+       socket.AF_INET, socket.SOCK_STREAM)
+   #bind the socket to a public host, 
+   # and a well-known port
+   serversocket.bind((socket.gethostname(), 80))
+   #become a server socket
+   serversocket.listen(5)
+
+A couple things to notice: we used ``socket.gethostname()`` so that the socket
+would be visible to the outside world. If we had used ``s.bind(('', 80))`` or
+``s.bind(('localhost', 80))`` or ``s.bind(('127.0.0.1', 80))`` we would still
+have a "server" socket, but one that was only visible within the same machine.
+
+A second thing to note: low number ports are usually reserved for "well known"
+services (HTTP, SNMP etc). If you're playing around, use a nice high number (4
+digits).
+
+Finally, the argument to ``listen`` tells the socket library that we want it to
+queue up as many as 5 connect requests (the normal max) before refusing outside
+connections. If the rest of the code is written properly, that should be plenty.
+
+OK, now we have a "server" socket, listening on port 80. Now we enter the
+mainloop of the web server::
+
+   while 1:
+       #accept connections from outside
+       (clientsocket, address) = serversocket.accept()
+       #now do something with the clientsocket
+       #in this case, we'll pretend this is a threaded server
+       ct = client_thread(clientsocket)
+       ct.run()
+
+There's actually 3 general ways in which this loop could work - dispatching a
+thread to handle ``clientsocket``, create a new process to handle
+``clientsocket``, or restructure this app to use non-blocking sockets, and
+mulitplex between our "server" socket and any active ``clientsocket``\ s using
+``select``. More about that later. The important thing to understand now is
+this: this is *all* a "server" socket does. It doesn't send any data. It doesn't
+receive any data. It just produces "client" sockets. Each ``clientsocket`` is
+created in response to some *other* "client" socket doing a ``connect()`` to the
+host and port we're bound to. As soon as we've created that ``clientsocket``, we
+go back to listening for more connections. The two "clients" are free to chat it
+up - they are using some dynamically allocated port which will be recycled when
+the conversation ends.
+
+
+IPC
+---
+
+If you need fast IPC between two processes on one machine, you should look into
+whatever form of shared memory the platform offers. A simple protocol based
+around shared memory and locks or semaphores is by far the fastest technique.
+
+If you do decide to use sockets, bind the "server" socket to ``'localhost'``. On
+most platforms, this will take a shortcut around a couple of layers of network
+code and be quite a bit faster.
+
+
+Using a Socket
+==============
+
+The first thing to note, is that the web browser's "client" socket and the web
+server's "client" socket are identical beasts. That is, this is a "peer to peer"
+conversation. Or to put it another way, *as the designer, you will have to
+decide what the rules of etiquette are for a conversation*. Normally, the
+``connect``\ ing socket starts the conversation, by sending in a request, or
+perhaps a signon. But that's a design decision - it's not a rule of sockets.
+
+Now there are two sets of verbs to use for communication. You can use ``send``
+and ``recv``, or you can transform your client socket into a file-like beast and
+use ``read`` and ``write``. The latter is the way Java presents their sockets.
+I'm not going to talk about it here, except to warn you that you need to use
+``flush`` on sockets. These are buffered "files", and a common mistake is to
+``write`` something, and then ``read`` for a reply. Without a ``flush`` in
+there, you may wait forever for the reply, because the request may still be in
+your output buffer.
+
+Now we come the major stumbling block of sockets - ``send`` and ``recv`` operate
+on the network buffers. They do not necessarily handle all the bytes you hand
+them (or expect from them), because their major focus is handling the network
+buffers. In general, they return when the associated network buffers have been
+filled (``send``) or emptied (``recv``). They then tell you how many bytes they
+handled. It is *your* responsibility to call them again until your message has
+been completely dealt with.
+
+When a ``recv`` returns 0 bytes, it means the other side has closed (or is in
+the process of closing) the connection.  You will not receive any more data on
+this connection. Ever.  You may be able to send data successfully; I'll talk
+about that some on the next page.
+
+A protocol like HTTP uses a socket for only one transfer. The client sends a
+request, the reads a reply.  That's it. The socket is discarded. This means that
+a client can detect the end of the reply by receiving 0 bytes.
+
+But if you plan to reuse your socket for further transfers, you need to realize
+that *there is no "EOT" (End of Transfer) on a socket.* I repeat: if a socket
+``send`` or ``recv`` returns after handling 0 bytes, the connection has been
+broken.  If the connection has *not* been broken, you may wait on a ``recv``
+forever, because the socket will *not* tell you that there's nothing more to
+read (for now).  Now if you think about that a bit, you'll come to realize a
+fundamental truth of sockets: *messages must either be fixed length* (yuck), *or
+be delimited* (shrug), *or indicate how long they are* (much better), *or end by
+shutting down the connection*. The choice is entirely yours, (but some ways are
+righter than others).
+
+Assuming you don't want to end the connection, the simplest solution is a fixed
+length message::
+
+   class mysocket:
+       '''demonstration class only 
+         - coded for clarity, not efficiency
+       '''
+
+       def __init__(self, sock=None):
+   	if sock is None:
+   	    self.sock = socket.socket(
+   		socket.AF_INET, socket.SOCK_STREAM)
+   	else:
+   	    self.sock = sock
+
+       def connect(self, host, port):
+   	self.sock.connect((host, port))
+
+       def mysend(self, msg):
+   	totalsent = 0
+   	while totalsent < MSGLEN:
+   	    sent = self.sock.send(msg[totalsent:])
+   	    if sent == 0:
+   		raise RuntimeError, \\
+   		    "socket connection broken"
+   	    totalsent = totalsent + sent
+
+       def myreceive(self):
+   	msg = ''
+   	while len(msg) < MSGLEN:
+   	    chunk = self.sock.recv(MSGLEN-len(msg))
+   	    if chunk == '':
+   		raise RuntimeError, \\
+   		    "socket connection broken"
+   	    msg = msg + chunk
+   	return msg
+
+The sending code here is usable for almost any messaging scheme - in Python you
+send strings, and you can use ``len()`` to determine its length (even if it has
+embedded ``\0`` characters). It's mostly the receiving code that gets more
+complex. (And in C, it's not much worse, except you can't use ``strlen`` if the
+message has embedded ``\0``\ s.)
+
+The easiest enhancement is to make the first character of the message an
+indicator of message type, and have the type determine the length. Now you have
+two ``recv``\ s - the first to get (at least) that first character so you can
+look up the length, and the second in a loop to get the rest. If you decide to
+go the delimited route, you'll be receiving in some arbitrary chunk size, (4096
+or 8192 is frequently a good match for network buffer sizes), and scanning what
+you've received for a delimiter.
+
+One complication to be aware of: if your conversational protocol allows multiple
+messages to be sent back to back (without some kind of reply), and you pass
+``recv`` an arbitrary chunk size, you may end up reading the start of a
+following message. You'll need to put that aside and hold onto it, until it's
+needed.
+
+Prefixing the message with it's length (say, as 5 numeric characters) gets more
+complex, because (believe it or not), you may not get all 5 characters in one
+``recv``. In playing around, you'll get away with it; but in high network loads,
+your code will very quickly break unless you use two ``recv`` loops - the first
+to determine the length, the second to get the data part of the message. Nasty.
+This is also when you'll discover that ``send`` does not always manage to get
+rid of everything in one pass. And despite having read this, you will eventually
+get bit by it!
+
+In the interests of space, building your character, (and preserving my
+competitive position), these enhancements are left as an exercise for the
+reader. Lets move on to cleaning up.
+
+
+Binary Data
+-----------
+
+It is perfectly possible to send binary data over a socket. The major problem is
+that not all machines use the same formats for binary data. For example, a
+Motorola chip will represent a 16 bit integer with the value 1 as the two hex
+bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
+Socket libraries have calls for converting 16 and 32 bit integers - ``ntohl,
+htonl, ntohs, htons`` where "n" means *network* and "h" means *host*, "s" means
+*short* and "l" means *long*. Where network order is host order, these do
+nothing, but where the machine is byte-reversed, these swap the bytes around
+appropriately.
+
+In these days of 32 bit machines, the ascii representation of binary data is
+frequently smaller than the binary representation. That's because a surprising
+amount of the time, all those longs have the value 0, or maybe 1. The string "0"
+would be two bytes, while binary is four. Of course, this doesn't fit well with
+fixed-length messages. Decisions, decisions.
+
+
+Disconnecting
+=============
+
+Strictly speaking, you're supposed to use ``shutdown`` on a socket before you
+``close`` it.  The ``shutdown`` is an advisory to the socket at the other end.
+Depending on the argument you pass it, it can mean "I'm not going to send
+anymore, but I'll still listen", or "I'm not listening, good riddance!".  Most
+socket libraries, however, are so used to programmers neglecting to use this
+piece of etiquette that normally a ``close`` is the same as ``shutdown();
+close()``.  So in most situations, an explicit ``shutdown`` is not needed.
+
+One way to use ``shutdown`` effectively is in an HTTP-like exchange. The client
+sends a request and then does a ``shutdown(1)``. This tells the server "This
+client is done sending, but can still receive."  The server can detect "EOF" by
+a receive of 0 bytes. It can assume it has the complete request.  The server
+sends a reply. If the ``send`` completes successfully then, indeed, the client
+was still receiving.
+
+Python takes the automatic shutdown a step further, and says that when a socket
+is garbage collected, it will automatically do a ``close`` if it's needed. But
+relying on this is a very bad habit. If your socket just disappears without
+doing a ``close``, the socket at the other end may hang indefinitely, thinking
+you're just being slow. *Please* ``close`` your sockets when you're done.
+
+
+When Sockets Die
+----------------
+
+Probably the worst thing about using blocking sockets is what happens when the
+other side comes down hard (without doing a ``close``). Your socket is likely to
+hang. SOCKSTREAM is a reliable protocol, and it will wait a long, long time
+before giving up on a connection. If you're using threads, the entire thread is
+essentially dead. There's not much you can do about it. As long as you aren't
+doing something dumb, like holding a lock while doing a blocking read, the
+thread isn't really consuming much in the way of resources. Do *not* try to kill
+the thread - part of the reason that threads are more efficient than processes
+is that they avoid the overhead associated with the automatic recycling of
+resources. In other words, if you do manage to kill the thread, your whole
+process is likely to be screwed up.
+
+
+Non-blocking Sockets
+====================
+
+If you've understood the preceeding, you already know most of what you need to
+know about the mechanics of using sockets. You'll still use the same calls, in
+much the same ways. It's just that, if you do it right, your app will be almost
+inside-out.
+
+In Python, you use ``socket.setblocking(0)`` to make it non-blocking. In C, it's
+more complex, (for one thing, you'll need to choose between the BSD flavor
+``O_NONBLOCK`` and the almost indistinguishable Posix flavor ``O_NDELAY``, which
+is completely different from ``TCP_NODELAY``), but it's the exact same idea. You
+do this after creating the socket, but before using it. (Actually, if you're
+nuts, you can switch back and forth.)
+
+The major mechanical difference is that ``send``, ``recv``, ``connect`` and
+``accept`` can return without having done anything. You have (of course) a
+number of choices. You can check return code and error codes and generally drive
+yourself crazy. If you don't believe me, try it sometime. Your app will grow
+large, buggy and suck CPU. So let's skip the brain-dead solutions and do it
+right.
+
+Use ``select``.
+
+In C, coding ``select`` is fairly complex. In Python, it's a piece of cake, but
+it's close enough to the C version that if you understand ``select`` in Python,
+you'll have little trouble with it in C. ::
+
+   ready_to_read, ready_to_write, in_error = \\
+                  select.select(
+                     potential_readers, 
+                     potential_writers, 
+                     potential_errs, 
+                     timeout)
+
+You pass ``select`` three lists: the first contains all sockets that you might
+want to try reading; the second all the sockets you might want to try writing
+to, and the last (normally left empty) those that you want to check for errors.
+You should note that a socket can go into more than one list. The ``select``
+call is blocking, but you can give it a timeout. This is generally a sensible
+thing to do - give it a nice long timeout (say a minute) unless you have good
+reason to do otherwise.
+
+In return, you will get three lists. They have the sockets that are actually
+readable, writable and in error. Each of these lists is a subset (possbily
+empty) of the corresponding list you passed in. And if you put a socket in more
+than one input list, it will only be (at most) in one output list.
+
+If a socket is in the output readable list, you can be as-close-to-certain-as-
+we-ever-get-in-this-business that a ``recv`` on that socket will return
+*something*. Same idea for the writable list. You'll be able to send
+*something*. Maybe not all you want to, but *something* is better than nothing.
+(Actually, any reasonably healthy socket will return as writable - it just means
+outbound network buffer space is available.)
+
+If you have a "server" socket, put it in the potential_readers list. If it comes
+out in the readable list, your ``accept`` will (almost certainly) work. If you
+have created a new socket to ``connect`` to someone else, put it in the
+ptoential_writers list. If it shows up in the writable list, you have a decent
+chance that it has connected.
+
+One very nasty problem with ``select``: if somewhere in those input lists of
+sockets is one which has died a nasty death, the ``select`` will fail. You then
+need to loop through every single damn socket in all those lists and do a
+``select([sock],[],[],0)`` until you find the bad one. That timeout of 0 means
+it won't take long, but it's ugly.
+
+Actually, ``select`` can be handy even with blocking sockets. It's one way of
+determining whether you will block - the socket returns as readable when there's
+something in the buffers.  However, this still doesn't help with the problem of
+determining whether the other end is done, or just busy with something else.
+
+**Portability alert**: On Unix, ``select`` works both with the sockets and
+files. Don't try this on Windows. On Windows, ``select`` works with sockets
+only. Also note that in C, many of the more advanced socket options are done
+differently on Windows. In fact, on Windows I usually use threads (which work
+very, very well) with my sockets. Face it, if you want any kind of performance,
+your code will look very different on Windows than on Unix. (I haven't the
+foggiest how you do this stuff on a Mac.)
+
+
+Performance
+-----------
+
+There's no question that the fastest sockets code uses non-blocking sockets and
+select to multiplex them. You can put together something that will saturate a
+LAN connection without putting any strain on the CPU. The trouble is that an app
+written this way can't do much of anything else - it needs to be ready to
+shuffle bytes around at all times.
+
+Assuming that your app is actually supposed to do something more than that,
+threading is the optimal solution, (and using non-blocking sockets will be
+faster than using blocking sockets). Unfortunately, threading support in Unixes
+varies both in API and quality. So the normal Unix solution is to fork a
+subprocess to deal with each connection. The overhead for this is significant
+(and don't do this on Windows - the overhead of process creation is enormous
+there). It also means that unless each subprocess is completely independent,
+you'll need to use another form of IPC, say a pipe, or shared memory and
+semaphores, to communicate between the parent and child processes.
+
+Finally, remember that even though blocking sockets are somewhat slower than
+non-blocking, in many cases they are the "right" solution. After all, if your
+app is driven by the data it receives over a socket, there's not much sense in
+complicating the logic just so your app can wait on ``select`` instead of
+``recv``.
+

Added: doctools/trunk/Doc-3k/howto/unicode.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/unicode.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,732 @@
+*****************
+  Unicode HOWTO
+*****************
+
+:Release: 1.02
+
+This HOWTO discusses Python's support for Unicode, and explains various problems
+that people commonly encounter when trying to work with Unicode.
+
+Introduction to Unicode
+=======================
+
+History of Character Codes
+--------------------------
+
+In 1968, the American Standard Code for Information Interchange, better known by
+its acronym ASCII, was standardized.  ASCII defined numeric codes for various
+characters, with the numeric values running from 0 to
+127.  For example, the lowercase letter 'a' is assigned 97 as its code
+value.
+
+ASCII was an American-developed standard, so it only defined unaccented
+characters.  There was an 'e', but no 'é' or 'Í'.  This meant that languages
+which required accented characters couldn't be faithfully represented in ASCII.
+(Actually the missing accents matter for English, too, which contains words such
+as 'naïve' and 'café', and some publications have house styles which require
+spellings such as 'coöperate'.)
+
+For a while people just wrote programs that didn't display accents.  I remember
+looking at Apple ][ BASIC programs, published in French-language publications in
+the mid-1980s, that had lines like these::
+
+	PRINT "FICHER EST COMPLETE."
+	PRINT "CARACTERE NON ACCEPTE."
+
+Those messages should contain accents, and they just look wrong to someone who
+can read French.
+
+In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
+hold values ranging from 0 to 255.  ASCII codes only went up to 127, so some
+machines assigned values between 128 and 255 to accented characters.  Different
+machines had different codes, however, which led to problems exchanging files.
+Eventually various commonly used sets of values for the 128-255 range emerged.
+Some were true standards, defined by the International Standards Organization,
+and some were **de facto** conventions that were invented by one company or
+another and managed to catch on.
+
+255 characters aren't very many.  For example, you can't fit both the accented
+characters used in Western Europe and the Cyrillic alphabet used for Russian
+into the 128-255 range because there are more than 127 such characters.
+
+You could write files using different codes (all your Russian files in a coding
+system called KOI8, all your French files in a different coding system called
+Latin1), but what if you wanted to write a French document that quotes some
+Russian text?  In the 1980s people began to want to solve this problem, and the
+Unicode standardization effort began.
+
+Unicode started out using 16-bit characters instead of 8-bit characters.  16
+bits means you have 2^16 = 65,536 distinct values available, making it possible
+to represent many different characters from many different alphabets; an initial
+goal was to have Unicode contain the alphabets for every single human language.
+It turns out that even 16 bits isn't enough to meet that goal, and the modern
+Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
+base-16).
+
+There's a related ISO standard, ISO 10646.  Unicode and ISO 10646 were
+originally separate efforts, but the specifications were merged with the 1.1
+revision of Unicode.
+
+(This discussion of Unicode's history is highly simplified.  I don't think the
+average Python programmer needs to worry about the historical details; consult
+the Unicode consortium site listed in the References for more information.)
+
+
+Definitions
+-----------
+
+A **character** is the smallest possible component of a text.  'A', 'B', 'C',
+etc., are all different characters.  So are 'È' and 'Í'.  Characters are
+abstractions, and vary depending on the language or context you're talking
+about.  For example, the symbol for ohms (Ω) is usually drawn much like the
+capital letter omega (Ω) in the Greek alphabet (they may even be the same in
+some fonts), but these are two different characters that have different
+meanings.
+
+The Unicode standard describes how characters are represented by **code
+points**.  A code point is an integer value, usually denoted in base 16.  In the
+standard, a code point is written using the notation U+12ca to mean the
+character with value 0x12ca (4810 decimal).  The Unicode standard contains a lot
+of tables listing characters and their corresponding code points::
+
+	0061    'a'; LATIN SMALL LETTER A
+	0062    'b'; LATIN SMALL LETTER B
+	0063    'c'; LATIN SMALL LETTER C
+        ...
+	007B	'{'; LEFT CURLY BRACKET
+
+Strictly, these definitions imply that it's meaningless to say 'this is
+character U+12ca'.  U+12ca is a code point, which represents some particular
+character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.  In
+informal contexts, this distinction between code points and characters will
+sometimes be forgotten.
+
+A character is represented on a screen or on paper by a set of graphical
+elements that's called a **glyph**.  The glyph for an uppercase A, for example,
+is two diagonal strokes and a horizontal stroke, though the exact details will
+depend on the font being used.  Most Python code doesn't need to worry about
+glyphs; figuring out the correct glyph to display is generally the job of a GUI
+toolkit or a terminal's font renderer.
+
+
+Encodings
+---------
+
+To summarize the previous section: a Unicode string is a sequence of code
+points, which are numbers from 0 to 0x10ffff.  This sequence needs to be
+represented as a set of bytes (meaning, values from 0-255) in memory.  The rules
+for translating a Unicode string into a sequence of bytes are called an
+**encoding**.
+
+The first encoding you might think of is an array of 32-bit integers.  In this
+representation, the string "Python" would look like this::
+
+       P           y           t           h           o           n
+    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00 
+       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
+
+This representation is straightforward but using it presents a number of
+problems.
+
+1. It's not portable; different processors order the bytes differently.
+
+2. It's very wasteful of space.  In most texts, the majority of the code points
+   are less than 127, or less than 255, so a lot of space is occupied by zero
+   bytes.  The above string takes 24 bytes compared to the 6 bytes needed for an
+   ASCII representation.  Increased RAM usage doesn't matter too much (desktop
+   computers have megabytes of RAM, and strings aren't usually that large), but
+   expanding our usage of disk and network bandwidth by a factor of 4 is
+   intolerable.
+
+3. It's not compatible with existing C functions such as ``strlen()``, so a new
+   family of wide string functions would need to be used.
+
+4. Many Internet standards are defined in terms of textual data, and can't
+   handle content with embedded zero bytes.
+
+Generally people don't use this encoding, instead choosing other encodings that
+are more efficient and convenient.
+
+Encodings don't have to handle every possible Unicode character, and most
+encodings don't.  For example, Python's default encoding is the 'ascii'
+encoding.  The rules for converting a Unicode string into the ASCII encoding are
+simple; for each code point:
+
+1. If the code point is < 128, each byte is the same as the value of the code
+   point.
+
+2. If the code point is 128 or greater, the Unicode string can't be represented
+   in this encoding.  (Python raises a :exc:`UnicodeEncodeError` exception in this
+   case.)
+
+Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode code points
+0-255 are identical to the Latin-1 values, so converting to this encoding simply
+requires converting code points to byte values; if a code point larger than 255
+is encountered, the string can't be encoded into Latin-1.
+
+Encodings don't have to be simple one-to-one mappings like Latin-1.  Consider
+IBM's EBCDIC, which was used on IBM mainframes.  Letter values weren't in one
+block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
+through 153.  If you wanted to use EBCDIC as an encoding, you'd probably use
+some sort of lookup table to perform the conversion, but this is largely an
+internal detail.
+
+UTF-8 is one of the most commonly used encodings.  UTF stands for "Unicode
+Transformation Format", and the '8' means that 8-bit numbers are used in the
+encoding.  (There's also a UTF-16 encoding, but it's less frequently used than
+UTF-8.)  UTF-8 uses the following rules:
+
+1. If the code point is <128, it's represented by the corresponding byte value.
+2. If the code point is between 128 and 0x7ff, it's turned into two byte values
+   between 128 and 255.
+3. Code points >0x7ff are turned into three- or four-byte sequences, where each
+   byte of the sequence is between 128 and 255.
+    
+UTF-8 has several convenient properties:
+
+1. It can handle any Unicode code point.
+2. A Unicode string is turned into a string of bytes containing no embedded zero
+   bytes.  This avoids byte-ordering issues, and means UTF-8 strings can be
+   processed by C functions such as ``strcpy()`` and sent through protocols that
+   can't handle zero bytes.
+3. A string of ASCII text is also valid UTF-8 text.
+4. UTF-8 is fairly compact; the majority of code points are turned into two
+   bytes, and values less than 128 occupy only a single byte.
+5. If bytes are corrupted or lost, it's possible to determine the start of the
+   next UTF-8-encoded code point and resynchronize.  It's also unlikely that
+   random 8-bit data will look like valid UTF-8.
+
+
+
+References
+----------
+
+The Unicode Consortium site at <http://www.unicode.org> has character charts, a
+glossary, and PDF versions of the Unicode specification.  Be prepared for some
+difficult reading.  <http://www.unicode.org/history/> is a chronology of the
+origin and development of Unicode.
+
+To help understand the standard, Jukka Korpela has written an introductory guide
+to reading the Unicode character tables, available at
+<http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+
+Roman Czyborra wrote another explanation of Unicode's basic principles; it's at
+<http://czyborra.com/unicode/characters.html>.  Czyborra has written a number of
+other Unicode-related documentation, available from <http://www.cyzborra.com>.
+
+Two other good introductory articles were written by Joel Spolsky
+<http://www.joelonsoftware.com/articles/Unicode.html> and Jason Orendorff
+<http://www.jorendorff.com/articles/unicode/>.  If this introduction didn't make
+things clear to you, you should try reading one of these alternate articles
+before continuing.
+
+Wikipedia entries are often helpful; see the entries for "character encoding"
+<http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
+<http://en.wikipedia.org/wiki/UTF-8>, for example.
+
+
+Python's Unicode Support
+========================
+
+Now that you've learned the rudiments of Unicode, we can look at Python's
+Unicode features.
+
+
+The Unicode Type
+----------------
+
+Unicode strings are expressed as instances of the :class:`unicode` type, one of
+Python's repertoire of built-in types.  It derives from an abstract type called
+:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
+therefore check if a value is a string type with ``isinstance(value,
+basestring)``.  Under the hood, Python represents Unicode strings as either 16-
+or 32-bit integers, depending on how the Python interpreter was compiled.
+
+The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
+errors])``.  All of its arguments should be 8-bit strings.  The first argument
+is converted to Unicode using the specified encoding; if you leave off the
+``encoding`` argument, the ASCII encoding is used for the conversion, so
+characters greater than 127 will be treated as errors::
+
+    >>> unicode('abcdef')
+    u'abcdef'
+    >>> s = unicode('abcdef')
+    >>> type(s)
+    <type 'unicode'>
+    >>> unicode('abcdef' + chr(255))
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: 
+                        ordinal not in range(128)
+
+The ``errors`` argument specifies the response when the input string can't be
+converted according to the encoding's rules.  Legal values for this argument are
+'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
+'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
+Unicode result).  The following examples show the differences::
+
+    >>> unicode('\x80abc', errors='strict')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: 
+                        ordinal not in range(128)
+    >>> unicode('\x80abc', errors='replace')
+    u'\ufffdabc'
+    >>> unicode('\x80abc', errors='ignore')
+    u'abc'
+
+Encodings are specified as strings containing the encoding's name.  Python 2.4
+comes with roughly 100 different encodings; see the Python Library Reference at
+<http://docs.python.org/lib/standard-encodings.html> for a list.  Some encodings
+have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
+synonyms for the same encoding.
+
+One-character Unicode strings can also be created with the :func:`unichr`
+built-in function, which takes integers and returns a Unicode string of length 1
+that contains the corresponding code point.  The reverse operation is the
+built-in :func:`ord` function that takes a one-character Unicode string and
+returns the code point value::
+
+    >>> unichr(40960)
+    u'\ua000'
+    >>> ord(u'\ua000')
+    40960
+
+Instances of the :class:`unicode` type have many of the same methods as the
+8-bit string type for operations such as searching and formatting::
+
+    >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
+    >>> s.count('e')
+    5
+    >>> s.find('feather')
+    9
+    >>> s.find('bird')
+    -1
+    >>> s.replace('feather', 'sand')
+    u'Was ever sand so lightly blown to and fro as this multitude?'
+    >>> s.upper()
+    u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
+
+Note that the arguments to these methods can be Unicode strings or 8-bit
+strings.  8-bit strings will be converted to Unicode before carrying out the
+operation; Python's default ASCII encoding will be used, so characters greater
+than 127 will cause an exception::
+
+    >>> s.find('Was\x9f')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
+    >>> s.find(u'Was\x9f')
+    -1
+
+Much Python code that operates on strings will therefore work with Unicode
+strings without requiring any changes to the code.  (Input and output code needs
+more updating for Unicode; more on this later.)
+
+Another important method is ``.encode([encoding], [errors='strict'])``, which
+returns an 8-bit string version of the Unicode string, encoded in the requested
+encoding.  The ``errors`` parameter is the same as the parameter of the
+``unicode()`` constructor, with one additional possibility; as well as 'strict',
+'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
+character references.  The following example shows the different results::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)
+    >>> u.encode('utf-8')
+    '\xea\x80\x80abcd\xde\xb4'
+    >>> u.encode('ascii')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
+    >>> u.encode('ascii', 'ignore')
+    'abcd'
+    >>> u.encode('ascii', 'replace')
+    '?abcd?'
+    >>> u.encode('ascii', 'xmlcharrefreplace')
+    '&#40960;abcd&#1972;'
+
+Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
+interprets the string using the given encoding::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
+    >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
+    >>> type(utf8_version), utf8_version
+    (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
+    >>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
+    >>> u == u2                                      # The two strings match
+    True
+ 
+The low-level routines for registering and accessing the available encodings are
+found in the :mod:`codecs` module.  However, the encoding and decoding functions
+returned by this module are usually more low-level than is comfortable, so I'm
+not going to describe the :mod:`codecs` module here.  If you need to implement a
+completely new encoding, you'll need to learn about the :mod:`codecs` module
+interfaces, but implementing encodings is a specialized task that also won't be
+covered here.  Consult the Python documentation to learn more about this module.
+
+The most commonly used part of the :mod:`codecs` module is the
+:func:`codecs.open` function which will be discussed in the section on input and
+output.
+            
+            
+Unicode Literals in Python Source Code
+--------------------------------------
+
+In Python source code, Unicode literals are written as strings prefixed with the
+'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points can be written
+using the ``\u`` escape sequence, which is followed by four hex digits giving
+the code point.  The ``\U`` escape sequence is similar, but expects 8 hex
+digits, not 4.
+
+Unicode literals can also use the same escape sequences as 8-bit strings,
+including ``\x``, but ``\x`` only takes two hex digits so it can't express an
+arbitrary code point.  Octal escapes can go up to U+01ff, which is octal 777.
+
+::
+
+    >>> s = u"a\xac\u1234\u20ac\U00008000"
+               ^^^^ two-digit hex escape
+                   ^^^^^^ four-digit Unicode escape 
+                               ^^^^^^^^^^ eight-digit Unicode escape
+    >>> for c in s:  print ord(c),
+    ... 
+    97 172 4660 8364 32768
+
+Using escape sequences for code points greater than 127 is fine in small doses,
+but becomes an annoyance if you're using many accented characters, as you would
+in a program with messages in French or some other accent-using language.  You
+can also assemble strings using the :func:`unichr` built-in function, but this is
+even more tedious.
+
+Ideally, you'd want to be able to write literals in your language's natural
+encoding.  You could then edit Python source code with your favorite editor
+which would display the accented characters naturally, and have the right
+characters used at runtime.
+
+Python supports writing Unicode literals in any encoding, but you have to
+declare the encoding being used.  This is done by including a special comment as
+either the first or second line of the source file::
+
+    #!/usr/bin/env python
+    # -*- coding: latin-1 -*-
+    
+    u = u'abcdé'
+    print ord(u[-1])
+    
+The syntax is inspired by Emacs's notation for specifying variables local to a
+file.  Emacs supports many different variables, but Python only supports
+'coding'.  The ``-*-`` symbols indicate that the comment is special; within
+them, you must supply the name ``coding`` and the name of your chosen encoding,
+separated by ``':'``.
+
+If you don't include such a comment, the default encoding used will be ASCII.
+Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as a default
+encoding for string literals; in Python 2.4, characters greater than 127 still
+work but result in a warning.  For example, the following program has no
+encoding declaration::
+
+    #!/usr/bin/env python
+    u = u'abcdé'
+    print ord(u[-1])
+
+When you run it with Python 2.4, it will output the following warning::
+
+    amk:~$ python p263.py
+    sys:1: DeprecationWarning: Non-ASCII character '\xe9' 
+         in file p263.py on line 2, but no encoding declared; 
+         see http://www.python.org/peps/pep-0263.html for details
+  
+
+Unicode Properties
+------------------
+
+The Unicode specification includes a database of information about code points.
+For each code point that's defined, the information includes the character's
+name, its category, the numeric value if applicable (Unicode has characters
+representing the Roman numerals and fractions such as one-third and
+four-fifths).  There are also properties related to the code point's use in
+bidirectional text and other display-related properties.
+
+The following program displays some information about several characters, and
+prints the numeric value of one particular character::
+
+    import unicodedata
+    
+    u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
+    
+    for i, c in enumerate(u):
+        print i, '%04x' % ord(c), unicodedata.category(c),
+        print unicodedata.name(c)
+    
+    # Get numeric value of second character
+    print unicodedata.numeric(u[1])
+
+When run, this prints::
+
+    0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
+    1 0bf2 No TAMIL NUMBER ONE THOUSAND
+    2 0f84 Mn TIBETAN MARK HALANTA
+    3 1770 Lo TAGBANWA LETTER SA
+    4 33af So SQUARE RAD OVER S SQUARED
+    1000.0
+
+The category codes are abbreviations describing the nature of the character.
+These are grouped into categories such as "Letter", "Number", "Punctuation", or
+"Symbol", which in turn are broken up into subcategories.  To take the codes
+from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
+"Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
+other".  See
+<http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values> for a
+list of category codes.
+
+References
+----------
+
+The Unicode and 8-bit string types are described in the Python library reference
+at :ref:`typesseq`.
+
+The documentation for the :mod:`unicodedata` module.
+
+The documentation for the :mod:`codecs` module.
+
+Marc-André Lemburg gave a presentation at EuroPython 2002 titled "Python and
+Unicode".  A PDF version of his slides is available at
+<http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf>, and is an
+excellent overview of the design of Python's Unicode features.
+
+
+Reading and Writing Unicode Data
+================================
+
+Once you've written some code that works with Unicode data, the next problem is
+input/output.  How do you get Unicode strings into your program, and how do you
+convert Unicode into a form suitable for storage or transmission?
+
+It's possible that you may not need to do anything depending on your input
+sources and output destinations; you should check whether the libraries used in
+your application support Unicode natively.  XML parsers often return Unicode
+data, for example.  Many relational databases also support Unicode-valued
+columns and can return Unicode values from an SQL query.
+
+Unicode data is usually converted to a particular encoding before it gets
+written to disk or sent over a socket.  It's possible to do all the work
+yourself: open a file, read an 8-bit string from it, and convert the string with
+``unicode(str, encoding)``.  However, the manual approach is not recommended.
+
+One problem is the multi-byte nature of encodings; one Unicode character can be
+represented by several bytes.  If you want to read the file in arbitrary-sized
+chunks (say, 1K or 4K), you need to write error-handling code to catch the case
+where only part of the bytes encoding a single Unicode character are read at the
+end of a chunk.  One solution would be to read the entire file into memory and
+then perform the decoding, but that prevents you from working with files that
+are extremely large; if you need to read a 2Gb file, you need 2Gb of RAM.
+(More, really, since for at least a moment you'd need to have both the encoded
+string and its Unicode version in memory.)
+
+The solution would be to use the low-level decoding interface to catch the case
+of partial coding sequences.  The work of implementing this has already been
+done for you: the :mod:`codecs` module includes a version of the :func:`open`
+function that returns a file-like object that assumes the file's contents are in
+a specified encoding and accepts Unicode parameters for methods such as
+``.read()`` and ``.write()``.
+
+The function's parameters are ``open(filename, mode='rb', encoding=None,
+errors='strict', buffering=1)``.  ``mode`` can be ``'r'``, ``'w'``, or ``'a'``,
+just like the corresponding parameter to the regular built-in ``open()``
+function; add a ``'+'`` to update the file.  ``buffering`` is similarly parallel
+to the standard function's parameter.  ``encoding`` is a string giving the
+encoding to use; if it's left as ``None``, a regular Python file object that
+accepts 8-bit strings is returned.  Otherwise, a wrapper object is returned, and
+data written to or read from the wrapper object will be converted as needed.
+``errors`` specifies the action for encoding errors and can be one of the usual
+values of 'strict', 'ignore', and 'replace'.
+
+Reading Unicode from a file is therefore simple::
+
+    import codecs
+    f = codecs.open('unicode.rst', encoding='utf-8')
+    for line in f:
+        print repr(line)
+
+It's also possible to open files in update mode, allowing both reading and
+writing::
+
+    f = codecs.open('test', encoding='utf-8', mode='w+')
+    f.write(u'\u4500 blah blah blah\n')
+    f.seek(0)
+    print repr(f.readline()[:1])
+    f.close()
+
+Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
+written as the first character of a file in order to assist with autodetection
+of the file's byte ordering.  Some encodings, such as UTF-16, expect a BOM to be
+present at the start of a file; when such an encoding is used, the BOM will be
+automatically written as the first character and will be silently dropped when
+the file is read.  There are variants of these encodings, such as 'utf-16-le'
+and 'utf-16-be' for little-endian and big-endian encodings, that specify one
+particular byte ordering and don't skip the BOM.
+
+
+Unicode filenames
+-----------------
+
+Most of the operating systems in common use today support filenames that contain
+arbitrary Unicode characters.  Usually this is implemented by converting the
+Unicode string into some encoding that varies depending on the system.  For
+example, MacOS X uses UTF-8 while Windows uses a configurable encoding; on
+Windows, Python uses the name "mbcs" to refer to whatever the currently
+configured encoding is.  On Unix systems, there will only be a filesystem
+encoding if you've set the ``LANG`` or ``LC_CTYPE`` environment variables; if
+you haven't, the default encoding is ASCII.
+
+The :func:`sys.getfilesystemencoding` function returns the encoding to use on
+your current system, in case you want to do the encoding manually, but there's
+not much reason to bother.  When opening a file for reading or writing, you can
+usually just provide the Unicode string as the filename, and it will be
+automatically converted to the right encoding for you::
+
+    filename = u'filename\u4500abc'
+    f = open(filename, 'w')
+    f.write('blah\n')
+    f.close()
+
+Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
+filenames.
+
+:func:`os.listdir`, which returns filenames, raises an issue: should it return
+the Unicode version of filenames, or should it return 8-bit strings containing
+the encoded versions?  :func:`os.listdir` will do both, depending on whether you
+provided the directory path as an 8-bit string or a Unicode string.  If you pass
+a Unicode string as the path, filenames will be decoded using the filesystem's
+encoding and a list of Unicode strings will be returned, while passing an 8-bit
+path will return the 8-bit versions of the filenames.  For example, assuming the
+default filesystem encoding is UTF-8, running the following program::
+
+	fn = u'filename\u4500abc'
+	f = open(fn, 'w')
+	f.close()
+
+	import os
+	print os.listdir('.')
+	print os.listdir(u'.')
+
+will produce the following output::
+
+	amk:~$ python t.py
+	['.svn', 'filename\xe4\x94\x80abc', ...]
+	[u'.svn', u'filename\u4500abc', ...]
+
+The first list contains UTF-8-encoded filenames, and the second list contains
+the Unicode versions.
+
+
+	
+Tips for Writing Unicode-aware Programs
+---------------------------------------
+
+This section provides some suggestions on writing software that deals with
+Unicode.
+
+The most important tip is:
+
+    Software should only work with Unicode strings internally, converting to a
+    particular encoding on output.
+
+If you attempt to write processing functions that accept both Unicode and 8-bit
+strings, you will find your program vulnerable to bugs wherever you combine the
+two different kinds of strings.  Python's default encoding is ASCII, so whenever
+a character with an ASCII value > 127 is in the input data, you'll get a
+:exc:`UnicodeDecodeError` because that character can't be handled by the ASCII
+encoding.
+
+It's easy to miss such problems if you only test your software with data that
+doesn't contain any accents; everything will seem to work, but there's actually
+a bug in your program waiting for the first user who attempts to use characters
+> 127.  A second tip, therefore, is:
+
+    Include characters > 127 and, even better, characters > 255 in your test
+    data.
+
+When using data coming from a web browser or some other untrusted source, a
+common technique is to check for illegal characters in a string before using the
+string in a generated command line or storing it in a database.  If you're doing
+this, be careful to check the string once it's in the form that will be used or
+stored; it's possible for encodings to be used to disguise characters.  This is
+especially true if the input data also specifies the encoding; many encodings
+leave the commonly checked-for characters alone, but Python includes some
+encodings such as ``'base64'`` that modify every single character.
+
+For example, let's say you have a content management system that takes a Unicode
+filename, and you want to disallow paths with a '/' character.  You might write
+this code::
+
+    def read_file (filename, encoding):
+        if '/' in filename:
+            raise ValueError("'/' not allowed in filenames")
+        unicode_name = filename.decode(encoding)
+        f = open(unicode_name, 'r')
+        # ... return contents of file ...
+        
+However, if an attacker could specify the ``'base64'`` encoding, they could pass
+``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
+``'/etc/passwd'``, to read a system file.  The above code looks for ``'/'``
+characters in the encoded form and misses the dangerous character in the
+resulting decoded form.
+
+References
+----------
+
+The PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
+Applications in Python" are available at
+<http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
+and discuss questions of character encodings as well as how to internationalize
+and localize an application.
+
+
+Revision History and Acknowledgements
+=====================================
+
+Thanks to the following people who have noted errors or offered suggestions on
+this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
+Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
+
+Version 1.0: posted August 5 2005.
+
+Version 1.01: posted August 7 2005.  Corrects factual and markup errors; adds
+several links.
+
+Version 1.02: posted August 16 2005.  Corrects factual errors.
+
+
+.. comment Additional topic: building Python w/ UCS2 or UCS4 support
+.. comment Describe obscure -U switch somewhere?
+.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
+
+.. comment 
+   Original outline:
+
+   - [ ] Unicode introduction
+       - [ ] ASCII
+       - [ ] Terms
+	   - [ ] Character
+	   - [ ] Code point
+	 - [ ] Encodings
+	    - [ ] Common encodings: ASCII, Latin-1, UTF-8
+       - [ ] Unicode Python type
+	   - [ ] Writing unicode literals
+	       - [ ] Obscurity: -U switch
+	   - [ ] Built-ins
+	       - [ ] unichr()
+	       - [ ] ord()
+	       - [ ] unicode() constructor
+	   - [ ] Unicode type
+	       - [ ] encode(), decode() methods
+       - [ ] Unicodedata module for character properties
+       - [ ] I/O
+	   - [ ] Reading/writing Unicode data into files
+	       - [ ] Byte-order marks
+	   - [ ] Unicode filenames
+       - [ ] Writing Unicode programs
+	   - [ ] Do everything in Unicode
+	   - [ ] Declaring source code encodings (PEP 263)
+       - [ ] Other issues
+	   - [ ] Building Python (UCS2, UCS4)

Added: doctools/trunk/Doc-3k/howto/urllib2.rst
==============================================================================
--- (empty file)
+++ doctools/trunk/Doc-3k/howto/urllib2.rst	Thu Aug  2 23:31:34 2007
@@ -0,0 +1,573 @@
+************************************************
+  HOWTO Fetch Internet Resources Using urllib2
+************************************************
+
+:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
+
+.. note::
+
+    There is an French translation of an earlier revision of this
+    HOWTO, available at `urllib2 - Le Manuel manquant
+    <http://www.voidspace/python/articles/urllib2_francais.shtml>`_.
+
+ 
+
+Introduction
+============
+
+.. sidebar:: Related Articles
+
+    You may also find useful the following article on fetching web resources
+    with Python :
+    
+    * `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
+    
+        A tutorial on *Basic Authentication*, with examples in Python.
+
+**urllib2** is a `Python <http://www.python.org>`_ module for fetching URLs
+(Uniform Resource Locators). It offers a very simple interface, in the form of
+the *urlopen* function. This is capable of fetching URLs using a variety of
+different protocols. It also offers a slightly more complex interface for
+handling common situations - like basic authentication, cookies, proxies and so
+on. These are provided by objects called handlers and openers.
+
+urllib2 supports fetching URLs for many "URL schemes" (identified by the string
+before the ":" in URL - for example "ftp" is the URL scheme of
+"ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
+This tutorial focuses on the most common case, HTTP.
+
+For straightforward situations *urlopen* is very easy to use. But as soon as you
+encounter errors or non-trivial cases when opening HTTP URLs, you will need some
+understanding of the HyperText Transfer Protocol. The most comprehensive and
+authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
+not intended to be easy to read. This HOWTO aims to illustrate using *urllib2*,
+with enough detail about HTTP to help you through. It is not intended to replace
+the :mod:`urllib2` docs, but is supplementary to them.
+
+
+Fetching URLs
+=============
+
+The simplest way to use urllib2 is as follows::
+
+    import urllib2
+    response = urllib2.urlopen('http://python.org/')
+    html = response.read()
+
+Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
+could have used an URL starting with 'ftp:', 'file:', etc.).  However, it's the
+purpose of this tutorial to explain the more complicated cases, concentrating on
+HTTP.
+
+HTTP is based on requests and responses - the client makes requests and servers
+send responses. urllib2 mirrors this with a ``Request`` object which represents
+the HTTP request you are making. In its simplest form you create a Request
+object that specifies the URL you want to fetch. Calling ``urlopen`` with this
+Request object returns a response object for the URL requested. This response is
+a file-like object, which means you can for example call ``.read()`` on the
+response::
+
+    import urllib2
+
+    req = urllib2.Request('http://www.voidspace.org.uk')
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+Note that urllib2 makes use of the same Request interface to handle all URL
+schemes.  For example, you can make an FTP request like so::
+
+    req = urllib2.Request('ftp://example.com/')
+
+In the case of HTTP, there are two extra things that Request objects allow you
+to do: First, you can pass data to be sent to the server.  Second, you can pass
+extra information ("metadata") *about* the data or the about request itself, to
+the server - this information is sent as HTTP "headers".  Let's look at each of
+these in turn.
+
+Data
+----
+
+Sometimes you want to send data to a URL (often the URL will refer to a CGI
+(Common Gateway Interface) script [#]_ or other web application). With HTTP,
+this is often done using what's known as a **POST** request. This is often what
+your browser does when you submit a HTML form that you filled in on the web. Not
+all POSTs have to come from forms: you can use a POST to transmit arbitrary data
+to your own application. In the common case of HTML forms, the data needs to be
+encoded in a standard way, and then passed to the Request object as the ``data``
+argument. The encoding is done using a function from the ``urllib`` library
+*not* from ``urllib2``. ::
+
+    import urllib
+    import urllib2  
+
+    url = 'http://www.someserver.com/cgi-bin/register.cgi'
+    values = {'name' : 'Michael Foord',
+              'location' : 'Northampton',
+              'language' : 'Python' }
+
+    data = urllib.urlencode(values)
+    req = urllib2.Request(url, data)
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+Note that other encodings are sometimes required (e.g. for file upload from HTML
+forms - see `HTML Specification, Form Submission
+<http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
+details).
+
+If you do not pass the ``data`` argument, urllib2 uses a **GET** request. One
+way in which GET and POST requests differ is that POST requests often have
+"side-effects": they change the state of the system in some way (for example by
+placing an order with the website for a hundredweight of tinned spam to be
+delivered to your door).  Though the HTTP standard makes it clear that POSTs are
+intended to *always* cause side-effects, and GET requests *never* to cause
+side-effects, nothing prevents a GET request from having side-effects, nor a
+POST requests from having no side-effects. Data can also be passed in an HTTP
+GET request by encoding it in the URL itself.
+
+This is done as follows::
+
+    >>> import urllib2
+    >>> import urllib
+    >>> data = {}
+    >>> data['name'] = 'Somebody Here'
+    >>> data['location'] = 'Northampton'
+    >>> data['language'] = 'Python'
+    >>> url_values = urllib.urlencode(data)
+    >>> print url_values
+    name=Somebody+Here&language=Python&location=Northampton
+    >>> url = 'http://www.example.com/example.cgi'
+    >>> full_url = url + '?' + url_values
+    >>> data = urllib2.open(full_url)
+
+Notice that the full URL is created by adding a ``?`` to the URL, followed by
+the encoded values.
+
+Headers
+-------
+
+We'll discuss here one particular HTTP header, to illustrate how to add headers
+to your HTTP request.
+
+Some websites [#]_ dislike being browsed by programs, or send different versions
+to different browsers [#]_ . By default urllib2 identifies itself as
+``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
+numbers of the Python release,
+e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
+not work. The way a browser identifies itself is through the
+``User-Agent`` header [#]_. When you create a Request object you can
+pass a dictionary of headers in. The following example makes the same
+request as above, but identifies itself as a version of Internet
+Explorer [#]_. ::
+
+    import urllib
+    import urllib2  
+    
+    url = 'http://www.someserver.com/cgi-bin/register.cgi'
+    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
+    values = {'name' : 'Michael Foord',
+              'location' : 'Northampton',
+              'language' : 'Python' }
+    headers = { 'User-Agent' : user_agent }
+    
+    data = urllib.urlencode(values)
+    req = urllib2.Request(url, data, headers)
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+The response also has two useful methods. See the section on `info and geturl`_
+which comes after we have a look at what happens when things go wrong.
+
+
+Handling Exceptions
+===================
+
+*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
+with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
+be raised).
+
+``HTTPError`` is the subclass of ``URLError`` raised in the specific case of
+HTTP URLs.
+
+URLError
+--------
+
+Often, URLError is raised because there is no network connection (no route to
+the specified server), or the specified server doesn't exist.  In this case, the
+exception raised will have a 'reason' attribute, which is a tuple containing an
+error code and a text error message.
+
+e.g. ::
+
+    >>> req = urllib2.Request('http://www.pretend_server.org')
+    >>> try: urllib2.urlopen(req)
+    >>> except URLError, e:
+    >>>    print e.reason
+    >>>
+    (4, 'getaddrinfo failed')
+
+
+HTTPError
+---------
+
+Every HTTP response from the server contains a numeric "status code". Sometimes
+the status code indicates that the server is unable to fulfil the request. The
+default handlers will handle some of these responses for you (for example, if
+the response is a "redirection" that requests the client fetch the document from
+a different URL, urllib2 will handle that for you). For those it can't handle,
+urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
+found), '403' (request forbidden), and '401' (authentication required).
+
+See section 10 of RFC 2616 for a reference on all the HTTP error codes.
+
+The ``HTTPError`` instance raised will have an integer 'code' attribute, which
+corresponds to the error sent by the server.
+
+Error Codes
+~~~~~~~~~~~
+
+Because the default handlers handle redirects (codes in the 300 range), and
+codes in the 100-299 range indicate success, you will usually only see error
+codes in the 400-599 range.
+
+``BaseHTTPServer.BaseHTTPRequestHandler.responses`` is a useful dictionary of
+response codes in that shows all the response codes used by RFC 2616. The
+dictionary is reproduced here for convenience ::
+
+    # Table mapping response codes to messages; entries have the
+    # form {code: (shortmessage, longmessage)}.
+    responses = {
+        100: ('Continue', 'Request received, please continue'),
+        101: ('Switching Protocols',
+              'Switching to new protocol; obey Upgrade header'),
+
+        200: ('OK', 'Request fulfilled, document follows'),
+        201: ('Created', 'Document created, URL follows'),
+        202: ('Accepted',
+              'Request accepted, processing continues off-line'),
+        203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
+        204: ('No Content', 'Request fulfilled, nothing follows'),
+        205: ('Reset Content', 'Clear input form for further input.'),
+        206: ('Partial Content', 'Partial content follows.'),
+
+        300: ('Multiple Choices',
+              'Object has several resources -- see URI list'),
+        301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
+        302: ('Found', 'Object moved temporarily -- see URI list'),
+        303: ('See Other', 'Object moved -- see Method and URL list'),
+        304: ('Not Modified',
+              'Document has not changed since given time'),
+        305: ('Use Proxy',
+              'You must use proxy specified in Location to access this '
+              'resource.'),
+        307: ('Temporary Redirect',
+              'Object moved temporarily -- see URI list'),
+
+        400: ('Bad Request',
+              'Bad request syntax or unsupported method'),
+        401: ('Unauthorized',
+              'No permission -- see authorization schemes'),
+        402: ('Payment Required',
+              'No payment -- see charging schemes'),
+        403: ('Forbidden',
+              'Request forbidden -- authorization will not help'),
+        404: ('Not Found', 'Nothing matches the given URI'),
+        405: ('Method Not Allowed',
+              'Specified method is invalid for this server.'),
+        406: ('Not Acceptable', 'URI not available in preferred format.'),
+        407: ('Proxy Authentication Required', 'You must authenticate with '
+              'this proxy before proceeding.'),
+        408: ('Request Timeout', 'Request timed out; try again later.'),
+        409: ('Conflict', 'Request conflict.'),
+        410: ('Gone',
+              'URI no longer exists and has been permanently removed.'),
+        411: ('Length Required', 'Client must specify Content-Length.'),
+        412: ('Precondition Failed', 'Precondition in headers is false.'),
+        413: ('Request Entity Too Large', 'Entity is too large.'),
+        414: ('Request-URI Too Long', 'URI is too long.'),
+        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
+        416: ('Requested Range Not Satisfiable',
+              'Cannot satisfy request range.'),
+        417: ('Expectation Failed',
+              'Expect condition could not be satisfied.'),
+
+        500: ('Internal Server Error', 'Server got itself in trouble'),
+        501: ('Not Implemented',
+              'Server does not support this operation'),
+        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
+        503: ('Service Unavailable',
+              'The server cannot process the request due to a high load'),
+        504: ('Gateway Timeout',
+              'The gateway server did not receive a timely response'),
+        505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
+        }
+
+When an error is raised the server responds by returning an HTTP error code
+*and* an error page. You can use the ``HTTPError`` instance as a response on the
+page returned. This means that as well as the code attribute, it also has read,
+geturl, and info, methods. ::
+
+    >>> req = urllib2.Request('http://www.python.org/fish.html')
+    >>> try: 
+    >>>     urllib2.urlopen(req)
+    >>> except URLError, e:
+    >>>     print e.code
+    >>>     print e.read()
+    >>> 
+    404
+    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
+        "http://www.w3.org/TR/html4/loose.dtd">
+    <?xml-stylesheet href="./css/ht2html.css" 
+        type="text/css"?>
+    <html><head><title>Error 404: File Not Found</title> 
+    ...... etc...
+
+Wrapping it Up
+--------------
+
+So if you want to be prepared for ``HTTPError`` *or* ``URLError`` there are two
+basic approaches. I prefer the second approach.
+
+Number 1
+~~~~~~~~
+
+::
+
+
+    from urllib2 import Request, urlopen, URLError, HTTPError
+    req = Request(someurl)
+    try:
+        response = urlopen(req)
+    except HTTPError, e:
+        print 'The server couldn\'t fulfill the request.'
+        print 'Error code: ', e.code
+    except URLError, e:
+        print 'We failed to reach a server.'
+        print 'Reason: ', e.reason
+    else:
+        # everything is fine
+
+
+.. note::
+
+    The ``except HTTPError`` *must* come first, otherwise ``except URLError``
+    will *also* catch an ``HTTPError``.
+
+Number 2
+~~~~~~~~
+
+::
+
+    from urllib2 import Request, urlopen, URLError
+    req = Request(someurl)
+    try:
+        response = urlopen(req)
+    except URLError, e:
+        if hasattr(e, 'reason'):
+            print 'We failed to reach a server.'
+            print 'Reason: ', e.reason
+        elif hasattr(e, 'code'):
+            print 'The server couldn\'t fulfill the request.'
+            print 'Error code: ', e.code
+    else:
+        # everything is fine
+        
+
+info and geturl
+===============
+
+The response returned by urlopen (or the ``HTTPError`` instance) has two useful
+methods ``info`` and ``geturl``.
+
+**geturl** - this returns the real URL of the page fetched. This is useful
+because ``urlopen`` (or the opener object used) may have followed a
+redirect. The URL of the page fetched may not be the same as the URL requested.
+
+**info** - this returns a dictionary-like object that describes the page
+fetched, particularly the headers sent by the server. It is currently an
+``httplib.HTTPMessage`` instance.
+
+Typical headers include 'Content-length', 'Content-type', and so on. See the
+`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
+for a useful listing of HTTP headers with brief explanations of their meaning
+and use.
+
+
+Openers and Handlers
+====================
+
+When you fetch a URL you use an opener (an instance of the perhaps
+confusingly-named :class:`urllib2.OpenerDirector`). Normally we have been using
+the default opener - via ``urlopen`` - but you can create custom
+openers. Openers use handlers. All the "heavy lifting" is done by the
+handlers. Each handler knows how to open URLs for a particular URL scheme (http,
+ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
+redirections or HTTP cookies.
+
+You will want to create openers if you want to fetch URLs with specific handlers
+installed, for example to get an opener that handles cookies, or to get an
+opener that does not handle redirections.
+
+To create an opener, instantiate an ``OpenerDirector``, and then call
+``.add_handler(some_handler_instance)`` repeatedly.
+
+Alternatively, you can use ``build_opener``, which is a convenience function for
+creating opener objects with a single function call.  ``build_opener`` adds
+several handlers by default, but provides a quick way to add more and/or
+override the default handlers.
+
+Other sorts of handlers you might want to can handle proxies, authentication,
+and other common but slightly specialised situations.
+
+``install_opener`` can be used to make an ``opener`` object the (global) default
+opener. This means that calls to ``urlopen`` will use the opener you have
+installed.
+
+Opener objects have an ``open`` method, which can be called directly to fetch
+urls in the same way as the ``urlopen`` function: there's no need to call
+``install_opener``, except as a convenience.
+
+
+Basic Authentication
+====================
+
+To illustrate creating and installing a handler we will use the
+``HTTPBasicAuthHandler``. For a more detailed discussion of this subject -
+including an explanation of how Basic Authentication works - see the `Basic
+Authentication Tutorial
+<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.
+
+When authentication is required, the server sends a header (as well as the 401
+error code) requesting authentication.  This specifies the authentication scheme
+and a 'realm'. The header looks like : ``Www-authenticate: SCHEME
+realm="REALM"``.
+
+e.g. :: 
+
+    Www-authenticate: Basic realm="cPanel Users"
+
+
+The client should then retry the request with the appropriate name and password
+for the realm included as a header in the request. This is 'basic
+authentication'. In order to simplify this process we can create an instance of
+``HTTPBasicAuthHandler`` and an opener to use this handler.
+
+The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
+the mapping of URLs and realms to passwords and usernames. If you know what the
+realm is (from the authentication header sent by the server), then you can use a
+``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
+case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
+you to specify a default username and password for a URL. This will be supplied
+in the absence of you providing an alternative combination for a specific
+realm. We indicate this by providing ``None`` as the realm argument to the
+``add_password`` method.
+
+The top-level URL is the first URL that requires authentication. URLs "deeper"
+than the URL you pass to .add_password() will also match. ::
+
+    # create a password manager
+    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()                        
+
+    # Add the username and password.
+    # If we knew the realm, we could use it instead of ``None``.
+    top_level_url = "http://example.com/foo/"
+    password_mgr.add_password(None, top_level_url, username, password)
+
+    handler = urllib2.HTTPBasicAuthHandler(password_mgr)                            
+
+    # create "opener" (OpenerDirector instance)
+    opener = urllib2.build_opener(handler)                       
+
+    # use the opener to fetch a URL
+    opener.open(a_url)      
+
+    # Install the opener.
+    # Now all calls to urllib2.urlopen use our opener.
+    urllib2.install_opener(opener)                               
+
+.. note::
+
+    In the above example we only supplied our ``HHTPBasicAuthHandler`` to
+    ``build_opener``. By default openers have the handlers for normal situations
+    -- ``ProxyHandler``, ``UnknownHandler``, ``HTTPHandler``,
+    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
+    ``FileHandler``, ``HTTPErrorProcessor``.
+
+``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
+component and the hostname and optionally the port number)
+e.g. "http://example.com/" *or* an "authority" (i.e. the hostname,
+optionally including the port number) e.g. "example.com" or "example.com:8080"
+(the latter example includes a port number).  The authority, if present, must
+NOT contain the "userinfo" component - for example "joe at password:example.com" is
+not correct.
+
+
+Proxies
+=======
+
+**urllib2** will auto-detect your proxy settings and use those. This is through
+the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
+a good thing, but there are occasions when it may not be helpful [#]_. One way
+to do this is to setup our own ``ProxyHandler``, with no proxies defined. This
+is done using similar steps to setting up a `Basic Authentication`_ handler : ::
+
+    >>> proxy_support = urllib2.ProxyHandler({})
+    >>> opener = urllib2.build_opener(proxy_support)
+    >>> urllib2.install_opener(opener)
+
+.. note::
+
+    Currently ``urllib2`` *does not* support fetching of ``https`` locations
+    through a proxy. This can be a problem.
+
+Sockets and Layers
+==================
+
+The Python support for fetching resources from the web is layered. urllib2 uses
+the httplib library, which in turn uses the socket library.
+
+As of Python 2.3 you can specify how long a socket should wait for a response
+before timing out. This can be useful in applications which have to fetch web
+pages. By default the socket module has *no timeout* and can hang. Currently,
+the socket timeout is not exposed at the httplib or urllib2 levels.  However,
+you can set the default timeout globally for all sockets using ::
+
+    import socket
+    import urllib2
+
+    # timeout in seconds
+    timeout = 10
+    socket.setdefaulttimeout(timeout) 
+
+    # this call to urllib2.urlopen now uses the default timeout
+    # we have set in the socket module
+    req = urllib2.Request('http://www.voidspace.org.uk')
+    response = urllib2.urlopen(req)
+
+
+-------
+
+
+Footnotes
+=========
+
+This document was reviewed and revised by John Lee.
+
+.. [#] For an introduction to the CGI protocol see
+       `Writing Web Applications in Python <http://www.pyzine.com/Issue008/Section_Articles/article_CGIOne.html>`_. 
+.. [#] Like Google for example. The *proper* way to use google from a program
+       is to use `PyGoogle <http://pygoogle.sourceforge.net>`_ of course. See
+       `Voidspace Google <http://www.voidspace.org.uk/python/recipebook.shtml#google>`_
+       for some examples of using the Google API.
+.. [#] Browser sniffing is a very bad practise for website design - building
+       sites using web standards is much more sensible. Unfortunately a lot of
+       sites still send different versions to different browsers.
+.. [#] The user agent for MSIE 6 is
+       *'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
+.. [#] For details of more HTTP request headers, see
+       `Quick Reference to HTTP Headers`_.
+.. [#] In my case I have to use a proxy to access the internet at work. If you
+       attempt to fetch *localhost* URLs through this proxy it blocks them. IE
+       is set to use the proxy, which urllib2 picks up on. In order to test
+       scripts with a localhost server, I have to prevent urllib2 from using
+       the proxy.

Modified: doctools/trunk/sphinx/templates/index.html
==============================================================================
--- doctools/trunk/sphinx/templates/index.html	(original)
+++ doctools/trunk/sphinx/templates/index.html	Thu Aug  2 23:31:34 2007
@@ -24,6 +24,8 @@
          <span class="linkdescr">keep this under your pillow</span></p>
       <p class="biglink"><a class="biglink" href="{{ pathto("maclib/index.rst") }}">Macintosh Library Modules</a><br>
          <span class="linkdescr">this too, if you use a Macintosh</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("howto/index.rst") }}">Python HOWTOs</a><br>
+         <span class="linkdescr">in-depth documents on specific topics</span></p>
     </td><td width="50%">
       <p class="biglink"><a class="biglink" href="{{ pathto("extending/index.rst") }}">Extending and Embedding</a><br>
          <span class="linkdescr">tutorial for C/C++ programmers</span></p>