[Python-checkins] r47183 - sandbox/trunk/Doc/functional.rst
andrew.kuchling
python-checkins at python.org
Fri Jun 30 23:03:12 CEST 2006
Author: andrew.kuchling
Date: Fri Jun 30 23:03:11 2006
New Revision: 47183
Modified:
sandbox/trunk/Doc/functional.rst
Log:
Revision pass: lots of small edits; request feedback; remove LaTeX markup :) ; add small references section
Modified: sandbox/trunk/Doc/functional.rst
==============================================================================
--- sandbox/trunk/Doc/functional.rst (original)
+++ sandbox/trunk/Doc/functional.rst Fri Jun 30 23:03:11 2006
@@ -1,13 +1,14 @@
Functional Programming HOWTO
================================
-**Incomplete Draft**
+(This is a first draft. Please send comments/error
+reports/suggestions to amk at amk.ca.)
-In this document, we'll take a tour of Python's features that are
-well-suited to implementing programs in a functional style. After an
-introduction to the concepts of functional programming, we'll look at
-language features such as iterators and generators and relevant
-library modules such as the ``itertools`` and ``functools`` modules.
+In this document, we'll take a tour of Python's features suitable for
+implementing programs in a functional style. After an introduction to
+the concepts of functional programming, we'll look at language
+features such as iterators and generators and relevant library modules
+such as ``itertools`` and ``functools``.
Introduction
@@ -20,10 +21,11 @@
Programming languages support decomposing problems in several different
ways:
-* Most programming languages are **procedural**, meaning
- that programs are lists of instructions that tell the computer what to
- do with the program's input.
- C, Pascal, and even Unix shells are procedural languages.
+* Most programming languages are **procedural**:
+ programs are lists of instructions that tell the computer what to
+ do with the program's input.
+ C, Pascal, and even Unix shells are procedural languages.
+
* In **declarative** languages, you write a specification that describes
the problem to be solved, and the language implementation figures out
how to perform the computation efficiently. SQL is the declarative
@@ -31,43 +33,54 @@
the data set you want to retrieve, and the SQL engine decides whether to
scan tables or use indexes, which subclauses should be performed first,
etc.
-* **Object-oriented** programming is a style in which programs consist
- of collections of objects. Objects have internal state and support various
- methods that query or modify this internal state in some way. Smalltalk
- and Java are object-oriented languages. C++ and Python are languages
- that support object-oriented programming, but don't require it.
+
+* **Object-oriented** programs manipulate collections of objects.
+ Objects have internal state and support methods that query or modify
+ this internal state in some way. Smalltalk and Java are
+ object-oriented languages. C++ and Python are languages that
+ support object-oriented programming, but don't force the use
+ of object-oriented features.
+
* **Functional** programming decomposes a problem into a set of functions.
Ideally, functions only take inputs and produce outputs, and don't have any
internal state that affects the output produced for a given input.
Well-known functional languages include the ML family (Standard ML,
- OCaml, and other variants) and Haskell.
+ OCaml, and other variants) and Haskell.
-The designers of some computer languages have chosen one style of
-programming that they support best. This often makes it difficult to
-write programs in the language that use a different style. Other
-language designers try to produce multi-paradigm languages that
-support several different styles. Lisp, C++, and Python are
-multi-paradigm; you can write programs or libraries that are largely
-procedural, object-oriented, or functional. In a large program,
-different sections might be written in different styles; the GUI might
-be object-oriented while the processing logic is procedural or
-functional, for example.
+The designers of some computer languages have chosen one approach to
+programming that's emphasized. This often makes it difficult to
+write programs that use a different approach. Other languages are
+multi-paradigm languages that support several different approaches. Lisp,
+C++, and Python are multi-paradigm; you can write programs or
+libraries that are largely procedural, object-oriented, or functional
+in all of these languages. In a large program, different sections
+might be written using different approaches; the GUI might be object-oriented
+while the processing logic is procedural or functional, for example.
In a functional program, input flows through a set of functions. Each
function operates on its input and produces some output. Functional
-style avoids making functions have side effects that modify internal
+style frowns upon functions with side effects that modify internal
state or make other changes that aren't visible in the function's
-return value. Avoiding side effects means not using data structures
-that get updated as a program runs. Some languages are very strict
-about this and don't even have assignment statements such as ``a=3``
-or ``c = a + b``, but it's difficult to avoid all side effects; some
-Python code, such as a ``print`` statement or a ``time.sleep(1)``,
-returns nothing and is only called for its side effects.
+return value. Functions that have no side effects at all are
+called **purely functional**.
+Avoiding side effects means not using data structures
+that get updated as a program runs; every function's output
+must only depend on its input.
+
+Some languages are very strict about purity and don't even have
+assignment statements such as ``a=3`` or ``c = a + b``, but it's
+difficult to avoid all side effects. Printing to the screen or
+writing to a disk file are side effects, for example. For example, in
+Python a ``print`` statement or a ``time.sleep(1)`` both return no
+useful value; they're only called for their side effects of sending
+some text to the screen or pausing execution for a second.
Python programs written in functional style usually won't go to the
-extreme of avoiding all I/O or all assignments. The implementation of
-a function will still use assignments to local variables, but won't
-touch any global variables or have other side effects.
+extreme of avoiding all I/O or all assignments; instead, they'll
+provide a functional-appearing interface but will use non-functional
+features internally. For example, the implementation of a function
+will still use assignments to local variables, but won't modify global
+variables or have other side effects.
Functional programming can be considered the opposite of
object-oriented programming. Objects are little capsules containing
@@ -122,29 +135,30 @@
you use daily (the Python interpreter, your XML parser, your web
browser) could be proven correct. Even if you wrote down or generated
a proof, there would then be the question of verifying the proof;
-maybe you only ***think*** you've proved that the program correct.
+maybe there's an error in it, and you only ***think*** you've proved
+that the program correct.
Modularity
''''''''''''''''''''''
A more practical benefit of functional programming is that it forces
-you to break apart your problem into small pieces. It's easier to
-specify and write a small function that does one thing than a large
-function that performs a complicated transformation. Small functions
-are easier to read and to check for errors. A program that's
-structured into many small pieces is a **modular** program.
+you to break apart your problem into small pieces. Programs are more
+modular as a result. It's easier to specify and write a small
+function that does one thing than a large function that performs a
+complicated transformation. Small functions are also easier to read
+and to check for errors.
Ease of debugging and testing
''''''''''''''''''''''''''''''''''
-It's also easier to test and to debug a functional-style program.
+Testing and debugging a functional-style program is easier.
-It's easier to debug because functions are generally small and clearly
-specified. When a program doesn't work, each function is an interface
-point where you can check that the data is correct. You can look at
-the intermediate inputs and outputs to quickly isolate the function
-that's responsible for a bug.
+Debugging is simplified because functions are generally small and
+clearly specified. When a program doesn't work, each function is an
+interface point where you can check that the data are correct. You
+can look at the intermediate inputs and outputs to quickly isolate the
+function that's responsible for a bug.
Testing is easier because each function is a potential subject for a
unit test. Functions don't depend on system state that needs to be
@@ -162,7 +176,7 @@
others will be useful in a wide variety of programs. For example, a
function that takes a directory path and returns all the XML files in
the directory, or a function that takes a filename and returns its
-contents would be useful in many different situations.
+contents, can be applied to many different situations.
Over time you'll form a personal library of utilities. Often you'll
assemble new programs by arranging existing functions in a new
@@ -171,18 +185,17 @@
-
Iterators
-----------------------
-Iterators are an important foundation for writing functional-style
-programs.
+I'll start by looking at a Python language feature that's an important
+foundation for writing functional-style programs: iterators.
-An iterator is an object representing a stream of data that returns
-the data one element at a time. A Python iterator must support a
-method called ``next()`` that takes no arguments and always returns
-the next element of the stream. If there are no more elements in the
-stream, ``next()`` must raise the ``StopIteration`` exception.
+An iterator is an object representing a stream of data; this object
+returns the data one element at a time. A Python iterator must
+support a method called ``next()`` that takes no arguments and always
+returns the next element of the stream. If there are no more elements
+in the stream, ``next()`` must raise the ``StopIteration`` exception.
Iterators don't have to be finite, though; it's perfectly reasonable
to write an iterator that produces an infinite stream of data.
@@ -190,7 +203,8 @@
to return an iterator that will return the object's contents or
elements, raising ``TypeError`` if the object doesn't support
iteration. Several of Python's built-in data types support iteration,
-the most common being lists and dictionaries.
+the most common being lists and dictionaries. An object is called
+an **iterable** object if you can get an iterator for it.
You can experiment with the iteration interface manually::
@@ -221,8 +235,8 @@
for i in obj:
print i
-Iterators can be materialized as lists or tuples by using the ``list()`` or ``tuple()``
-constructor functions::
+Iterators can be materialized as lists or tuples by using the
+``list()`` or ``tuple()`` constructor functions::
>>> L = [1,2,3]
>>> iterator = iter(L)
@@ -285,10 +299,14 @@
Dec 12
Oct 10
-This is just the default behaviour. If you want to iterate over keys,
-values, or key/value pairs, you can explicitly call the
-``iterkeys()``, ``itervalues()``, or ``iteritems()``
-methods to get an appropriate iterator.
+Note that the order is essentially random, because it's based on the
+hash ordering of the objects in the dictionary.
+
+Applying ``iter()`` to a dictionary always loops over the keys, but
+dictionaries have methods that return other iterators. If you want to
+iterate over keys, values, or key/value pairs, you can explicitly call
+the ``iterkeys()``, ``itervalues()``, or ``iteritems()`` methods to
+get an appropriate iterator.
The ``dict()`` constructor can accept an iterator that returns a
finite stream of ``(key, value)`` tuples::
@@ -317,10 +335,15 @@
List comprehensions
-----------------------
+.. comment
+
+ Maybe gencomps should be described and emphasized, and listcomps
+ be mentioned as an afterthought.
+
Two common operations on a stream are performing some operation for
every element, or selecting a subset of elements that meet some
condition. For example, given a list of strings, you might want to
-pull out all the strings containing a given substring, or strip off
+pull out all the strings containing a given substring or strip off
trailing whitespace from each line.
List comprehensions (or "listcomps") are a concise notation for such
@@ -340,9 +363,8 @@
containing the resulting lines, not an iterator. This means that list
comprehensions aren't very useful if you're working with iterators
that return an infinite or a very large stream of data. Later we'll
-discuss generator expressions, a feature that provides similar
-capabilities as list comprehensions but can be used with infinite
-iterators.
+discuss generator expressions, which have the capabilities of list
+comprehensions but can be used with infinite iterators.
List comprehensions have the form::
@@ -403,8 +425,8 @@
-----------------------
Generator expressions are written like list comprehensions, but are
-surrounded by parentheses (\samp{()}) and not square brackets
-(\samp{[]}). The result of a generator expression
+surrounded by parentheses ("()") and not square brackets
+("[]"). The result of a generator expression
is an iterator that returns the computed elements without
materializing a list containing them all.
@@ -520,7 +542,7 @@
-New generator features in Python 2.5
+Passing values into a generator
''''''''''''''''''''''''''''''''''''''''''''''
In Python 2.4 and earlier, generators only produced output. Once a
@@ -625,8 +647,7 @@
Built-in functions
----------------------------------------------
-Let's look in more detail at those built-in functions that are
-relevant to iterators.
+Let's look in more detail at built-in functions often used with iterators.
Two Python's built-in functions, ``map()`` and ``filter()``, are
somewhat obsolete; they duplicate the features of list comprehensions
@@ -701,7 +722,7 @@
If you use ``operator.add`` with ``reduce()``, you'll add up all the
elements of the iterable. This case is so common that there's a special
-built-in for it::
+built-in called ``sum()`` to compute it::
reduce(operator.add, [1,2,3,4], 0) =>
10
@@ -710,8 +731,20 @@
sum([]) =>
0
-``enumerate(iter)`` counts off the elements in the iterable, return
-pairs of the count and each element.
+For many uses of ``reduce()``, though, it can be clearer to just write
+the obvious ``for`` loop::
+
+ # Instead of:
+ product = reduce(operator.mul, [1,2,3], 1)
+
+ # You can write:
+ product = 1
+ for i in [1,2,3]:
+ product *= i
+
+
+``enumerate(iter)`` counts off the elements in the iterable, returning
+2-tuples containing the count and each element.
::
@@ -724,7 +757,7 @@
f = open('data.txt', 'r')
for i, line in enumerate(f):
if line.strip() == '':
- print 'Blank line at', line
+ print 'Blank line at line #%i' % i
``sorted(iterable, [cmp=None], [key=None], [reverse=False)``
collects all the elements of the iterable into a list, sorts
@@ -745,12 +778,12 @@
[9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
(For a more detailed discussion of sorting, see the Sorting mini-HOWTO
-in the Python wiki at \url{http://wiki.python.org/moin/SortingHowto}.)
+in the Python wiki at http://wiki.python.org/moin/SortingHowto.)
The ``any(iter)`` and ``all(iter)`` built-ins look at
the truth values of an iterable's contents. ``any()`` returns
True if any element in the iterable is a true value, and ``all()``
-retturn True if all of the elements are true values::
+returns True if all of the elements are true values::
any([0,1,0]) =>
True
@@ -770,16 +803,13 @@
----------------------------------------------
When writing functional-style programs, you'll often need little
-functions that act as predicates or that combine list elements in some
-particular way.
+functions that act as predicates or that combine elements in some way.
-If there's a Python built-in that's suitable, or a module has a
-function that does what you need, you don't need to define a new
-function at all::
+If there's a Python built-in or a module function that's suitable, you
+don't need to define a new function at all::
stripped_lines = [line.strip for line in lines]
existing_files = filter(os.path.exists, file_list)
- XXX pi-notation example
If the function you need doesn't exist, you need to write it. One way
to write small functions is to use the ``lambda`` statement. ``lambda``
@@ -804,7 +834,8 @@
def adder(x,y):
return x + y
-Which alternative is preferable? That's a style question.
+Which alternative is preferable? That's a style question; my general
+view is to avoid it.
``lambda`` is quite limited in the functions it can define. The
result has to be computable as a single expression, which means you
@@ -815,7 +846,22 @@
::
- freq = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+ freq = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+
+You can figure it out, but it takes time to disentangle the function
+to figure out what's going on. Using a short nested
+``def`` statements makes things a little bit better::
+
+ def combine (a, b):
+ return 0, a[1] + b[1]
+
+ return reduce(combine_freq, items)[1]
+
+It would be best of all if I had simply used a ``for`` loop::
+
+ total = 0
+ for a, b in items:
+ total += b
Fredrik Lundh once suggested the following set of rules for refactoring
uses of ``lambda``:
@@ -827,28 +873,16 @@
4) Convert the lambda to a def statement, using that name.
5) Remove the comment.
-Personally I try to avoid lambdas, favoring short nested
-``def`` statements like this::
-
- def output_total_freq (items):
- """Takes a list of (element, frequency count) tuples
- and returns the total number of occurrences.
- """
-
- def combine (a, b):
- return 0, a[1] + b[1]
-
- return reduce(combine_freq, items)[1]
-
-You might disagree that this style is better.
+I really like these rules, but you're free todisagree that this style
+is better.
The itertools module
-----------------------
The ``itertools`` module contains a number of commonly-used iterators
-as well as functions for combining iterators. This section will
-introduce the module's contents using small examples.
+as well as functions for combining several iterators. This section
+will introduce the module's contents by showing small examples.
``itertools.count(n)`` returns an infinite stream of
integers, increasing by 1 each time. You can optionally supply the
@@ -861,7 +895,7 @@
``itertools.cycle(iter)`` saves a copy of the contents of a provided
iterable and returns a new iterator that returns its elements from
-first to last, and then repeats these elements infinitely.
+first to last. The new iterator will repeat these elements infinitely.
::
@@ -878,10 +912,10 @@
itertools.repeat('abc', 5) =>
abc, abc, abc, abc, abc
-``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of
-iterables as input, and returns all the elements of the first iterator,
-then all the elements of the second, until all of the iterables
-have been exhausted.
+``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of
+iterables as input, and returns all the elements of the first
+iterator, then all the elements of the second, and so on, until all of
+the iterables have been exhausted.
::
@@ -895,25 +929,25 @@
('a', 1), ('b', 2), ('c', 3)
This iterator is intended to be used with iterables that are all of
-the same length. If the iterables are of different lengths,
-the resulting stream will be the same length as the shortest iterable.
-Unfortunately, if you passed in any iterators as arguments,
-an element may have been taken from the longer iterators
-and discarded, meaning that you can't keep using them; if you do,
-you risk skipping a discarded element.
+the same length. If the iterables are of different lengths, the
+resulting stream will be the same length as the shortest iterable.
::
itertools.izip(['a', 'b'], (1, 2, 3)) =>
('a', 1), ('b', 2)
+You should avoid doing this, though, because an element may be taken
+from the longer iterators and discarded. This means you can't go on
+to use the iterators further because you risk skipping a discarded
+element.
``itertools.islice(iter, [start], stop, [step])`` returns a stream
that's a slice of the iterator. It can return the first ``stop``
elements. If you supply a starting index, you'll get ``stop-start``
elements, and if you supply a value for ``step` elements will be
-skipped. Unlike Python's string and list slicing, you can't use
-negative values for ``start``, ``stop``, or ``step``.
+skipped accordingly. Unlike Python's string and list slicing, you
+can't use negative values for ``start``, ``stop``, or ``step``.
::
@@ -943,7 +977,7 @@
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
-Two functions are for calling other functions on the contents of an
+Two functions are used for calling other functions on the contents of an
iterable.
``itertools.imap(f, iterA, iterB, ...)`` returns
@@ -953,13 +987,13 @@
itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
6, 8, 8
-(The ``operator`` module contains a set of functions
+The ``operator`` module contains a set of functions
corresponding to Python's operators. Some examples are
``operator.add(a, b)`` (adds two values),
``operator.ne(a, b)`` (same as ``a!=b``),
and
``operator.attrgetter('id')`` (returns a callable that
-fetches the ``"id"`` attribute),
+fetches the ``"id"`` attribute).
``itertools.starmap(func, iter)`` assumes that the iterable will
return a stream of tuples, and calls ``f()`` using these tuples as the
@@ -1017,10 +1051,11 @@
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
-The last function, ``itertools.groupby(iter, key_func=None)``, is the
-most complicated. ``key_func(elem)`` is a function that can compute a
-key value for each element returned by the iterable. If you don't
-supply a key function, the key is simply each element itself.
+The last function I'll discuss, ``itertools.groupby(iter,
+key_func=None)``, is the most complicated. ``key_func(elem)`` is a
+function that can compute a key value for each element returned by the
+iterable. If you don't supply a key function, the key is simply each
+element itself.
``groupby()`` collects all the consecutive elements from the
underlying iterable that have the same key value, and returns a stream
@@ -1068,7 +1103,7 @@
For programs written in a functional style, you'll sometimes want to
construct variants of existing functions that have some of the
parameters filled in. Consider a Python function ``f(a, b, c)``; you
-could create a new function ``g(b, c)`` that was equivalent to
+may wish to create a new function ``g(b, c)`` that was equivalent to
``f(1, b, c)``. This is called "partial function application".
The constructor for ``partial`` takes the arguments ``(function, arg1,
@@ -1089,15 +1124,6 @@
server_log('Unable to open socket')
-Topics to place
------------------------------
-
-XXX os.walk()
-
-XXX Need a large example.
-
-=======
-
Acknowledgements
------------------------------
@@ -1108,6 +1134,19 @@
.. comment
+ Topics to place
+ -----------------------------
+
+ XXX os.walk()
+
+ XXX Need a large example.
+
+ But will an example add much? I'll post a first draft and see
+ what the comments say.
+
+.. comment
+
+ Original outline:
Introduction
Idea of FP
Programs built out of functions
@@ -1141,6 +1180,9 @@
.. comment
+ Handy little function for printing part of an iterator -- used
+ while writing this document.
+
import itertools
def print_iter(it):
slice = itertools.islice(it, 10)
@@ -1153,9 +1195,30 @@
References
--------------------
-SICP
+General
+'''''''''''''''
+
+**Structure and Interpretation of Computer Programs**, by
+Harold Abelson and Gerald Jay Sussman with Julie Sussman.
+
+Full text at http://mitpress.mit.edu/sicp/.
+
+A classic textbook of computer science. Chapters 2 and 3 discuss the
+use of sequences and streams to organize the data flow inside a
+program. The book uses Scheme for its examples, but many of the
+design approaches described in these chapters are applicable to
+functional-style Python code.
+
+http://en.wikipedia.org/wiki/Functional_programming:
+General Wikipedia entry describing functional programming.
+
+
+Python documentation
+'''''''''''''''''''''''''''
-Relevant manual sections
+http://docs.python.org/lib/module-itertools.html:
+Documentation ``for the itertools`` module.
-XXX
+http://docs.python.org/lib/module-operator.html:
+Documentation ``for the operator`` module.
More information about the Python-checkins
mailing list