[Python-checkins] r47183 - sandbox/trunk/Doc/functional.rst

andrew.kuchling python-checkins at python.org
Fri Jun 30 23:03:12 CEST 2006


Author: andrew.kuchling
Date: Fri Jun 30 23:03:11 2006
New Revision: 47183

Modified:
   sandbox/trunk/Doc/functional.rst
Log:
Revision pass: lots of small edits; request feedback; remove LaTeX markup :) ; add small references section

Modified: sandbox/trunk/Doc/functional.rst
==============================================================================
--- sandbox/trunk/Doc/functional.rst	(original)
+++ sandbox/trunk/Doc/functional.rst	Fri Jun 30 23:03:11 2006
@@ -1,13 +1,14 @@
 Functional Programming HOWTO
 ================================
 
-**Incomplete Draft**
+(This is a first draft.  Please send comments/error
+reports/suggestions to amk at amk.ca.)
 
-In this document, we'll take a tour of Python's features that are
-well-suited to implementing programs in a functional style.  After an
-introduction to the concepts of functional programming, we'll look at
-language features such as iterators and generators and relevant
-library modules such as the ``itertools`` and ``functools`` modules.
+In this document, we'll take a tour of Python's features suitable for
+implementing programs in a functional style.  After an introduction to
+the concepts of functional programming, we'll look at language
+features such as iterators and generators and relevant library modules
+such as ``itertools`` and ``functools``.
 
 
 Introduction
@@ -20,10 +21,11 @@
 Programming languages support decomposing problems in several different 
 ways:
 
-* Most programming languages are **procedural**, meaning
-  that programs are lists of instructions that tell the computer what to
-  do with the program's input.  
-  C, Pascal, and even Unix shells are procedural languages.  
+* Most programming languages are **procedural**: 
+  programs are lists of instructions that tell the computer what to
+  do with the program's input.
+  C, Pascal, and even Unix shells are procedural languages.
+
 * In **declarative** languages, you write a specification that describes 
   the problem to be solved, and the language implementation figures out 
   how to perform the computation efficiently.  SQL is the declarative 
@@ -31,43 +33,54 @@
   the data set you want to retrieve, and the SQL engine decides whether to 
   scan tables or use indexes, which subclauses should be performed first,
   etc.
-* **Object-oriented** programming is a style in which programs consist 
-  of collections of objects.  Objects have internal state and support various
-  methods that query or modify this internal state in some way. Smalltalk
-  and Java are object-oriented languages.  C++ and Python are languages
-  that support object-oriented programming, but don't require it.
+
+* **Object-oriented** programs manipulate  collections of objects.
+  Objects have internal state and support methods that query or modify
+  this internal state in some way. Smalltalk and Java are
+  object-oriented languages.  C++ and Python are languages that
+  support object-oriented programming, but don't force the use 
+  of object-oriented features.
+
 * **Functional** programming decomposes a problem into a set of functions.
   Ideally, functions only take inputs and produce outputs, and don't have any 
   internal state that affects the output produced for a given input.
   Well-known functional languages include the ML family (Standard ML,
-  OCaml, and other variants) and Haskell. 
+  OCaml, and other variants) and Haskell.
 
-The designers of some computer languages have chosen one style of
-programming that they support best.  This often makes it difficult to
-write programs in the language that use a different style.  Other
-language designers try to produce multi-paradigm languages that
-support several different styles.  Lisp, C++, and Python are
-multi-paradigm; you can write programs or libraries that are largely
-procedural, object-oriented, or functional.  In a large program,
-different sections might be written in different styles; the GUI might
-be object-oriented while the processing logic is procedural or
-functional, for example.
+The designers of some computer languages have chosen one approach to 
+programming that's emphasized.  This often makes it difficult to
+write programs that use a different approach.  Other languages are
+multi-paradigm languages that support several different approaches.  Lisp,
+C++, and Python are multi-paradigm; you can write programs or
+libraries that are largely procedural, object-oriented, or functional
+in all of these languages.  In a large program, different sections
+might be written using different approaches; the GUI might be object-oriented
+while the processing logic is procedural or functional, for example.
 
 In a functional program, input flows through a set of functions. Each
 function operates on its input and produces some output.  Functional
-style avoids making functions have side effects that modify internal
+style frowns upon functions with side effects that modify internal
 state or make other changes that aren't visible in the function's
-return value.  Avoiding side effects means not using data structures
-that get updated as a program runs.  Some languages are very strict
-about this and don't even have assignment statements such as ``a=3``
-or ``c = a + b``, but it's difficult to avoid all side effects; some
-Python code, such as a ``print`` statement or a ``time.sleep(1)``,
-returns nothing and is only called for its side effects.  
+return value.  Functions that have no side effects at all are 
+called **purely functional**.
+Avoiding side effects means not using data structures
+that get updated as a program runs; every function's output 
+must only depend on its input.
+
+Some languages are very strict about purity and don't even have
+assignment statements such as ``a=3`` or ``c = a + b``, but it's
+difficult to avoid all side effects.  Printing to the screen or
+writing to a disk file are side effects, for example.  For example, in
+Python a ``print`` statement or a ``time.sleep(1)`` both return no
+useful value; they're only called for their side effects of sending
+some text to the screen or pausing execution for a second.
 
 Python programs written in functional style usually won't go to the
-extreme of avoiding all I/O or all assignments.  The implementation of
-a function will still use assignments to local variables, but won't
-touch any global variables or have other side effects.
+extreme of avoiding all I/O or all assignments; instead, they'll
+provide a functional-appearing interface but will use non-functional
+features internally.  For example, the implementation of a function
+will still use assignments to local variables, but won't modify global
+variables or have other side effects.
 
 Functional programming can be considered the opposite of
 object-oriented programming.  Objects are little capsules containing
@@ -122,29 +135,30 @@
 you use daily (the Python interpreter, your XML parser, your web
 browser) could be proven correct.  Even if you wrote down or generated
 a proof, there would then be the question of verifying the proof;
-maybe you only ***think*** you've proved that the program correct.
+maybe there's an error in it, and you only ***think*** you've proved
+that the program correct.
 
 Modularity
 ''''''''''''''''''''''
 
 A more practical benefit of functional programming is that it forces
-you to break apart your problem into small pieces.  It's easier to
-specify and write a small function that does one thing than a large
-function that performs a complicated transformation.  Small functions
-are easier to read and to check for errors.  A program that's
-structured into many small pieces is a **modular** program.
+you to break apart your problem into small pieces.  Programs are more
+modular as a result.  It's easier to specify and write a small
+function that does one thing than a large function that performs a
+complicated transformation.  Small functions are also easier to read
+and to check for errors.
 
 
 Ease of debugging and testing 
 ''''''''''''''''''''''''''''''''''
 
-It's also easier to test and to debug a functional-style program.
+Testing and debugging a functional-style program is easier.
 
-It's easier to debug because functions are generally small and clearly
-specified.  When a program doesn't work, each function is an interface
-point where you can check that the data is correct.  You can look at
-the intermediate inputs and outputs to quickly isolate the function
-that's responsible for a bug.
+Debugging is simplified because functions are generally small and
+clearly specified.  When a program doesn't work, each function is an
+interface point where you can check that the data are correct.  You
+can look at the intermediate inputs and outputs to quickly isolate the
+function that's responsible for a bug.
 
 Testing is easier because each function is a potential subject for a
 unit test.  Functions don't depend on system state that needs to be
@@ -162,7 +176,7 @@
 others will be useful in a wide variety of programs.  For example, a
 function that takes a directory path and returns all the XML files in
 the directory, or a function that takes a filename and returns its
-contents would be useful in many different situations.
+contents, can be applied to many different situations.
 
 Over time you'll form a personal library of utilities.  Often you'll
 assemble new programs by arranging existing functions in a new
@@ -171,18 +185,17 @@
 
 
 
-
 Iterators
 -----------------------
 
-Iterators are an important foundation for writing functional-style
-programs.
+I'll start by looking at a Python language feature that's an important
+foundation for writing functional-style programs: iterators.
 
-An iterator is an object representing a stream of data that returns
-the data one element at a time.  A Python iterator must support a
-method called ``next()`` that takes no arguments and always returns
-the next element of the stream.  If there are no more elements in the
-stream, ``next()`` must raise the ``StopIteration`` exception.
+An iterator is an object representing a stream of data; this object
+returns the data one element at a time.  A Python iterator must
+support a method called ``next()`` that takes no arguments and always
+returns the next element of the stream.  If there are no more elements
+in the stream, ``next()`` must raise the ``StopIteration`` exception.
 Iterators don't have to be finite, though; it's perfectly reasonable
 to write an iterator that produces an infinite stream of data.
 
@@ -190,7 +203,8 @@
 to return an iterator that will return the object's contents or
 elements, raising ``TypeError`` if the object doesn't support
 iteration.  Several of Python's built-in data types support iteration,
-the most common being lists and dictionaries.
+the most common being lists and dictionaries.  An object is called 
+an **iterable** object if you can get an iterator for it.
 
 You can experiment with the iteration interface manually::
 
@@ -221,8 +235,8 @@
         for i in obj:
             print i
 
-Iterators can be materialized as lists or tuples by using the ``list()`` or ``tuple()``
-constructor functions::
+Iterators can be materialized as lists or tuples by using the
+``list()`` or ``tuple()`` constructor functions::
 
     >>> L = [1,2,3]
     >>> iterator = iter(L)
@@ -285,10 +299,14 @@
     Dec 12
     Oct 10
 
-This is just the default behaviour.  If you want to iterate over keys,
-values, or key/value pairs, you can explicitly call the
-``iterkeys()``, ``itervalues()``, or ``iteritems()``
-methods to get an appropriate iterator.  
+Note that the order is essentially random, because it's based on the
+hash ordering of the objects in the dictionary.
+
+Applying ``iter()`` to a dictionary always loops over the keys, but
+dictionaries have methods that return other iterators.  If you want to
+iterate over keys, values, or key/value pairs, you can explicitly call
+the ``iterkeys()``, ``itervalues()``, or ``iteritems()`` methods to
+get an appropriate iterator.
 
 The ``dict()`` constructor can accept an iterator that returns a
 finite stream of ``(key, value)`` tuples::
@@ -317,10 +335,15 @@
 List comprehensions
 -----------------------
 
+.. comment
+
+   Maybe gencomps should be described and emphasized, and listcomps
+   be mentioned as an afterthought.
+
 Two common operations on a stream are performing some operation for
 every element, or selecting a subset of elements that meet some
 condition.  For example, given a list of strings, you might want to
-pull out all the strings containing a given substring, or strip off
+pull out all the strings containing a given substring or strip off
 trailing whitespace from each line.
 
 List comprehensions (or "listcomps") are a concise notation for such
@@ -340,9 +363,8 @@
 containing the resulting lines, not an iterator.  This means that list
 comprehensions aren't very useful if you're working with iterators
 that return an infinite or a very large stream of data.  Later we'll
-discuss generator expressions, a feature that provides similar
-capabilities as list comprehensions but can be used with infinite
-iterators.
+discuss generator expressions, which have the capabilities of list
+comprehensions but can be used with infinite iterators.
 
 List comprehensions have the form::
 
@@ -403,8 +425,8 @@
 -----------------------
 
 Generator expressions are written like list comprehensions, but are
-surrounded by parentheses (\samp{()}) and not square brackets
-(\samp{[]}).  The result of a generator expression
+surrounded by parentheses ("()") and not square brackets
+("[]").  The result of a generator expression
 is an iterator that returns the computed elements without 
 materializing a list containing them all.
 
@@ -520,7 +542,7 @@
 
 
 
-New generator features in Python 2.5
+Passing values into a generator
 ''''''''''''''''''''''''''''''''''''''''''''''
 
 In Python 2.4 and earlier, generators only produced output.  Once a
@@ -625,8 +647,7 @@
 Built-in functions
 ----------------------------------------------
 
-Let's look in more detail at those built-in functions that are
-relevant to iterators.
+Let's look in more detail at built-in functions often used with iterators.
 
 Two Python's built-in functions, ``map()`` and ``filter()``, are
 somewhat obsolete; they duplicate the features of list comprehensions
@@ -701,7 +722,7 @@
 
 If you use ``operator.add`` with ``reduce()``, you'll add up all the 
 elements of the iterable.  This case is so common that there's a special
-built-in for it::
+built-in called ``sum()`` to compute it::
 
     reduce(operator.add, [1,2,3,4], 0) =>
       10
@@ -710,8 +731,20 @@
     sum([]) =>
       0
 
-``enumerate(iter)`` counts off the elements in the iterable, return
-pairs of the count and each element.
+For many uses of ``reduce()``, though, it can be clearer to just write
+the obvious ``for`` loop::
+
+    # Instead of:
+    product = reduce(operator.mul, [1,2,3], 1)
+
+    # You can write:
+    product = 1
+    for i in [1,2,3]:
+        product *= i
+
+
+``enumerate(iter)`` counts off the elements in the iterable, returning
+2-tuples containing the count and each element.
 
 ::
 
@@ -724,7 +757,7 @@
     f = open('data.txt', 'r')
     for i, line in enumerate(f):
         if line.strip() == '':
-            print 'Blank line at', line
+            print 'Blank line at line #%i' % i
 
 ``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` 
 collects all the elements of the iterable into a list, sorts 
@@ -745,12 +778,12 @@
       [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
 
 (For a more detailed discussion of sorting, see the Sorting mini-HOWTO
-in the Python wiki at \url{http://wiki.python.org/moin/SortingHowto}.)
+in the Python wiki at http://wiki.python.org/moin/SortingHowto.)
 
 The ``any(iter)`` and ``all(iter)`` built-ins look at 
 the truth values of an iterable's contents.  ``any()`` returns 
 True if any element in the iterable is a true value, and ``all()`` 
-retturn True if all of the elements are true values::
+returns True if all of the elements are true values::
 
     any([0,1,0]) =>
       True
@@ -770,16 +803,13 @@
 ----------------------------------------------
 
 When writing functional-style programs, you'll often need little
-functions that act as predicates or that combine list elements in some
-particular way.   
+functions that act as predicates or that combine elements in some way.
 
-If there's a Python built-in that's suitable, or a module has a
-function that does what you need, you don't need to define a new
-function at all::
+If there's a Python built-in or a module function that's suitable, you
+don't need to define a new function at all::
 
         stripped_lines = [line.strip for line in lines]
         existing_files = filter(os.path.exists, file_list)
-        XXX pi-notation example
 
 If the function you need doesn't exist, you need to write it.  One way
 to write small functions is to use the ``lambda`` statement.  ``lambda``
@@ -804,7 +834,8 @@
         def adder(x,y):
             return x + y
 
-Which alternative is preferable?  That's a style question.
+Which alternative is preferable?  That's a style question; my general 
+view is to avoid it.
 
 ``lambda`` is quite limited in the functions it can define.  The
 result has to be computable as a single expression, which means you
@@ -815,7 +846,22 @@
 
 ::
 
-     freq = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+    freq = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+
+You can figure it out, but it takes time to disentangle the function
+to figure out what's going on.  Using a short nested
+``def`` statements makes things a little bit better::
+
+    def combine (a, b):
+	return 0, a[1] + b[1]
+
+    return reduce(combine_freq, items)[1]
+
+It would be best of all if I had simply used a ``for`` loop::
+
+     total = 0
+     for a, b in items:
+         total += b
 
 Fredrik Lundh once suggested the following set of rules for refactoring 
 uses of ``lambda``:
@@ -827,28 +873,16 @@
 4) Convert the lambda to a def statement, using that name.
 5) Remove the comment.
 
-Personally I try to avoid lambdas, favoring short nested
-``def`` statements like this::
-
-    def output_total_freq (items):
-	"""Takes a list of (element, frequency count) tuples
-	and returns the total number of occurrences.
-        """
-
-	def combine (a, b):
-	    return 0, a[1] + b[1]
-
-	return reduce(combine_freq, items)[1]
-
-You might disagree that this style is better.
+I really like these rules, but you're free todisagree that this style
+is better.
 
 
 The itertools module
 -----------------------
 
 The ``itertools`` module contains a number of commonly-used iterators
-as well as functions for combining iterators.  This section will
-introduce the module's contents using small examples.
+as well as functions for combining several iterators.  This section
+will introduce the module's contents by showing small examples.
 
 ``itertools.count(n)`` returns an infinite stream of
 integers, increasing by 1 each time.  You can optionally supply the
@@ -861,7 +895,7 @@
 
 ``itertools.cycle(iter)`` saves a copy of the contents of a provided
 iterable and returns a new iterator that returns its elements from
-first to last, and then repeats these elements infinitely.
+first to last.  The new iterator will repeat these elements infinitely.
 
 ::
 
@@ -878,10 +912,10 @@
     itertools.repeat('abc', 5) =>
       abc, abc, abc, abc, abc
 
-``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of 
-iterables as input, and returns all the elements of the first iterator, 
-then all the elements of the second, until all of the iterables 
-have been exhausted.
+``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of
+iterables as input, and returns all the elements of the first
+iterator, then all the elements of the second, and so on, until all of
+the iterables have been exhausted.
 
 ::
 
@@ -895,25 +929,25 @@
       ('a', 1), ('b', 2), ('c', 3)
 
 This iterator is intended to be used with iterables that are all of
-the same length.    If the iterables are of different lengths, 
-the resulting stream will be the same length as the shortest iterable.
-Unfortunately, if you passed in any iterators as arguments, 
-an element may have been taken from the longer iterators 
-and discarded, meaning that you can't keep using them; if you do, 
-you risk skipping a discarded element.
+the same length.  If the iterables are of different lengths, the
+resulting stream will be the same length as the shortest iterable.
 
 ::
 
     itertools.izip(['a', 'b'], (1, 2, 3)) =>
       ('a', 1), ('b', 2)
 
+You should avoid doing this, though, because an element may be taken
+from the longer iterators and discarded.  This means you can't go on
+to use the iterators further because you risk skipping a discarded
+element.
 
 ``itertools.islice(iter, [start], stop, [step])`` returns a stream
 that's a slice of the iterator.  It can return the first ``stop``
 elements.  If you supply a starting index, you'll get ``stop-start``
 elements, and if you supply a value for ``step` elements will be
-skipped.  Unlike Python's string and list slicing, you can't use
-negative values for ``start``, ``stop``, or ``step``.
+skipped accordingly.  Unlike Python's string and list slicing, you
+can't use negative values for ``start``, ``stop``, or ``step``.
 
 ::
 
@@ -943,7 +977,7 @@
            0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
 
 
-Two functions are for calling other functions on the contents of an
+Two functions are used for calling other functions on the contents of an
 iterable.
 
 ``itertools.imap(f, iterA, iterB, ...)`` returns 
@@ -953,13 +987,13 @@
     itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
       6, 8, 8
 
-(The ``operator`` module contains a set of functions 
+The ``operator`` module contains a set of functions 
 corresponding to Python's operators.  Some examples are 
 ``operator.add(a, b)`` (adds two values), 
 ``operator.ne(a, b)`` (same as ``a!=b``),
 and 
 ``operator.attrgetter('id')`` (returns a callable that
-fetches the ``"id"`` attribute), 
+fetches the ``"id"`` attribute).
 
 ``itertools.starmap(func, iter)`` assumes that the iterable will 
 return a stream of tuples, and calls ``f()`` using these tuples as the 
@@ -1017,10 +1051,11 @@
       1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
 
 
-The last function, ``itertools.groupby(iter, key_func=None)``, is the
-most complicated.  ``key_func(elem)`` is a function that can compute a
-key value for each element returned by the iterable.  If you don't
-supply a key function, the key is simply each element itself.
+The last function I'll discuss, ``itertools.groupby(iter,
+key_func=None)``, is the most complicated.  ``key_func(elem)`` is a
+function that can compute a key value for each element returned by the
+iterable.  If you don't supply a key function, the key is simply each
+element itself.
 
 ``groupby()`` collects all the consecutive elements from the
 underlying iterable that have the same key value, and returns a stream
@@ -1068,7 +1103,7 @@
 For programs written in a functional style, you'll sometimes want to
 construct variants of existing functions that have some of the
 parameters filled in.  Consider a Python function ``f(a, b, c)``; you
-could create a new function ``g(b, c)`` that was equivalent to 
+may wish to create a new function ``g(b, c)`` that was equivalent to
 ``f(1, b, c)``.  This is called "partial function application".
 
 The constructor for ``partial`` takes the arguments ``(function, arg1,
@@ -1089,15 +1124,6 @@
     server_log('Unable to open socket')
 
 
-Topics to place
------------------------------
-
-XXX os.walk()
-
-XXX Need a large example.
-
-=======
-
 Acknowledgements
 ------------------------------
 
@@ -1108,6 +1134,19 @@
 
 .. comment
 
+    Topics to place
+    -----------------------------
+
+    XXX os.walk()
+
+    XXX Need a large example.
+
+    But will an example add much?  I'll post a first draft and see
+    what the comments say.
+
+.. comment
+
+    Original outline:
     Introduction
             Idea of FP
                     Programs built out of functions
@@ -1141,6 +1180,9 @@
 
 .. comment
 
+    Handy little function for printing part of an iterator -- used
+    while writing this document.
+
     import itertools
     def print_iter(it):
          slice = itertools.islice(it, 10)
@@ -1153,9 +1195,30 @@
 References
 --------------------
 
-SICP
+General
+'''''''''''''''
+
+**Structure and Interpretation of Computer Programs**, by 
+Harold Abelson and Gerald Jay Sussman with Julie Sussman.
+
+Full text at http://mitpress.mit.edu/sicp/.
+
+A classic textbook of computer science.  Chapters 2 and 3 discuss the
+use of sequences and streams to organize the data flow inside a
+program.  The book uses Scheme for its examples, but many of the
+design approaches described in these chapters are applicable to
+functional-style Python code.
+
+http://en.wikipedia.org/wiki/Functional_programming:
+General Wikipedia entry describing functional programming.
+
+
+Python documentation
+'''''''''''''''''''''''''''
 
-Relevant manual sections
+http://docs.python.org/lib/module-itertools.html:
+Documentation ``for the itertools`` module.
 
-XXX
+http://docs.python.org/lib/module-operator.html:
+Documentation ``for the operator`` module.
 


More information about the Python-checkins mailing list