r47183 - sandbox/trunk/Doc/functional.rst
Author: andrew.kuchling Date: Fri Jun 30 23:03:11 2006 New Revision: 47183 Modified: sandbox/trunk/Doc/functional.rst Log: Revision pass: lots of small edits; request feedback; remove LaTeX markup :) ; add small references section Modified: sandbox/trunk/Doc/functional.rst ============================================================================== --- sandbox/trunk/Doc/functional.rst (original) +++ sandbox/trunk/Doc/functional.rst Fri Jun 30 23:03:11 2006 @@ -1,13 +1,14 @@ Functional Programming HOWTO ================================ -**Incomplete Draft** +(This is a first draft. Please send comments/error +reports/suggestions to amk@amk.ca.) -In this document, we'll take a tour of Python's features that are -well-suited to implementing programs in a functional style. After an -introduction to the concepts of functional programming, we'll look at -language features such as iterators and generators and relevant -library modules such as the ``itertools`` and ``functools`` modules. +In this document, we'll take a tour of Python's features suitable for +implementing programs in a functional style. After an introduction to +the concepts of functional programming, we'll look at language +features such as iterators and generators and relevant library modules +such as ``itertools`` and ``functools``. Introduction @@ -20,10 +21,11 @@ Programming languages support decomposing problems in several different ways: -* Most programming languages are **procedural**, meaning - that programs are lists of instructions that tell the computer what to - do with the program's input. - C, Pascal, and even Unix shells are procedural languages. +* Most programming languages are **procedural**: + programs are lists of instructions that tell the computer what to + do with the program's input. + C, Pascal, and even Unix shells are procedural languages. + * In **declarative** languages, you write a specification that describes the problem to be solved, and the language implementation figures out how to perform the computation efficiently. SQL is the declarative @@ -31,43 +33,54 @@ the data set you want to retrieve, and the SQL engine decides whether to scan tables or use indexes, which subclauses should be performed first, etc. -* **Object-oriented** programming is a style in which programs consist - of collections of objects. Objects have internal state and support various - methods that query or modify this internal state in some way. Smalltalk - and Java are object-oriented languages. C++ and Python are languages - that support object-oriented programming, but don't require it. + +* **Object-oriented** programs manipulate collections of objects. + Objects have internal state and support methods that query or modify + this internal state in some way. Smalltalk and Java are + object-oriented languages. C++ and Python are languages that + support object-oriented programming, but don't force the use + of object-oriented features. + * **Functional** programming decomposes a problem into a set of functions. Ideally, functions only take inputs and produce outputs, and don't have any internal state that affects the output produced for a given input. Well-known functional languages include the ML family (Standard ML, - OCaml, and other variants) and Haskell. + OCaml, and other variants) and Haskell. -The designers of some computer languages have chosen one style of -programming that they support best. This often makes it difficult to -write programs in the language that use a different style. Other -language designers try to produce multi-paradigm languages that -support several different styles. Lisp, C++, and Python are -multi-paradigm; you can write programs or libraries that are largely -procedural, object-oriented, or functional. In a large program, -different sections might be written in different styles; the GUI might -be object-oriented while the processing logic is procedural or -functional, for example. +The designers of some computer languages have chosen one approach to +programming that's emphasized. This often makes it difficult to +write programs that use a different approach. Other languages are +multi-paradigm languages that support several different approaches. Lisp, +C++, and Python are multi-paradigm; you can write programs or +libraries that are largely procedural, object-oriented, or functional +in all of these languages. In a large program, different sections +might be written using different approaches; the GUI might be object-oriented +while the processing logic is procedural or functional, for example. In a functional program, input flows through a set of functions. Each function operates on its input and produces some output. Functional -style avoids making functions have side effects that modify internal +style frowns upon functions with side effects that modify internal state or make other changes that aren't visible in the function's -return value. Avoiding side effects means not using data structures -that get updated as a program runs. Some languages are very strict -about this and don't even have assignment statements such as ``a=3`` -or ``c = a + b``, but it's difficult to avoid all side effects; some -Python code, such as a ``print`` statement or a ``time.sleep(1)``, -returns nothing and is only called for its side effects. +return value. Functions that have no side effects at all are +called **purely functional**. +Avoiding side effects means not using data structures +that get updated as a program runs; every function's output +must only depend on its input. + +Some languages are very strict about purity and don't even have +assignment statements such as ``a=3`` or ``c = a + b``, but it's +difficult to avoid all side effects. Printing to the screen or +writing to a disk file are side effects, for example. For example, in +Python a ``print`` statement or a ``time.sleep(1)`` both return no +useful value; they're only called for their side effects of sending +some text to the screen or pausing execution for a second. Python programs written in functional style usually won't go to the -extreme of avoiding all I/O or all assignments. The implementation of -a function will still use assignments to local variables, but won't -touch any global variables or have other side effects. +extreme of avoiding all I/O or all assignments; instead, they'll +provide a functional-appearing interface but will use non-functional +features internally. For example, the implementation of a function +will still use assignments to local variables, but won't modify global +variables or have other side effects. Functional programming can be considered the opposite of object-oriented programming. Objects are little capsules containing @@ -122,29 +135,30 @@ you use daily (the Python interpreter, your XML parser, your web browser) could be proven correct. Even if you wrote down or generated a proof, there would then be the question of verifying the proof; -maybe you only ***think*** you've proved that the program correct. +maybe there's an error in it, and you only ***think*** you've proved +that the program correct. Modularity '''''''''''''''''''''' A more practical benefit of functional programming is that it forces -you to break apart your problem into small pieces. It's easier to -specify and write a small function that does one thing than a large -function that performs a complicated transformation. Small functions -are easier to read and to check for errors. A program that's -structured into many small pieces is a **modular** program. +you to break apart your problem into small pieces. Programs are more +modular as a result. It's easier to specify and write a small +function that does one thing than a large function that performs a +complicated transformation. Small functions are also easier to read +and to check for errors. Ease of debugging and testing '''''''''''''''''''''''''''''''''' -It's also easier to test and to debug a functional-style program. +Testing and debugging a functional-style program is easier. -It's easier to debug because functions are generally small and clearly -specified. When a program doesn't work, each function is an interface -point where you can check that the data is correct. You can look at -the intermediate inputs and outputs to quickly isolate the function -that's responsible for a bug. +Debugging is simplified because functions are generally small and +clearly specified. When a program doesn't work, each function is an +interface point where you can check that the data are correct. You +can look at the intermediate inputs and outputs to quickly isolate the +function that's responsible for a bug. Testing is easier because each function is a potential subject for a unit test. Functions don't depend on system state that needs to be @@ -162,7 +176,7 @@ others will be useful in a wide variety of programs. For example, a function that takes a directory path and returns all the XML files in the directory, or a function that takes a filename and returns its -contents would be useful in many different situations. +contents, can be applied to many different situations. Over time you'll form a personal library of utilities. Often you'll assemble new programs by arranging existing functions in a new @@ -171,18 +185,17 @@ - Iterators ----------------------- -Iterators are an important foundation for writing functional-style -programs. +I'll start by looking at a Python language feature that's an important +foundation for writing functional-style programs: iterators. -An iterator is an object representing a stream of data that returns -the data one element at a time. A Python iterator must support a -method called ``next()`` that takes no arguments and always returns -the next element of the stream. If there are no more elements in the -stream, ``next()`` must raise the ``StopIteration`` exception. +An iterator is an object representing a stream of data; this object +returns the data one element at a time. A Python iterator must +support a method called ``next()`` that takes no arguments and always +returns the next element of the stream. If there are no more elements +in the stream, ``next()`` must raise the ``StopIteration`` exception. Iterators don't have to be finite, though; it's perfectly reasonable to write an iterator that produces an infinite stream of data. @@ -190,7 +203,8 @@ to return an iterator that will return the object's contents or elements, raising ``TypeError`` if the object doesn't support iteration. Several of Python's built-in data types support iteration, -the most common being lists and dictionaries. +the most common being lists and dictionaries. An object is called +an **iterable** object if you can get an iterator for it. You can experiment with the iteration interface manually:: @@ -221,8 +235,8 @@ for i in obj: print i -Iterators can be materialized as lists or tuples by using the ``list()`` or ``tuple()`` -constructor functions:: +Iterators can be materialized as lists or tuples by using the +``list()`` or ``tuple()`` constructor functions:: >>> L = [1,2,3] >>> iterator = iter(L) @@ -285,10 +299,14 @@ Dec 12 Oct 10 -This is just the default behaviour. If you want to iterate over keys, -values, or key/value pairs, you can explicitly call the -``iterkeys()``, ``itervalues()``, or ``iteritems()`` -methods to get an appropriate iterator. +Note that the order is essentially random, because it's based on the +hash ordering of the objects in the dictionary. + +Applying ``iter()`` to a dictionary always loops over the keys, but +dictionaries have methods that return other iterators. If you want to +iterate over keys, values, or key/value pairs, you can explicitly call +the ``iterkeys()``, ``itervalues()``, or ``iteritems()`` methods to +get an appropriate iterator. The ``dict()`` constructor can accept an iterator that returns a finite stream of ``(key, value)`` tuples:: @@ -317,10 +335,15 @@ List comprehensions ----------------------- +.. comment + + Maybe gencomps should be described and emphasized, and listcomps + be mentioned as an afterthought. + Two common operations on a stream are performing some operation for every element, or selecting a subset of elements that meet some condition. For example, given a list of strings, you might want to -pull out all the strings containing a given substring, or strip off +pull out all the strings containing a given substring or strip off trailing whitespace from each line. List comprehensions (or "listcomps") are a concise notation for such @@ -340,9 +363,8 @@ containing the resulting lines, not an iterator. This means that list comprehensions aren't very useful if you're working with iterators that return an infinite or a very large stream of data. Later we'll -discuss generator expressions, a feature that provides similar -capabilities as list comprehensions but can be used with infinite -iterators. +discuss generator expressions, which have the capabilities of list +comprehensions but can be used with infinite iterators. List comprehensions have the form:: @@ -403,8 +425,8 @@ ----------------------- Generator expressions are written like list comprehensions, but are -surrounded by parentheses (\samp{()}) and not square brackets -(\samp{[]}). The result of a generator expression +surrounded by parentheses ("()") and not square brackets +("[]"). The result of a generator expression is an iterator that returns the computed elements without materializing a list containing them all. @@ -520,7 +542,7 @@ -New generator features in Python 2.5 +Passing values into a generator '''''''''''''''''''''''''''''''''''''''''''''' In Python 2.4 and earlier, generators only produced output. Once a @@ -625,8 +647,7 @@ Built-in functions ---------------------------------------------- -Let's look in more detail at those built-in functions that are -relevant to iterators. +Let's look in more detail at built-in functions often used with iterators. Two Python's built-in functions, ``map()`` and ``filter()``, are somewhat obsolete; they duplicate the features of list comprehensions @@ -701,7 +722,7 @@ If you use ``operator.add`` with ``reduce()``, you'll add up all the elements of the iterable. This case is so common that there's a special -built-in for it:: +built-in called ``sum()`` to compute it:: reduce(operator.add, [1,2,3,4], 0) => 10 @@ -710,8 +731,20 @@ sum([]) => 0 -``enumerate(iter)`` counts off the elements in the iterable, return -pairs of the count and each element. +For many uses of ``reduce()``, though, it can be clearer to just write +the obvious ``for`` loop:: + + # Instead of: + product = reduce(operator.mul, [1,2,3], 1) + + # You can write: + product = 1 + for i in [1,2,3]: + product *= i + + +``enumerate(iter)`` counts off the elements in the iterable, returning +2-tuples containing the count and each element. :: @@ -724,7 +757,7 @@ f = open('data.txt', 'r') for i, line in enumerate(f): if line.strip() == '': - print 'Blank line at', line + print 'Blank line at line #%i' % i ``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the elements of the iterable into a list, sorts @@ -745,12 +778,12 @@ [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769] (For a more detailed discussion of sorting, see the Sorting mini-HOWTO -in the Python wiki at \url{http://wiki.python.org/moin/SortingHowto}.) +in the Python wiki at http://wiki.python.org/moin/SortingHowto.) The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an iterable's contents. ``any()`` returns True if any element in the iterable is a true value, and ``all()`` -retturn True if all of the elements are true values:: +returns True if all of the elements are true values:: any([0,1,0]) => True @@ -770,16 +803,13 @@ ---------------------------------------------- When writing functional-style programs, you'll often need little -functions that act as predicates or that combine list elements in some -particular way. +functions that act as predicates or that combine elements in some way. -If there's a Python built-in that's suitable, or a module has a -function that does what you need, you don't need to define a new -function at all:: +If there's a Python built-in or a module function that's suitable, you +don't need to define a new function at all:: stripped_lines = [line.strip for line in lines] existing_files = filter(os.path.exists, file_list) - XXX pi-notation example If the function you need doesn't exist, you need to write it. One way to write small functions is to use the ``lambda`` statement. ``lambda`` @@ -804,7 +834,8 @@ def adder(x,y): return x + y -Which alternative is preferable? That's a style question. +Which alternative is preferable? That's a style question; my general +view is to avoid it. ``lambda`` is quite limited in the functions it can define. The result has to be computable as a single expression, which means you @@ -815,7 +846,22 @@ :: - freq = reduce(lambda a, b: (0, a[1] + b[1]), items)[1] + freq = reduce(lambda a, b: (0, a[1] + b[1]), items)[1] + +You can figure it out, but it takes time to disentangle the function +to figure out what's going on. Using a short nested +``def`` statements makes things a little bit better:: + + def combine (a, b): + return 0, a[1] + b[1] + + return reduce(combine_freq, items)[1] + +It would be best of all if I had simply used a ``for`` loop:: + + total = 0 + for a, b in items: + total += b Fredrik Lundh once suggested the following set of rules for refactoring uses of ``lambda``: @@ -827,28 +873,16 @@ 4) Convert the lambda to a def statement, using that name. 5) Remove the comment. -Personally I try to avoid lambdas, favoring short nested -``def`` statements like this:: - - def output_total_freq (items): - """Takes a list of (element, frequency count) tuples - and returns the total number of occurrences. - """ - - def combine (a, b): - return 0, a[1] + b[1] - - return reduce(combine_freq, items)[1] - -You might disagree that this style is better. +I really like these rules, but you're free todisagree that this style +is better. The itertools module ----------------------- The ``itertools`` module contains a number of commonly-used iterators -as well as functions for combining iterators. This section will -introduce the module's contents using small examples. +as well as functions for combining several iterators. This section +will introduce the module's contents by showing small examples. ``itertools.count(n)`` returns an infinite stream of integers, increasing by 1 each time. You can optionally supply the @@ -861,7 +895,7 @@ ``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable and returns a new iterator that returns its elements from -first to last, and then repeats these elements infinitely. +first to last. The new iterator will repeat these elements infinitely. :: @@ -878,10 +912,10 @@ itertools.repeat('abc', 5) => abc, abc, abc, abc, abc -``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of -iterables as input, and returns all the elements of the first iterator, -then all the elements of the second, until all of the iterables -have been exhausted. +``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of +iterables as input, and returns all the elements of the first +iterator, then all the elements of the second, and so on, until all of +the iterables have been exhausted. :: @@ -895,25 +929,25 @@ ('a', 1), ('b', 2), ('c', 3) This iterator is intended to be used with iterables that are all of -the same length. If the iterables are of different lengths, -the resulting stream will be the same length as the shortest iterable. -Unfortunately, if you passed in any iterators as arguments, -an element may have been taken from the longer iterators -and discarded, meaning that you can't keep using them; if you do, -you risk skipping a discarded element. +the same length. If the iterables are of different lengths, the +resulting stream will be the same length as the shortest iterable. :: itertools.izip(['a', 'b'], (1, 2, 3)) => ('a', 1), ('b', 2) +You should avoid doing this, though, because an element may be taken +from the longer iterators and discarded. This means you can't go on +to use the iterators further because you risk skipping a discarded +element. ``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a slice of the iterator. It can return the first ``stop`` elements. If you supply a starting index, you'll get ``stop-start`` elements, and if you supply a value for ``step` elements will be -skipped. Unlike Python's string and list slicing, you can't use -negative values for ``start``, ``stop``, or ``step``. +skipped accordingly. Unlike Python's string and list slicing, you +can't use negative values for ``start``, ``stop``, or ``step``. :: @@ -943,7 +977,7 @@ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... -Two functions are for calling other functions on the contents of an +Two functions are used for calling other functions on the contents of an iterable. ``itertools.imap(f, iterA, iterB, ...)`` returns @@ -953,13 +987,13 @@ itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) => 6, 8, 8 -(The ``operator`` module contains a set of functions +The ``operator`` module contains a set of functions corresponding to Python's operators. Some examples are ``operator.add(a, b)`` (adds two values), ``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')`` (returns a callable that -fetches the ``"id"`` attribute), +fetches the ``"id"`` attribute). ``itertools.starmap(func, iter)`` assumes that the iterable will return a stream of tuples, and calls ``f()`` using these tuples as the @@ -1017,10 +1051,11 @@ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... -The last function, ``itertools.groupby(iter, key_func=None)``, is the -most complicated. ``key_func(elem)`` is a function that can compute a -key value for each element returned by the iterable. If you don't -supply a key function, the key is simply each element itself. +The last function I'll discuss, ``itertools.groupby(iter, +key_func=None)``, is the most complicated. ``key_func(elem)`` is a +function that can compute a key value for each element returned by the +iterable. If you don't supply a key function, the key is simply each +element itself. ``groupby()`` collects all the consecutive elements from the underlying iterable that have the same key value, and returns a stream @@ -1068,7 +1103,7 @@ For programs written in a functional style, you'll sometimes want to construct variants of existing functions that have some of the parameters filled in. Consider a Python function ``f(a, b, c)``; you -could create a new function ``g(b, c)`` that was equivalent to +may wish to create a new function ``g(b, c)`` that was equivalent to ``f(1, b, c)``. This is called "partial function application". The constructor for ``partial`` takes the arguments ``(function, arg1, @@ -1089,15 +1124,6 @@ server_log('Unable to open socket') -Topics to place ------------------------------ - -XXX os.walk() - -XXX Need a large example. - -======= - Acknowledgements ------------------------------ @@ -1108,6 +1134,19 @@ .. comment + Topics to place + ----------------------------- + + XXX os.walk() + + XXX Need a large example. + + But will an example add much? I'll post a first draft and see + what the comments say. + +.. comment + + Original outline: Introduction Idea of FP Programs built out of functions @@ -1141,6 +1180,9 @@ .. comment + Handy little function for printing part of an iterator -- used + while writing this document. + import itertools def print_iter(it): slice = itertools.islice(it, 10) @@ -1153,9 +1195,30 @@ References -------------------- -SICP +General +''''''''''''''' + +**Structure and Interpretation of Computer Programs**, by +Harold Abelson and Gerald Jay Sussman with Julie Sussman. + +Full text at http://mitpress.mit.edu/sicp/. + +A classic textbook of computer science. Chapters 2 and 3 discuss the +use of sequences and streams to organize the data flow inside a +program. The book uses Scheme for its examples, but many of the +design approaches described in these chapters are applicable to +functional-style Python code. + +http://en.wikipedia.org/wiki/Functional_programming: +General Wikipedia entry describing functional programming. + + +Python documentation +''''''''''''''''''''''''''' -Relevant manual sections +http://docs.python.org/lib/module-itertools.html: +Documentation ``for the itertools`` module. -XXX +http://docs.python.org/lib/module-operator.html: +Documentation ``for the operator`` module.
participants (1)
-
andrew.kuchling