Python Language FAQ - Section 6 - Python-announce-list

July 7, 1999

      This FAQ newsgroup posting has been automatically converted from an
HTML snapshot of the original Python FAQ; please refer to the original
"Python FAQ Wizard" at <http://grail.cnri.reston.va.us/cgi-bin/faqw.py>
if source code snippets given in this document do not work; incidentally,
some formatting information may have been lost during the conversion.

----------------------------------------------------------------------------

The whole Python FAQ - Section 6

Last changed on Mon Jun 28 19:36:09 1999 EDT

(Entries marked with ** were changed within the last 24 hours; entries
marked with * were changed within the last 7 days.)

----------------------------------------------------------------------------

6. Python's design

6.1.  Why isn't there a switch or case statement in Python?
6.2.  Why does Python use indentation for grouping of statements?
6.3.  Why are Python strings immutable?
6.4.  Why don't strings have methods like index() or sort(), like lists?
6.5.  Why does Python use methods for some functionality (e.g.
      list.index()) but functions for other (e.g. len(list))?
6.6.  Why can't I derive a class from built-in types (e.g. lists or
      files)?
6.7.  Why must 'self' be declared and used explicitly in method
      definitions and calls?
6.8.  Can't you emulate threads in the interpreter instead of relying on
      an OS-specific thread implementation?
6.9.  Why can't lambda forms contain statements?
6.10. Why don't lambdas have access to variables defined in the
      containing scope?
6.11. Why can't recursive functions be defined inside other functions?
6.12. Why is there no more efficient way of iterating over a dictionary
      than first constructing the list of keys()?
6.13. Can Python be compiled to machine code, C or some other language?
6.14. How does Python manage memory? Why not full garbage collection?
6.15. Why are there separate tuple and list data types?
6.16. How are lists implemented?
6.17. How are dictionaries implemented?
6.18. Why must dictionary keys be immutable?
6.19. How the heck do you make an array in Python?
6.20. Why doesn't list.sort() return the sorted list?
6.21. How do you specify and enforce an interface spec in Python?
6.22. Why do all classes have the same type? Why do instances all have
      the same type?
6.23. Why isn't all memory freed when Python exits?
6.24. Why no class methods or mutable class variables?
6.25. Why are default values sometimes shared between objects?
6.26. Why no goto?
6.27. How do you make a higher order function in Python?
6.28. Why do I get a SyntaxError for a 'continue' inside a 'try'?
6.29. Why can't raw strings (r-strings) end with a backslash?
6.30. Why can't I use an assignment in an expression?

----------------------------------------------------------------------------

6. Python's design

----------------------------------------------------------------------------

6.1. Why isn't there a switch or case statement in Python?

You can do this easily enough with a sequence of if... elif... elif... else.
There have been some proposals for switch statement syntax, but there is no
consensus (yet) on whether and how to do range tests.
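A minimal sketch of the if... elif... else approach, in present-day Python 3 syntax; the grade() function and its thresholds are invented for illustration. Range tests, the sticking point for a switch syntax, are just ordinary comparisons here:

```python
def grade(score):
    # Each test is tried in order, like the arms of a switch statement;
    # range tests fall out naturally from the comparison operators.
    if score < 0 or score > 100:
        return "invalid"
    elif score < 50:
        return "fail"
    elif score < 75:
        return "pass"
    else:
        return "distinction"
```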

----------------------------------------------------------------------------

6.2. Why does Python use indentation for grouping of statements?

Basically I believe that using indentation for grouping is extremely elegant
and contributes a lot to the clarity of the average Python program. Most
people learn to love this feature after a while. Some arguments for it:

Since there are no begin/end brackets there cannot be a disagreement between
grouping perceived by the parser and the human reader. I remember long ago
seeing a C fragment like this:

            if (x <= y)
                    x++;
                    y--;
            z++;

and staring a long time at it wondering why y was being decremented even for
x > y... (And I wasn't a C newbie then either.)
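For comparison, here is the same logic in Python (Python 3 syntax; step() is a made-up name), written the way the C indentation suggests. Because indentation is the grouping, the parser and the reader cannot disagree: y is decremented only when x <= y, while z is always incremented.

```python
def step(x, y, z):
    if x <= y:
        x = x + 1
        y = y - 1   # indented, so it belongs to the if block
    z = z + 1       # dedented, so it always runs
    return x, y, z
```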

Since there are no begin/end brackets, Python is much less prone to
coding-style conflicts. In C there are loads of different ways to place the
braces (including the choice whether to place braces around single
statements in certain cases, for consistency). If you're used to reading
(and writing) code that uses one style, you will feel at least slightly
uneasy when reading (or being required to write) another style. Many coding
styles place begin/end brackets on a line by themselves. This makes programs
considerably longer and wastes valuable screen space, making it harder to
get a good overview of a program. Ideally, a function should fit on one
basic tty screen (say, 20 lines). 20 lines of Python are worth a LOT more
than 20 lines of C. This is not solely due to the lack of begin/end brackets
(the lack of declarations also helps, and the powerful operations of
course), but it certainly helps!

----------------------------------------------------------------------------

6.3. Why are Python strings immutable?

There are two advantages. One is performance: knowing that a string is
immutable makes it easy to lay it out at construction time -- fixed and
unchanging storage requirements. (This is also one of the reasons for the
distinction between tuples and lists.) The other is that strings in Python
are considered as "elemental" as numbers. No amount of activity will change
the value 8 to anything else, and in Python, no amount of activity will
change the string "eight" to anything else. (Adapted from Jim Roskind)
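A small illustration in present-day Python 3 syntax: item assignment on a string fails, and "changing" a string really builds a new one, leaving other references to the old value untouched.

```python
s = "eight"
alias = s               # a second reference to the same string object

try:
    s[0] = "E"          # strings do not support item assignment
    changed_in_place = True
except TypeError:
    changed_in_place = False

s = "E" + s[1:]         # builds a brand-new string and rebinds s
```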

----------------------------------------------------------------------------

6.4. Why don't strings have methods like index() or sort(), like lists?

Good question. Strings currently don't have methods at all (likewise tuples
and numbers). Long ago, it seemed unnecessary to implement any of these
functions in C, so a standard library module "string" written in Python was
created that performs string-related operations. Since then, the cry for
performance has moved most of them into the built-in module strop (this is
imported by module string, which is still the preferred interface, without
loss of performance except during initialization). Some of these functions
(e.g. index()) could easily be implemented as string methods instead, but
others (e.g. sort()) can't, since their interface prescribes that they
modify the object, while strings are immutable (see the previous question).
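As a sketch of the second point, sorting the characters of a string has to go through a mutable copy. This uses present-day Python 3 syntax; the "".join() string method used at the end postdates this FAQ (in the 1.5-era library, string.join() from module string played that role), and sorted_chars() is a hypothetical name:

```python
def sorted_chars(s):
    chars = list(s)        # copy the characters into a mutable list
    chars.sort()           # the in-place sort happens on the list, not the string
    return "".join(chars)  # assemble a new string from the sorted list
```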

----------------------------------------------------------------------------

6.5. Why does Python use methods for some functionality (e.g. list.index())
but functions for other (e.g. len(list))?

Functions are used for those operations that are generic for a group of
types and which should work even for objects that don't have methods at all
(e.g. numbers, strings, tuples). Also, implementing len(), max(), min() as a
built-in function is actually less code than implementing them as methods
for each type. One can quibble about individual cases but it's really too
late to change such things fundamentally now.
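To illustrate the generic approach, the built-in len() works across strings, tuples, and lists, and a class can take part by defining the __len__ hook. A sketch in Python 3 syntax; the Stack class is hypothetical:

```python
class Stack:
    """A hypothetical container class that opts in to the generic len()."""

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def __len__(self):
        # len(stack) calls this hook, so the same built-in works here too
        return len(self.items)


stack = Stack()
stack.push("a")
stack.push("b")

sizes = [len("abc"), len((1, 2)), len([1, 2, 3, 4]), len(stack)]
```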

----------------------------------------------------------------------------

6.6. Why can't I derive a class from built-in types (e.g. lists or files)?

This is caused by the relatively late addition of (user-defined) classes to
the language -- the implementation framework doesn't easily allow it. See
the answer to question 4.2 for a work-around. This may be fixed in the
(distant) future.
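One common style of work-around is delegation: wrap a real list in an ordinary class and forward operations to it. The sketch below (Python 3 syntax, hypothetical LoggedList name) is offered as an illustration and is not necessarily the exact approach of question 4.2:

```python
class LoggedList:
    """Wraps a real list instead of inheriting from it (hypothetical example)."""

    def __init__(self, initial=()):
        self.data = list(initial)   # the wrapped built-in list
        self.log = []

    def append(self, item):
        # Extend one operation with extra behavior...
        self.log.append(("append", item))
        self.data.append(item)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

    def __getattr__(self, name):
        # ...and forward every other attribute lookup to the wrapped list.
        return getattr(self.data, name)


numbers = LoggedList([1])
numbers.append(2)
numbers.append(1)
```

__getattr__ is only consulted for attributes the wrapper does not define itself, so append() gets the logging behavior while count(), index(), and the rest go straight to the wrapped list.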

----------------------------------------------------------------------------

6.7. Why must 'self' be declared and used explicitly in method definitions
and calls?

By asking this question you reveal your C++ background. :-) When I added
classes, this was (again) the simplest way of implementing methods without
too many changes to the interpreter. I borrowed the idea from Modula-3. It
turns out to be very useful, for a variety of reasons.

First, it makes it more obvious that you are using a method or instance
attribute instead of a local variable. Reading "self.x" or "self.meth()"
makes it absolutely clear that an instance variable or method is used even
if you don't know the class definition by heart. In C++, you can sort of
tell by the lack of a local variable declaration (assuming globals are rare
or easily recognizable) -- but in Python, there are no local variable
declarations, so you'd have to look up the class definition to be sure.

Second, it means that no special syntax is necessary if you want to
explicitly reference or call the method from a particular class. In C++, if
you want to use a method from a base class that is overridden in a derived
class, you have to use the :: operator -- in Python you can write
baseclass.methodname(self, <argument list>). This is particularly useful for
__init__() methods, and in general in cases where a derived class method
wants to extend the base class method of the same name and thus has to call
the base class method somehow.

Lastly, for instance variables, it solves a syntactic problem with
assignment: since local variables in Python are (by definition!) those
variables to which a value is assigned in a function body (and that aren't
explicitly declared global), there has to be some way to tell the
interpreter that an assignment was meant to assign to an instance variable
instead of to a local variable, and it should preferably be syntactic (for
efficiency reasons). C++ does this through declarations, but Python doesn't
have declarations and it would be a pity having to introduce them just for
this purpose. Using the explicit "self.var" solves this nicely. Similarly,
for using instance variables, having to write "self.var" means that
references to unqualified names inside a method don't have to search the
instance's dictionaries.
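A sketch of the second and third points in Python 3 syntax (Shape and Square are made-up classes): the derived class calls the base class methods explicitly as Base.method(self, ...), and assignments through self set instance variables while bare names stay local.

```python
class Shape:
    def __init__(self, name):
        self.name = name                 # instance variable, set through self

    def describe(self):
        return "a " + self.name


class Square(Shape):
    def __init__(self, side):
        Shape.__init__(self, "square")   # explicit call to the base class method
        self.side = side

    def describe(self):
        label = Shape.describe(self)     # label is an ordinary local variable
        return label + " with side " + str(self.side)
```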

----------------------------------------------------------------------------

6.8. Can't you emulate threads in the interpreter instead of relying on an
OS-specific thread implementation?

Unfortunately, the interpreter pushes at least one C stack frame for each
Python stack frame. Also, extensions can call back into Python at almost
random moments. Therefore a complete threads implementation requires thread
support for C.

----------------------------------------------------------------------------

6.9. Why can't lambda forms contain statements?

Python lambda forms cannot contain statements because Python's syntactic
framework can't handle statements nested inside expressions.

However, in Python, this is not a serious problem. Unlike lambda forms in
other languages, where they add functionality, Python lambdas are only a
shorthand notation if you're too lazy to define a function.

Functions are already first class objects in Python, and can be declared in
a local scope. Therefore the only advantage of using a lambda form instead
of a locally-defined function is that you don't need to invent a name for
the function -- but that's just a local variable to which the function
object (which is exactly the same type of object that a lambda form yields)
is assigned!
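A quick check of that last claim, in Python 3 syntax: a lambda and an equivalent def produce the same kind of function object; only the name differs.

```python
square_lambda = lambda x: x * x


def square_def(x):
    return x * x


same_kind = type(square_lambda) is type(square_def)
```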

----------------------------------------------------------------------------

6.10. Why don't lambdas have access to variables defined in the containing
scope?

Because they are implemented as ordinary functions. See question 4.5 above.
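In the Python versions this FAQ describes, a common workaround was to pass the needed values into the lambda through default arguments, which are evaluated in the containing scope when the lambda is created. That idiom is not spelled out in this answer, so treat the sketch below (Python 3 syntax; make_adder is hypothetical) as an illustration. It still runs unchanged in present-day Python, which has since gained nested scopes.

```python
def make_adder(n):
    # The default value n=n is evaluated here, in make_adder's scope, and
    # stored with the lambda, so the lambda body never has to look outward.
    return lambda x, n=n: x + n


add3 = make_adder(3)
```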

----------------------------------------------------------------------------

6.11. Why can't recursive functions be defined inside other functions?

See question 4.5 above. But actually recursive functions can be defined in
other functions with some trickery.

        def test():
            class factorial:
                 def __call__(self, n):
                     if n<=1: return 1
                     return n * self(n-1)
            return factorial()

        fact = test()

The instance created by factorial() above acts like the recursive factorial
function.

Mutually recursive functions can be passed to each other as arguments.
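A sketch of that last remark (Python 3 syntax; the function names are invented): each function receives both functions as arguments, so neither needs to find the other by name in the enclosing function's scope. The entry points bind the pair through default arguments for the same reason. It assumes non-negative integer arguments.

```python
def make_parity_tests():
    # Both functions are handed in explicitly on every call, so no lookup
    # in make_parity_tests' scope is needed from inside them.
    def is_even(n, even, odd):
        if n == 0:
            return True
        return odd(n - 1, even, odd)

    def is_odd(n, even, odd):
        if n == 0:
            return False
        return even(n - 1, even, odd)

    # Default arguments bind the pair into one-argument entry points.
    even_entry = lambda n, e=is_even, o=is_odd: e(n, e, o)
    odd_entry = lambda n, e=is_even, o=is_odd: o(n, e, o)
    return even_entry, odd_entry


is_even, is_odd = make_parity_tests()
```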

----------------------------------------------------------------------------

6.12. Why is there no more efficient way of iterating over a dictionary than
first constructing the list of keys()?

Have you tried it? I bet it's fast enough for your purposes! In most cases
such a list takes only a few percent of the space occupied by the
dictionary. Apart from the fixed header, the list needs only 4 bytes (the
size of a pointer) per key. A dictionary uses 12 bytes per key plus between
30 and 70 percent hash table overhead, plus the space for the keys and
values. By necessity, all keys are distinct objects, and a string object
(the most common key type) costs at least 20 bytes plus the length of the
string. Add to that the values contained in the dictionary, and you see that
4 bytes more per item really isn't that much more memory...

A call to dict.keys() makes one fast scan over the dictionary (internally,
the iteration function does exist) copying the pointers to the key objects
into a pre-allocated list object of the right size. The iteration time isn't
lost, since you'll have to iterate anyway -- unless in the majority of cases
your loop terminates very prematurely (which I doubt, since you're getting
the keys in random order).

I don't expose the dictionary iteration operation to Python programmers
because the dictionary shouldn't be modified during the entire iteration --
if it is, there's a small chance that the dictionary is reorganized because
the hash table becomes too full, and then the iteration may miss some items
and see others twice. Exactly because this only occurs rarely, it would lead
to hidden bugs in programs: it's easy never to have it happen during test
runs if you only insert or delete a few items per iteration -- but your
users will surely hit upon it sooner or later.
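Iterating over a separate list of the keys is also what makes it safe to delete entries during the loop. A sketch in Python 3 syntax (drop_small_values is hypothetical); note that present-day keys() returns a view rather than a list, so the snapshot is taken with list(), and a dictionary that changes size while being iterated directly raises RuntimeError:

```python
def drop_small_values(d, threshold):
    # Iterate over a separate list of the keys, so deleting entries
    # from d cannot disturb the loop.
    for key in list(d.keys()):
        if d[key] < threshold:
            del d[key]
    return d
```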

----------------------------------------------------------------------------

6.13. Can Python be compiled to machine code, C or some other language?

Not easily. Python's high level data types, dynamic typing of objects and
run-time invocation of the interpreter (using eval() or exec) together mean
that a "compiled" Python program would probably consist mostly of calls into
the Python run-time system, even for seemingly simple operations like "x+1".
Thus, the performance gain would probably be minimal.

Internally, Python source code is always translated into a "virtual machine
code" or "byte code" representation before it is interpreted (by the "Python
virtual machine" or "bytecode interpreter"). To avoid the overhead of
parsing and translating rarely changing modules over and over again, this
byte code is written to a file whose name ends in ".pyc" whenever a module
is parsed (from a file whose name ends in ".py"). When the
corresponding .py file is changed, it is parsed and translated again and the
.pyc file is rewritten.

There is no performance difference once the .pyc file has been loaded (the
bytecode read from the .pyc file is exactly the same as the bytecode created
by direct translation). The only difference is that loading code from a .pyc
file is faster than parsing and translating a .py file, so the presence of
precompiled .pyc files will generally improve start-up time of Python
scripts. If desired, the Lib/compileall.py module/script can be used to
force creation of valid .pyc files for a given set of modules.
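A sketch of forcing compilation from a program, using the standard py_compile and compileall modules in Python 3 syntax; the temporary module hello_mod.py is invented for the demo. Present-day Python 3 writes the .pyc files into a __pycache__ subdirectory rather than next to the .py file, so the sketch reads the bytecode path from the return value of py_compile.compile() instead of guessing it.

```python
import compileall
import os
import py_compile
import tempfile

# Write a tiny throwaway module into a fresh temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello_mod.py")
with open(source, "w") as f:
    f.write("GREETING = 'hello'\n")

# Compile one file explicitly; the return value is the bytecode file's path.
bytecode_path = py_compile.compile(source, doraise=True)

# Or compile every module under a directory in one call.
all_ok = compileall.compile_dir(workdir, quiet=1)
```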

Note that the main script executed by Python, even if its filename ends in
.py, is not compiled to a .pyc file. It is compiled to bytecode, but the
bytecode is not saved to a file.

If you are looking for a way to translate Python programs in order to
distribute them in binary form, without the need to distribute the
interpreter and library as well, have a look at the freeze.py script in the
Tools/freeze directory. This creates a single binary file incorporating your
program, the Python interpreter, and those parts of the Python library that
are needed by your program. Of course, the resulting binary will only run on
the same type of platform as that used to create it.

----------------------------------------------------------------------------

6.14. How does Python manage memory? Why not full garbage collection?

The details of Python memory management depend on the implementation. The
standard Python implementation (the C implementation) uses reference
counting memory management. This means that when an object is no longer in
use Python frees the object automatically, with a few exceptions.

On the other hand, JPython relies on the Java runtime; so it uses the JVM's
garbage collector. This difference can cause some subtle porting problems if
your Python code depends on the behavior of the reference counting
implementation.

Two exceptions to bear in mind for standard Python are:

1) If an object lies on a circular reference path, it won't be freed unless
the circularities are broken. E.g.:

           List = [None]
           List[0] = List

List will not be freed unless the circularity (List[0] is List) is broken.
The reason List will not be freed is because although it may become
inaccessible the list contains a reference to itself, and reference counting
only deallocates an object when all references to an object are destroyed.
To break the circular reference path we must destroy the reference, as in

           List[0] = None

So, if your program creates circular references (and if it is long running
and/or consumes lots of memory) it may have to do some explicit management
of circular structures. In many application domains this is needed rarely,
if ever.
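The sketch below shows the effect, assuming CPython (Python 3 syntax; Node and freed_after_del are made-up names). Present-day CPython also has a supplementary cycle collector, which was added after this FAQ was written, so the sketch turns it off with gc.disable() to expose the pure reference-counting behavior; a weak reference observes whether the object was freed without keeping it alive.

```python
import gc
import weakref


class Node:
    pass


def freed_after_del(break_cycle):
    """Return True if a self-referencing Node is freed as soon as its name is deleted."""
    was_enabled = gc.isenabled()
    gc.disable()                    # leave reference counting alone in charge
    try:
        node = Node()
        node.me = node              # circular reference, like List[0] = List
        probe = weakref.ref(node)   # watches the object without keeping it alive
        if break_cycle:
            node.me = None          # break the circularity, like List[0] = None
        del node                    # drop the last ordinary name
        return probe() is None      # None means the object has been freed
    finally:
        if was_enabled:
            gc.enable()
```

With the cycle broken, the object disappears as soon as its last name is deleted; with the cycle intact, it survives until the cycle collector, once re-enabled, gets around to it.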

2) Sometimes objects get stuck in "tracebacks" temporarily and hence are not
deallocated when you might expect. Clear the tracebacks via

           import sys
           sys.exc_traceback = sys.last_traceback = None

Tracebacks are used for reporting errors and implementing debuggers and
related things. They contain a portion of the program state extracted during
the handling of an exception (usually the most recent exception).

In the absence of circularities and modulo tracebacks, Python programs need
not explicitly manage memory.

It is often suggested that Python could benefit from fully general garbage
collection. It's looking less and less likely that Python will ever get
"automatic" garbage collection (GC). For one thing, unless this were added
to C as a standard feature, it's a portability pain in the ass. And yes, I
know about the Xerox library. It has bits of assembler code for most common
platforms. Not for all. And although it is mostly transparent, it isn't
completely transparent (when I once linked Python with it, it dumped core).

"Proper" GC also becomes a problem when Python gets embedded into other
applications. While in a stand-alone Python it may be fine to replace the
standard malloc() and free() with versions provided by the GC library, an
application embedding Python may want to have its own substitute for
malloc() and free(), and may not want Python's. Right now, Python works with
anything that implements malloc() and free() properly.

In JPython, which has garbage collection, the following code (which is fine
in C Python) will probably run out of file descriptors long before it runs
out of memory:

            for file in filenames:   # filenames: a very long list of file names
                    f = open(file)
                    c = f.read(1)

Using the current reference counting and destructor scheme, each new
assignment to f closes the previous file. Using GC, this is not guaranteed.
Sure, you can think of ways to fix this. But it's not off-the-shelf
technology. If you want to write code that will work with any Python
implementation, you should explicitly close the file; this will work
regardless of GC:

            for file in filenames:
                    f = open(file)
                    c = f.read(1)
                    f.close()
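Present-day Python also offers the with statement, which closes the file when the block exits whether or not the implementation uses reference counting. A sketch in Python 3 syntax; first_bytes() and the temporary demo files are made up for illustration, and try/finally with an explicit close() achieves the same in older code:

```python
import os
import tempfile


def first_bytes(filenames):
    """Read one byte from each file, closing every file before opening the next."""
    result = []
    for filename in filenames:
        with open(filename, "rb") as f:   # closed when the block exits, GC or not
            result.append(f.read(1))
    return result


# Build a few throwaway files to read.
workdir = tempfile.mkdtemp()
names = []
for i in range(3):
    name = os.path.join(workdir, "file%d.txt" % i)
    with open(name, "wb") as f:
        f.write(b"abc"[i:i + 1] + b"rest")
    names.append(name)

firsts = first_bytes(names)
```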

All that said, somebody has managed to add GC to Python using the GC library
from Xerox, so you can see for yourself. See

            http://starship.python.net/crew/gandalf/gc-ss.html

See also question 4.17 for ways to plug some common memory leaks manually.

If you're not satisfied with the answers here, before you post to the
newsgroup, please read this summary of past discussions on GC for Python by
Moshe Zadka:

            http://www.geocities.com/TheTropics/Island/2932/gcpy.html

----------------------------------------------------------------------------

6.15. Why are there separate tuple and list data types?

This is done so that tuples can be immutable while lists are mutable.

Immutable tuples are useful in situations where you need to pass a few items
to a function and don't want the function to modify the tuple; for example,

            point1 = (120, 140)
            point2 = (200, 300)
            record(point1, point2)
            draw(point1, point2)

You don't want to have to think about what would happen if record() changed
the coordinates -- it can't, because the tuples are immutable.

On the other hand, when creating large lists dynamically, it is absolutely
crucial that they are mutable -- adding elements to a tuple one by one
requires using the concatenation operator, which makes it quadratic in time.

As a general guideline, use tuples like you would use structs in C or
records in Pascal, and use lists like (variable-length) arrays.
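A sketch in Python 3 syntax contrasting the two (function names are invented): the tuple rejects item assignment, and building a tuple by repeated concatenation copies all existing elements on every step, while list.append() grows the list in place.

```python
def build_with_list(n):
    items = []
    for i in range(n):
        items.append(i)          # grows the list in place
    return items


def build_with_tuple(n):
    items = ()
    for i in range(n):
        items = items + (i,)     # copies every existing element each time: quadratic
    return items


point1 = (120, 140)
try:
    point1[0] = 0                # tuples reject item assignment
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False
```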

----------------------------------------------------------------------------

6.16. How are lists implemented?

Despite what a Lisper might think, Python's lists are really variable-length
arrays. The implementation uses a contiguous array of references to other
objects, and keeps a pointer to this array (as well as its length) in a list
head structure.

This makes indexing a list (a[i]) an operation whose cost is independent of
the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized.
Some cleverness is applied to improve the performance of appending items
repeatedly; when the array must be grown, some extra space is allocated so
the next few times don't require an actual resize.
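To make the idea concrete, here is a toy model in Python 3 syntax; it is not CPython's code. A fixed-size slot list stands in for the C array of references, and the capacity doubles when it runs out. The doubling policy and the starting capacity of 4 are assumptions chosen for clarity; CPython uses its own, milder over-allocation pattern.

```python
class DynamicArray:
    """Toy variable-length array (hypothetical): a slot array plus a length."""

    def __init__(self):
        self._capacity = 4                      # assumed starting size
        self._length = 0
        self._slots = [None] * self._capacity   # stands in for the C array of references
        self.resizes = 0                        # counts real reallocations

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._slots[index]               # one step, whatever the size or index

    def append(self, item):
        if self._length == self._capacity:
            self._grow()
        self._slots[self._length] = item        # usually lands in spare capacity
        self._length += 1

    def _grow(self):
        # Double the capacity (an assumed policy), copying the existing references.
        self._capacity *= 2
        new_slots = [None] * self._capacity
        new_slots[:self._length] = self._slots[:self._length]
        self._slots = new_slots
        self.resizes += 1


arr = DynamicArray()
for i in range(100):
    arr.append(i)
```

Appending 100 items triggers only five real reallocations (at lengths 4, 8, 16, 32, and 64); every other append just fills a spare slot.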

----------------------------------------------------------------------------

6.17. How are dictionaries implemented?

Python's dictionaries are implemented as resizable hash tables.

Compared to B-trees, this gives better performance for lookup (the most
common operation by far) under most circumstances, and the implementation is
simpler.
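For readers who want a picture of a "resizable hash table", here is a toy in Python 3 syntax. For brevity it uses separate chaining (a list of entries per bucket) and an assumed load-factor limit; CPython's dictionaries use open addressing instead, so treat this only as an illustration of the general idea. ToyHashTable is a made-up name.

```python
class ToyHashTable:
    """Hypothetical resizable hash table using separate chaining."""

    def __init__(self, nbuckets=8):
        self._buckets = [[] for _ in range(nbuckets)]
        self._count = 0

    def _bucket_for(self, key):
        # The key's hash value picks the bucket, so a lookup only scans
        # one short bucket on average instead of every stored item.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)   # replace the value for an existing key
                return
        bucket.append((key, value))
        self._count += 1
        if self._count > 2 * len(self._buckets):   # assumed load-factor limit
            self._resize(2 * len(self._buckets))

    def __getitem__(self, key):
        for existing, value in self._bucket_for(key):
            if existing == key:
                return value
        raise KeyError(key)

    def __len__(self):
        return self._count

    def _resize(self, nbuckets):
        # Every entry must be re-bucketed, because the bucket index
        # depends on the number of buckets.
        entries = [entry for bucket in self._buckets for entry in bucket]
        self._buckets = [[] for _ in range(nbuckets)]
        for key, value in entries:
            self._bucket_for(key).append((key, value))


table = ToyHashTable()
for i in range(50):
    table["key%d" % i] = i
table["key7"] = 70
```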

----------------------------------------------------------------------------

6.18. Why must dictionary keys be immutable?

The hash table implementation of dictionaries uses a hash value calculated
from the key value to find the key. If the key were a mutable object, its
value could change, and thus its hash could change. But since whoever
changes the key object can't tell that it is incorporated in a dictionary, it
can't move the entry around in the dictionary. Then, when you try to look up
the same object in the dictionary, it won't be found, since its hash value
is different; and if you try to look up the old value, it won't be found
either, since the value of the object found in that hash bin differs.
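The failure described above can be reproduced in present-day Python with a deliberately unsafe wrapper that hashes a list by its current contents (Python 3 syntax; ListKey is a hypothetical name, and this is exactly the kind of key you should not use):

```python
class ListKey:
    """Hypothetical wrapper that hashes a list by its *current* contents."""

    def __init__(self, items):
        self.items = items

    def __eq__(self, other):
        return isinstance(other, ListKey) and self.items == other.items

    def __hash__(self):
        return hash(tuple(self.items))


shared = [1, 2]
d = {ListKey(shared): "12"}
shared.append(3)   # mutate the list after it was used as a key

found_by_new_value = ListKey([1, 2, 3]) in d   # hash now differs from the stored one
found_by_old_value = ListKey([1, 2]) in d      # hash matches, but the contents no longer do
safe = {tuple([1, 2]): "12"}                   # the tuple(l) advice from this answer
```

After the mutation the entry is still in the dictionary, but neither the new value nor the old value finds it.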

If you think you need to have a dictionary indexed with a list, try to use a
tuple instead. The function tuple(l) creates a tuple with the same entries
as the list l.

Some unacceptable solutions that have been proposed:

- Hash lists by their address (object ID). This doesn't work because if you
construct a new list with the same value it won't be found; e.g.,

      d = {[1,2]: '12'}
      print d[[1,2]]

will raise a KeyError exception because the id of the [1,2] used in the
second line differs from that in the first line. In other words, dictionary
keys should be compared using '==', not using 'is'.

- Make a copy when using a list as a key. This doesn't work because the list
(being a mutable object) could contain a reference to itself, and then the
copying code would run into an infinite loop.

- Allow lists as keys but tell the user not to modify them. This would allow
a class of hard-to-track bugs in programs that I'd rather not see; it
invalidates an important invariant of dictionaries (every value in d.keys()
is usable as a key of the dictionary).

- Mark lists as read-only once they are used as a dictionary key. The
problem is that it's not just the top-level object that could change its
value; you could use a tuple containing a list as a key. Entering anything
as a key into a dictionary would require marking all objects reachable from
there as read-only -- and again, self-referential objects could cause an
infinite loop again (and again and again).

There is a trick to get around this if you need to, but use it at your own
risk: You can wrap a mutable structure inside a class instance which has
both a __cmp__ and a __hash__ method.

       class listwrapper:
            def __init__(self, the_list):
                  self.the_list = the_list
            def __cmp__(self, other):
                  # __cmp__ must return 0 (not a true value) for equal lists
                  return cmp(self.the_list, other.the_list)
            def __hash__(self):
                  l = self.the_list
                  result = 98767 - len(l)*555
                  for i in range(len(l)):
                       try:
                            result = result + (hash(l[i]) % 9999999) * 1001 + i
                       except:
                            result = (result % 7777777) + i * 333
                  return result

Note that the hash computation is complicated by the possibility that some
members of the list may be unhashable and also by the possibility of
arithmetic overflow.

You must make sure that the hash value of every such wrapper object that
resides in a dictionary (or other hash-based structure) remains fixed while
the object is in the dictionary (or other structure).

Furthermore it must always be the case that if o1 == o2 (i.e.
o1.__cmp__(o2)==0) then hash(o1)==hash(o2) (i.e., o1.__hash__() ==
o2.__hash__()), regardless of whether the object is in a dictionary or not.
If you fail to meet these restrictions dictionaries and other hash based
structures may misbehave!

In the case of listwrapper above whenever the wrapper object is in a
dictionary the wrapped list must not change to avoid anomalies. Don't do
this unless you are prepared to think hard about the requirements and the
consequences of not meeting them correctly. You've been warned!

----------------------------------------------------------------------------

6.19. How the heck do you make an array in Python?

["this", 1, "is", "an", "array"]

Lists are arrays in the C or Pascal sense of the word (see question 6.16).
The array module also provides methods for creating arrays of fixed types
with compact representations (but they are slower to index than lists). Also
note that the Numerics extensions and others define array-like structures
with various characteristics as well.
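A brief sketch of the array module in Python 3 syntax (the variable names are invented): the type code fixes the item type, here "d" for C doubles, and values of other types are rejected.

```python
from array import array

samples = array("d", [1.0, 2.5])   # "d": every item is stored as a C double
samples.append(4.0)

try:
    samples.append("text")         # not a number, so it is rejected
    accepts_strings = True
except TypeError:
    accepts_strings = False
```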

To get Lisp-like lists, emulate cons cells

        lisp_list = ("like",  ("this",  ("example", None) ) )

using tuples (or lists, if you want mutability). Here the analogue of Lisp
car is lisp_list[0] and the analogue of cdr is lisp_list[1]. Only do this if
you're sure you really need to (it's usually a lot slower than using Python
lists).
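Helpers for converting between Python lists and such cons-cell chains might look like this (a sketch in Python 3 syntax; to_cons and from_cons are hypothetical names):

```python
def to_cons(items):
    """Build nested (car, cdr) pairs ending in None from a Python list."""
    cell = None
    for item in reversed(items):
        cell = (item, cell)
    return cell


def from_cons(cell):
    """Walk the cdr chain and collect the cars back into a Python list."""
    items = []
    while cell is not None:
        items.append(cell[0])   # car
        cell = cell[1]          # cdr
    return items


lisp_list = to_cons(["like", "this", "example"])
```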

Think of Python lists as mutable heterogeneous arrays of Python objects (say
that 10 times fast :) ).

----------------------------------------------------------------------------

6.20. Why doesn't list.sort() return the sorted list?

In situations where performance matters, making a copy of the list just to
sort it would be wasteful. Therefore, list.sort() sorts the list in place.
In order to remind you of that fact, it does not return the sorted list.
This way, you won't be fooled into accidentally overwriting a list when you
need a sorted copy but also need to keep the unsorted version around.

As a result, here's the idiom to iterate over the keys of a dictionary in
sorted order:

            keys = dict.keys()
            keys.sort()
            for key in keys:
                    value = dict[key]
                    # ...do whatever with value...
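The same idiom in present-day Python 3 syntax, wrapped in a hypothetical helper; keys() now returns a view, so the sketch copies it into a list before sorting. Present-day Python also has a sorted() built-in that returns a new sorted list, leaving the original alone.

```python
def items_in_key_order(d):
    keys = list(d.keys())   # keys() is a view in Python 3, so take a list copy
    keys.sort()             # sorts in place...
    return [(key, d[key]) for key in keys]


numbers = [3, 1, 2]
sort_result = numbers.sort()   # ...and, as here, returns None
```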

----------------------------------------------------------------------------

6.21. How do you specify and enforce an interface spec in Python?

An interface specification for a module, as provided by languages such as
C++ and Java, describes the prototypes for the methods and functions of the
module. Many feel that compile-time enforcement of interface specifications
helps in the construction of large programs. Python does not support
interface specifications directly, but many of their advantages can be
obtained by an appropriate test discipline for components, which can often
be very easily accomplished in Python.

A good test suite for a module can at once provide a regression test and
serve as a module interface specification (even better since it also gives
example usage). Look at the many standard library modules that have a
"script interpretation" providing a simple "self test." Even modules
which use complex external interfaces can often be tested in isolation using
trivial "stub" emulations of the external interface.

An appropriate testing discipline (if enforced) can help build large, complex
applications in Python as well as interface specifications would (or
better). Of course Python allows you to get sloppy and not do it. Also
you might want to design your code with an eye to making it easily tested.
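A sketch of the self-test-with-stubs style, in Python 3 syntax. Everything here is hypothetical: fetch_title() stands for a module function, connection objects are assumed to provide get(url) returning the page text, and StubConnection is a trivial emulation of that external interface. Running the file as a script runs the self test.

```python
def fetch_title(connection, url):
    """Return the first line of the page at url, stripped of whitespace."""
    text = connection.get(url)
    return text.split("\n", 1)[0].strip()


class StubConnection:
    """Trivial stand-in for the real external interface."""

    def __init__(self, pages):
        self.pages = pages       # canned responses, keyed by URL
        self.requested = []      # records calls so the test can check them

    def get(self, url):
        self.requested.append(url)
        return self.pages[url]


def self_test():
    stub = StubConnection({"http://example.invalid/": "  Hello \nbody text"})
    assert fetch_title(stub, "http://example.invalid/") == "Hello"
    assert stub.requested == ["http://example.invalid/"]
    return True


if __name__ == "__main__":
    self_test()
    print("self test passed")
```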

----------------------------------------------------------------------------

6.22. Why do all classes have the same type? Why do instances all have the
same type?

The Pythonic use of the word "type" is quite different from common usage in
much of the rest of the programming language world. A "type" in Python is a
description for an object's operations as implemented in C. All classes have
the same operations implemented in C which sometimes "call back" to
differing program fragments implemented in Python, and hence all classes
have the same type. Similarly at the C level all class instances have the
same C implementation, and hence all instances have the same type.

Remember that in Python usage "type" refers to a C implementation of an
object. To distinguish among instances of different classes use
Instance.__class__, and also look to 4.47. Sorry for the terminological
confusion, but at this point in Python's development nothing can be done!
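A minimal sketch of the Instance.__class__ approach (Python 3 syntax; Cat, Dog, and kind() are made up). It runs in present-day Python too, where classes are themselves types and type(obj) usually agrees with obj.__class__; the terminology described above applies to the classic classes of this FAQ's era.

```python
class Cat:
    pass


class Dog:
    pass


def kind(obj):
    # obj.__class__ identifies the class, whatever the C-level "type" says
    return obj.__class__.__name__


pets = [Cat(), Dog(), Cat()]
kinds = [kind(p) for p in pets]
```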

----------------------------------------------------------------------------

6.23. Why isn't all memory freed when Python exits?

Objects referenced from Python module global name spaces are not always
deallocated when Python exits.

This may happen if there are circular references (see question 4.17). There
are also certain bits of memory that are allocated by the C library that are
impossible to free (e.g. a tool like Purify will complain about these).

But in general, Python 1.5 and beyond (in contrast with earlier versions) is
quite aggressive about cleaning up memory on exit.

If you want to force Python to delete certain things on exit, use the
sys.exitfunc hook to force those deletions. For example if you are debugging
an extension module using a memory analysis tool and you wish to make Python
deallocate almost everything you might use an exitfunc like this one:

      import sys

      def my_exitfunc():
           print "cleaning up"
           import sys
           # do order-dependent deletions here
           # now delete everything else in arbitrary order
           for x in sys.modules.values():
                d = x.__dict__
                for name in d.keys():
                     del d[name]

      sys.exitfunc = my_exitfunc

Other exitfuncs can be less drastic, of course.

(In fact, this one just does what Python now already does itself; but the
example of using sys.exitfunc to force cleanups is still useful.)
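sys.exitfunc no longer exists in Python 3; the standard atexit module is the present-day way to register cleanup code. A sketch in Python 3 syntax, where release_resources() and cleanup_log are hypothetical stand-ins for real cleanup work:

```python
import atexit

cleanup_log = []


@atexit.register          # register() returns the function, so it works as a decorator
def release_resources():
    # Order-dependent cleanups for a (hypothetical) application would go here.
    cleanup_log.append("cleaned up")
```

Registered functions run at normal interpreter exit, most recently registered first.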

----------------------------------------------------------------------------

6.24. Why no class methods or mutable class variables?

The notation

        instance.attribute(arg1, arg2)

usually translates to the equivalent of

        Class.attribute(instance, arg1, arg2)

where Class is a (super)class of instance. Similarly

        instance.attribute = value

sets an attribute of an instance (overriding any attribute of a class that
instance inherits).

Sometimes programmers want to have different behaviours -- they want a
method which does not bind to the instance and a class attribute which
changes in place. Python does not preclude these behaviours, but you have to
adopt a convention to implement them. One way to accomplish this is to use
"list wrappers" and global functions.

       def C_hello():
           print "hello"

       class C:
           hello = [C_hello]
           counter = [0]

       I = C()

Here I.hello[0]() acts very much like a "class method" and I.counter[0] = 2
alters C.counter (and doesn't override it). If you don't understand why
you'd ever want to do this, that's because you are pure of mind, and you
probably never will want to do it! This is dangerous trickery, not
recommended when avoidable. (Inspired by Tim Peters's discussion.)
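The difference between mutating a wrapped class attribute and rebinding it
can be checked directly. This sketch, written for a current Python, returns
the greeting instead of printing it so the result can be tested, and adds a
second instance J (not in the original) for comparison. Python 2.2 and later
also provide the built-ins staticmethod() and classmethod() for the method
case.

```python
def C_hello():
    return "hello"


class C:
    hello = [C_hello]   # wrapped in a list, so it is never bound to an instance
    counter = [0]       # wrapped in a list, so it can be changed in place


I = C()
J = C()

# I.hello[0] is the plain function, so it is called with no instance argument.
greeting = I.hello[0]()

# Item assignment mutates the one list shared through the class...
I.counter[0] = 2
shared_after_mutation = (C.counter[0], J.counter[0])

# ...whereas plain attribute assignment creates a new instance attribute
# that overrides the class attribute for I only.
I.counter = [5]
```

After these steps C.counter and J.counter are both [2], while I.counter is
its own list [5].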

----------------------------------------------------------------------------

6.25. Why are default values sometimes shared between objects?

It is often expected that a function CALL creates new objects for default
values. This is not what happens. Default values are created exactly once,
when the function is DEFINED; that is, there is only one such object, and
every call of the function refers to it. If that object is changed,
subsequent calls to the function will see the changed object. By definition,
immutable objects (like numbers, strings, tuples, None) are safe from change.
Changes to mutable objects (like dictionaries, lists, class instances) are
what cause the confusion.

Because of this feature it is good programming practice not to use mutable
objects as default values, but to create them inside the function. Don't
write:

            def foo(dict={}):  # XXX shared reference to one dict for all calls
                ...

but:

            def foo(dict=None):
                    if dict is None:
                            dict = {} # create a new dict for local namespace
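The sharing is easy to observe. In this sketch the function names
append_shared() and append_fresh() are hypothetical; the first uses a
mutable default, the second uses the None sentinel recommended above.

```python
def append_shared(item, bucket=[]):
    # The [] default is evaluated once, when "def" runs, so every call
    # that omits bucket appends to the same list object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is immutable and therefore safe as a default; a new list is
    # created inside the body on each call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling append_shared(1) and then append_shared(2) returns [1, 2] the second
time, while append_fresh(1) and append_fresh(2) return [1] and [2].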

See page 182 of "Internet Programming with Python" for one discussion of
this feature. Or see the top of page 144 or bottom of page 277 in
"Programming Python" for another discussion.

----------------------------------------------------------------------------

6.26. Why no goto?

Actually, you can use exceptions to provide a "structured goto" that even
works across function calls. Many feel that exceptions can conveniently
emulate all reasonable uses of the "go" or "goto" constructs of C, Fortran,
and other languages. For example:

       class label(Exception): pass # declare a label
       try:
            ...
            if (condition): raise label() # goto label
            ...
       except label: # where to goto
            pass
       ...

This doesn't allow you to jump into the middle of a loop, but that's usually
considered an abuse of goto anyway. Use sparingly.
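A concrete case where an exception "label" pays off is leaving several
nested loops at once, even when the jump starts inside a called function.
This sketch is written for a current Python; the names Found, check() and
find() are hypothetical, and the label class derives from Exception, which
recent Python versions require of anything that is raised.

```python
class Found(Exception):
    """Hypothetical label; its argument carries the position found."""
    pass


def check(value, target, position):
    # The "goto" may happen inside a called function.
    if value == target:
        raise Found(position)


def find(rows, target):
    """Return (row, column) of the first occurrence of target, or None."""
    try:
        for i, row in enumerate(rows):
            for j, value in enumerate(row):
                check(value, target, (i, j))
    except Found as label:      # the jump target
        return label.args[0]
    return None
```

For example, find([[1, 2], [3, 4]], 4) leaves both loops from inside check()
and returns (1, 1).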

----------------------------------------------------------------------------

6.27. How do you make a higher order function in Python?

You have two choices: you can use default arguments and override them or you
can use "callable objects." For example suppose you wanted to define
linear(a,b) which returns a function f where f(x) computes the value a*x+b.
Using default arguments:

         def linear(a,b):
             def result(x, a=a, b=b):
                 return a*x + b
             return result

Or using callable objects:

         class linear:
            def __init__(self, a, b):
                self.a, self.b = a,b
            def __call__(self, x):
                return self.a * x + self.b

In both cases:

         taxes = linear(0.3,2)

gives a callable object where taxes(10e6) == 0.3 * 10e6 + 2.

The defaults strategy has the disadvantage that the default arguments could
be accidentally or maliciously overridden. The callable objects approach has
the disadvantage that it is a bit slower and a bit longer. Note however that
a collection of callables can share their signature via inheritance. For
example:

          class exponential(linear):
             # __init__ inherited
             def __call__(self, x):
                 return self.a * (x ** self.b)
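Both trade-offs described above can be checked with a small script. This
sketch renames the default-argument version to linear_fn (a hypothetical
name) so that both versions can coexist, shows how a caller can silently
override a captured default, and exercises the inherited exponential class.

```python
def linear_fn(a, b):
    def result(x, a=a, b=b):    # a and b captured as default arguments
        return a * x + b
    return result


class linear:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, x):
        return self.a * x + self.b


class exponential(linear):
    # __init__ is inherited from linear
    def __call__(self, x):
        return self.a * (x ** self.b)


f = linear_fn(3, 2)
g = linear(3, 2)
same = f(10) == g(10)           # both compute 3*10 + 2

# The weakness of the defaults strategy: nothing stops a caller
# from passing a or b explicitly and replacing the captured values.
overridden = f(10, a=100)       # computes 100*10 + 2

cube = exponential(1, 3)        # computes 1 * x**3
```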

On comp.lang.python, zenin@bawdycaste.org points out that an object can
encapsulate state for several methods in order to emulate the "closure"
concept from functional programming languages, for example:

        class counter:
            value = 0
            def set(self, x): self.value = x
            def up(self): self.value=self.value+1
            def down(self): self.value=self.value-1

        count = counter()
        inc, dec, reset = count.up, count.down, count.set

Here inc, dec and reset act like "functions which share the same closure
containing the variable count.value" (if you like that way of thinking).
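Running the counter class as given shows that the three bound methods share
one piece of state, and that a second counter object gets a separate
"closure" of its own. The variable names after_steps, after_reset and other
are arbitrary additions for the check.

```python
class counter:
    value = 0                       # class-level default, shadowed on first write
    def set(self, x): self.value = x
    def up(self): self.value = self.value + 1
    def down(self): self.value = self.value - 1

count = counter()
inc, dec, reset = count.up, count.down, count.set

inc()
inc()
dec()
after_steps = count.value           # 0 + 1 + 1 - 1

reset(10)
after_reset = count.value

other = counter()                   # a second object has its own state
```

Because each write creates an instance attribute, counter.value itself stays
0 no matter how the bound methods are used.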

----------------------------------------------------------------------------

6.28. Why do I get a SyntaxError for a 'continue' inside a 'try'?

This is an implementation limitation, caused by the extremely simple-minded
way Python generates bytecode. The try block pushes something onto the
"block stack", which the continue would have to pop off again. The current
code generator doesn't keep track of the data structures that 'continue'
would need in order to generate the right code.

Note that JPython doesn't have this restriction!
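Where the restriction applies, one common workaround is to move the try
statement into a separate function so that the continue ends up outside it.
This is a sketch under that assumption; parse_or_none() and sum_valid() are
hypothetical names, and the example simply skips strings that are not valid
integers.

```python
def parse_or_none(text):
    """Hypothetical helper: the try statement lives here, not in the loop."""
    try:
        return int(text)
    except ValueError:
        return None


def sum_valid(items):
    total = 0
    for item in items:
        value = parse_or_none(item)
        if value is None:
            continue            # this continue is not inside any try block
        total = total + value
    return total
```

For example, sum_valid(["1", "x", "2"]) skips "x" and returns 3.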

----------------------------------------------------------------------------

6.29. Why can't raw strings (r-strings) end with a backslash?

More precisely, they can't end with an odd number of backslashes: the
unpaired backslash at the end escapes the closing quote character, leaving
an unterminated string.

Raw strings were designed to ease creating input for processors (chiefly
regular expression engines) that want to do their own backslash escape
processing. Such processors consider an unmatched trailing backslash to be
an error anyway, so raw strings disallow that. In return, they allow you to
pass on the string quote character by escaping it with a backslash. These
rules work well when r-strings are used for their intended purpose.

If you're trying to build Windows pathnames, note that all Windows system
calls accept forward slashes too:

        f = open("/mydir/file.txt") # works fine!

If you're trying to build a pathname for a DOS command, try e.g. one of

        dir = r"\this\is\my\dos\dir" "\\"
        dir = r"\this\is\my\dos\dir\ "[:-1]
        dir = "\\this\\is\\my\\dos\\dir\\"
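The three spellings above all produce the same string, and the quote-escaping
rule keeps the backslash in the value. A quick check (the variable names are
arbitrary):

```python
# Three ways to get a string that ends in a single backslash.
a = r"\this\is\my\dos\dir" "\\"        # raw string plus an adjacent normal string
b = r"\this\is\my\dos\dir\ "[:-1]      # trailing space sliced off again
c = "\\this\\is\\my\\dos\\dir\\"       # no raw string; every backslash doubled

# In a raw string, \" does not end the literal, but the backslash stays
# in the value: this is a two-character string (backslash, quote).
quote_escaped = r"\""
```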

----------------------------------------------------------------------------

6.30. Why can't I use an assignment in an expression?

Many people used to C or Perl complain that they want to be able to use e.g.
this C idiom:

        while (line = readline(f)) {
            ...do something with line...
        }

where in Python you're forced to write this:

        while 1:
            line = f.readline()
            if not line:
                break
            ...do something with line...

This issue comes up in the Python newsgroup with alarming frequency --
search Deja News for past messages about assignment expressions. The reason
for not allowing assignment in Python expressions is a common, hard-to-find
bug in those other languages, caused by this construct:

        if (x = 0) {
            ...error handling...
        }
        else {
            ...code that only works for nonzero x...
        }
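In Python the equivalent typo cannot slip through: "=" forms a statement,
not an operator, so "if x = 0:" is rejected when the code is compiled rather
than silently assigning. The sketch below confirms this with the built-in
compile() function; is_valid_python() is a hypothetical helper name.

```python
def is_valid_python(source):
    """Return True if source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


typo_rejected = not is_valid_python("if x = 0:\n    pass\n")
comparison_ok = is_valid_python("if x == 0:\n    pass\n")
```

Note that compile() only checks the syntax here; x never needs to be defined.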

Many alternatives have been proposed. Most are hacks that save some typing
but use arbitrary or cryptic syntax or keywords, and fail the simple
criterion that I use for language change proposals: it should intuitively
suggest the proper meaning to a human reader who has not yet been introduced
to the construct.

The earliest time something can be done about this will be with Python 2.0
-- if it is decided that it is worth fixing. An interesting phenomenon is
that most experienced Python programmers recognize the "while 1" idiom and
don't seem to be missing the assignment in expression construct much; it's
only the newcomers who express a strong desire to add this to the language.

One fairly elegant solution would be to introduce a new operator for
assignment in expressions spelled ":=" -- this avoids the "=" instead of
"==" problem. It would have the same precedence as comparison operators but
the parser would flag combination with other comparisons (without
disambiguating parentheses) as an error.

Finally -- there's an alternative way of spelling this that seems attractive
but is generally less robust than the "while 1" solution:

        line = f.readline()
        while line:
            ...do something with line...
            line = f.readline()

The problem with this is that if you change your mind about exactly how you
get the next line (e.g. you want to change it into sys.stdin.readline()) you
have to remember to change two places in your program -- the second one
hidden at the bottom of the loop.
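Here is the "while 1" idiom in runnable form, reading from an in-memory file
so that no file on disk is needed. io.StringIO and the two-argument form of
iter() belong to later Python versions than the ones this FAQ describes; the
second function is included only as an alternative that also keeps the
readline() call in a single place. Both function names are hypothetical.

```python
import io


def read_lines(f):
    """Collect lines using the "while 1" idiom."""
    lines = []
    while 1:
        line = f.readline()     # the only place the next line is fetched
        if not line:            # readline() returns "" at end of file
            break
        lines.append(line)
    return lines


def read_lines_iter(f):
    """Same loop with iter(callable, sentinel): call f.readline until it returns ""."""
    lines = []
    for line in iter(f.readline, ""):
        lines.append(line)
    return lines
```

An empty line in the middle of the input is returned as "\n", which is true,
so only the real end of file stops either loop.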

----------------------------------------------------------------------------

-- 
----------- comp.lang.python.announce (moderated) ----------
Article Submission Address:  python-announce@python.org
Python Language Home Page:   http://www.python.org/
Python Quick Help Index:     http://www.python.org/Help.html
------------------------------------------------------------

Python Language FAQ - Section 6

Markus Fleck
