[FAQTS] Python Knowledge Base Update -- October 9th, 2000

Fiona Czuczman Fiona Czuczman <fiona@sitegnome.com>
9 Oct 2000 09:59:52 -0000

Hi All,

It's been a while.  Sorry, I've taken a full time job which doesn't 
give me much time to play with the knowledge base ..

This post is simply what's happened since my last post.


Fiona Czuczman

Unanswered Questions :

- How do I change the name of a process (as viewed by 'ps') from Python?
- How can I get my _full_ program name (eg. "mylongpythonscript.py") 
under eg. NT? sys.argv[0] gives me the truncated MS-DOS 
style, "MYLONG~1.PY" etc.
- What are the criteria I should consider when deciding between Numeric 
Python, Matlab or Octave?
- getting full program name via sys.argv[0] on (eg.) NT?
- is there code that sends the contents of a directory as a multi-part 
mime message to e-mail recipients?

Answered Questions :

- Alternative to os.path.walk
- How do I find reference count errors?
- Python's big, is there a crisp overview? a quick reference card? a 
bare bones documentation?
- Of what use is 'lambda'?
- How do I handle command line args with gnome?

## Unanswered Questions ########################################

How do I change the name of a process (as viewed by 'ps') from Python?
Adam Feuer

How can I get my _full_ program name (eg. "mylongpythonscript.py") under eg. NT? sys.argv[0] gives me the truncated MS-DOS style, "MYLONG~1.PY" etc.
Jon Nicoll

What are the criteria I should consider when deciding between Numeric Python, Matlab or Octave?
Louis Luang

getting full program name via sys.argv[0] on (eg.) NT?
Jon Nicoll

is there code that sends the contents of a directory as a multi-part mime message to e-mail recipients?
Wolfgang Lipp

## New Entries #################################################

Alternative to os.path.walk
Daniel Dittmar

If you have trouble specifying the correct callback to os.path.walk 
and would prefer to use an iterator, as in

for fname in RecursiveFileIterator ('dir1', 'dir2'):
    process (fname)

you can use the following class:

import os

class RecursiveFileIterator:
    def __init__ (self, *rootDirs):
        self.dirQueue = list (rootDirs)
        self.includeDirs = None
        self.fileQueue = []

    def __getitem__ (self, index):
        while len (self.fileQueue) == 0:
            self.nextDir ()
        result = self.fileQueue [0]
        del self.fileQueue [0]
        return result

    def nextDir (self):
        dir = self.dirQueue [0]   # fails with IndexError, which is fine
                                  # for iterator interface
        del self.dirQueue [0]
        list = os.listdir (dir)
        join = os.path.join
        isdir = os.path.isdir
        for basename in list:
            fullPath = join (dir, basename)
            if isdir (fullPath):
                self.dirQueue.append (fullPath)
                if self.includeDirs:
                    self.fileQueue.append (fullPath)
            else:
                self.fileQueue.append (fullPath)
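For comparison, the same idea can be written as a generator in modern Python (the function name and keyword argument here are my own, not part of the original recipe):

```python
import os

def recursive_file_iterator(*root_dirs, include_dirs=False):
    """Yield file paths under root_dirs, breadth-first,
    mirroring the RecursiveFileIterator class above."""
    dir_queue = list(root_dirs)
    while dir_queue:
        current = dir_queue.pop(0)
        for basename in os.listdir(current):
            full_path = os.path.join(current, basename)
            if os.path.isdir(full_path):
                # directories go on the queue; yield them only on request
                dir_queue.append(full_path)
                if include_dirs:
                    yield full_path
            else:
                yield full_path
```

A generator avoids the old `__getitem__`-with-IndexError iteration protocol entirely; iteration simply stops when the function returns.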

How do I find reference count errors?
Joseph VanAndel
Will Ware <wware@world.std.com>

(Will Ware posted this on comp.lang.python)

Numerous postings to c.l.python, including some of my own, have
mentioned the difficulty of debugging memory leaks in C extensions. To
old hands this stuff may all be obvious, but we less experienced folk
need all the help we can get. This is a helpful trick I've developed
using the 1.5.2 codebase. I imagine it would work with the 1.6 and 2.0
codebases as well.

The typical problem with memory leaks is mismanagement of reference
counts, particularly abuses of Py_INCREF and Py_DECREF, as well as
ignorance of the refcount effects of functions like Py_BuildValue,
PyArg_ParseTuple, PyTuple/List_SetItem/GetItem, and so forth. The
existing codebase offers some help with this (search for
Py_TRACE_REFS) but I found it useful to add this function in
Objects/object.c, just before _Py_PrintReferences.

void
_Py_CountReferences(FILE *fp)
{
        int n;
        PyObject *op;
        for (n = 0, op = refchain._ob_next;
             op != &refchain;
             op = op->_ob_next)
                n += op->ob_refcnt;
        fprintf(fp, "%d refs\n", n);
}

The difference between this and _Py_PrintReferences is that the
latter prints out all objects and their refcounts, in a list that
runs many pages. It's obviously impractical to do that at several
points in a program that runs in a long loop. But this function
will only print the total of all the refcounts in the system, which
will allow you to keep track of when you have inadvertently put in
something that increases the total refcount every time thru a loop.

In my C extension, I put in the following macros.

#if defined(Py_DEBUG) || defined(DEBUG)
extern void _Py_CountReferences(FILE*);
#define CURIOUS(x) { fprintf(stderr, __FILE__ ":%d ", __LINE__); x; }
#else
#define CURIOUS(x)
#endif
#define MARKER()        CURIOUS(fprintf(stderr, "\n"))
#define DESCRIBE(x)     CURIOUS(fprintf(stderr, "  " #x "=%d\n", x))
#define DESCRIBE_HEX(x) CURIOUS(fprintf(stderr, "  " #x "=%08x\n", x))
#define COUNTREFS()     CURIOUS(_Py_CountReferences(stderr))

To debug, I rebuild Python using 'make OPT="-DPy_DEBUG"', which
causes the stuff under Py_TRACE_REFS to be built. My own makefile
uses the same trick, by including these lines:

CFLAGS = $(OPT) -fpic -O2 -I/usr/local/include -I/usr/include/python1.5

debug:
        make clean; make OPT="-g -DPy_DEBUG" all

When I suspect that one of my functions is responsible for a memory
leak, I liberally sprinkle it with calls to the COUNTREFS() macro.
This allows me to keep track of exactly how many references are being
created or destroyed as I go through my function. This is particularly
useful in loops where simple mistakes can cause reference counts to
grow ridiculously fast. Also, reference counts that shrink too fast
(overzealous use of Py_DECREF) can cause core dumps because the memory
for objects that should still exist has been reallocated for new
objects.

Python's big, is there a crisp overview? a quick reference card? a bare bones documentation?
Wolfgang Lipp, Fiona Czuczman
Richard Gruet, rgruet@ina.fr; Chris Hoffmann, choffman@vicorp.com; Ken Manheimer, ken.manheimer@nist.gov; Guido van Rossum, guido@CNRI.Reston.Va.US, guido@python.org; Tim Peters, tim_one@email.msn.com; and the readers of comp.lang.python

yes, there is. all of it in one html page, have a look @


Of what use is 'lambda'?
Joseph VanAndel
Kragen Sitaker <kragen@pobox.com>

- it is useful to bind arguments, e.g. 
	lambda x, y=zz: do_something(x,y) 

	--- which can be passed as a function in place of do_something.

- why would you want to make an indirect function call in the first
place?

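As a runnable illustration of the binding idiom in the first point (the function and variable names here are invented for the example):

```python
def do_something(x, y):
    return x + y

zz = 10
# bind zz's current value as the default for y
add_zz = lambda x, y=zz: do_something(x, y)

assert add_zz(5) == 15     # y defaults to the bound value 10
assert add_zz(5, 1) == 6   # the default can still be overridden
```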
The answer to the second is deep.  

Of course, you don't ever *need* to make an indirect function call; in
procedural programming, you can write code like this:

        if fcode == 1:
                do_thing_one(x, y)
        elif fcode == 2:
                do_thing_two(x, y)

And when you add a new fcode value, you just have to go to every
if-elif-elif-else switch and add a new branch.  This is kind of bad, in
that it scatters information about what fcode==2 means all over the
place, and it's likely that it would be handier to have it all in one
place.  (There's a balance, of course.  Sometimes what fcode==2 means
has more to do with the code the if-elif thing is in the middle of than
with the other branches that also happen when fcode==2.)

With object-oriented programming, you can do something like this

        fcodething.do_something(x, y)

and rely on dynamic method dispatch to do the right thing.  This has
the advantage that you can put all the do_something and
do_something_else methods together.

With functional programming, instead, you say:

        fcode(x, y)

and just use indirect function invocation to do the right thing.
Closures (like lambda x, y, z=foo: do_something(x, y, z)) make it
possible to store state in an invocable function, which means that
functions (closures, lambdas) become equivalent to objects.
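A small sketch of that equivalence -- a closure carrying its own state, so the bare call fcode(x, y) style dispatches correctly (names invented for the example):

```python
def make_scaler(factor):
    # 'factor' is state captured in the closure, playing the role
    # an instance attribute would play in the object-oriented version
    return lambda x: x * factor

double = make_scaler(2)
triple = make_scaler(3)

assert double(5) == 10
assert triple(5) == 15
```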

There are times when functional code --- passing around lambdas --- is
clearer, and there are arguably times when object-oriented code is
clearer.  Functional code seems to me more likely to be more flexible,
but that's kind of a fuzzy statement and could be nonsense.

Closures are sort of equivalent to objects with a single method.  If
you want to do multiple methods, you have to use some parameter to
decide which one to invoke.  This makes it hard to go from a
"single-method" function to a "multiple-method" function --- you have
to change every call to it.  You can do prototype-based "inheritance"
by delegation --- call some "prototype" function if you don't
understand the requested method.  Here's an example, albeit a specious
one, since Python has stacks built in:

def stack(op, arg, data, prototype):
        if op == 'push':
                data.append(arg)
        elif op == 'tos':
                return data[len(data) - 1]
        elif op == 'pop':
                return data.pop()
        else:
                return prototype(op, arg)

def make_stack(prototype=lambda op, arg: None):
        return (lambda op, arg=None, data=[], prototype=prototype: 
                        stack(op, arg, data, prototype))

>>> x = make_stack()
>>> x('push', 3)
>>> x('tos')
3
>>> x('push', 4)
>>> x('tos')
4
>>> x('pop')
4
>>> x('tos')
3
>>> x('pop')
3
>>> x('tos')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in <lambda>
  File "<stdin>", line 5, in stack
IndexError: list index out of range
>>> x('unknown')
>>> def hello():
...     print "hello, world"
...
>>> x = make_stack(lambda op, arg: hello())
>>> x('push', 37)
>>> x('tos')
37
>>> x('unknown arg')
hello, world

Just as you can create a lambda that acts like an object, you can
create an object that emulates a lambda, simply by creating an object
with only one method.  Both are kind of awkward.  (It gets a lot worse
when you start trying to do multiple inheritance with lambdas, closures
with objects, etc.)
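In Python the reverse direction is direct: an object with a __call__ method can stand in wherever a lambda is expected. A minimal sketch:

```python
class Adder:
    """An object that emulates a one-argument closure."""
    def __init__(self, n):
        self.n = n          # the 'bound' state, like a default argument
    def __call__(self, x):
        return x + self.n

add3 = Adder(3)
add3_lambda = lambda x, n=3: x + n

# the two are interchangeable at the call site
assert add3(7) == add3_lambda(7) == 10
```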

(Also, even without closures, function pointers allow you to simulate
objects in languages that really have only structs: C, JavaScript, and
Lua come to mind, and of these, JavaScript and Lua have syntactic sugar
to make it easier.  In C, you have to say things like 
obj->method(obj, arg).)

So both approaches are equally powerful; but both of them are more
convenient for expressing certain kinds of algorithms.  

Lambdas seem to be especially convenient for expressing non-mutative
stateless algorithms: map, filter, zip.  Objects seem to be especially
convenient for writing things with lots of state that can be peacefully
extended later on.

Lambdas make it possible to have really private attributes, even in
libertine[0] languages like Perl, JavaScript, and Python.  Well, maybe
not in Python --- I don't know it well enough yet, but I wouldn't be
surprised if there were a way to get at default argument values ;)
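The suspicion is justified, at least for default arguments: in modern Python they are reachable through the function's __defaults__ attribute, and even values captured from an enclosing scope can be dug out of the closure cells. A sketch (all names invented):

```python
# default-argument "privacy" is easy to defeat
secret = lambda x, key="hunter2": x == key
assert secret.__defaults__ == ("hunter2",)

def make_checker(key):
    # 'key' lives in a closure cell rather than a default argument
    return lambda x: x == key

check = make_checker("hunter2")
assert check("hunter2")
# still reachable, but only through introspection of the closure
assert check.__closure__[0].cell_contents == "hunter2"
```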

In Perl (and, I think, JavaScript), you can use closures to have really
private methods, too.  This has the unfortunate disadvantage that if
the methods are potentially recursive, the closures will have circular
references and screw up the reference-counting garbage collection, so
you have to deallocate things by hand or leak memory.

Sometimes it's clearer to write a function in-line in a lambda, even
when you're not binding any data into a closure.  

"sort" is the usual example; you could define an interface (or
protocol) "comparator" consisting of a method "compare" which compared
two objects in some user-defined way, and then specify your sort
routine to require an object implementing "comparator" which compares
objects.  This has the disadvantage that the order you want the data in
is specified in a different class from the one that wants it sorted,
which often means that it's actually in a different file --- by
convention or by language requirement.

Defining a comparator function is often much easier --- you can usually
put it in the same file.  But, in many languages, you still have to put
it outside of the routine that wants the sorting done --- which means
that it's ten, twenty, forty lines away from the sort call.

Being able to write the function in-line in a lambda keeps related
things together and unrelated things apart, which makes your code
easier to read, easier to write, easier to find bugs in, and easier
to maintain.

How do I handle command line args with gnome?
Joseph VanAndel

You need to remove from sys.argv the options you processed before
invoking the command line option parser from gnome.  E.g:

import sys, getopt

usageStr =\
"""-i input_dir [-o output_dir]  -r [-- gnome_args]
   -r specifies real_time mode (default is batch processing)
"""

# we have to parse arguments before importing gnome.ui, which also wants
# to parse arguments.
if __name__ == "__main__":
    global optlist
    try:
        optlist, args = getopt.getopt(sys.argv[1:], 'i:o:r')
    except getopt.error:
        detail = sys.exc_info()
        print detail[0], detail[1]
        print __doc__
        print 'usage : %s %s' % (sys.argv[0], usageStr)
        sys.exit(1)

    # pass remaining arguments to gnome.ui
    sys.argv = list((sys.argv[0],))
    for a in args:
        sys.argv.append(a)
from gnome.ui  import *
from gtk import * 
from gnome.config import *

# The application's arguments can now be processed from 'optlist'.
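The same split -- consume your own options, hand the rest to the toolkit -- is what argparse.parse_known_args does in modern Python. A sketch mirroring the option letters above (the "--display" leftover is an invented stand-in for a toolkit argument):

```python
import argparse
import sys

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-i", dest="input_dir")
parser.add_argument("-o", dest="output_dir")
parser.add_argument("-r", dest="real_time", action="store_true")

# parse_known_args returns (namespace, list-of-unrecognized-args)
args, remaining = parser.parse_known_args(["-i", "in", "-r", "--display", ":0"])

assert args.input_dir == "in" and args.real_time
assert remaining == ["--display", ":0"]

# hand the leftover arguments on to the toolkit, as the recipe above does
sys.argv = [sys.argv[0]] + remaining
```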