forking and multiple executions of code

Roy S. Rapoport googlenews at ols.inorganic.org
Wed May 14 01:38:01 EDT 2003


My apologies for the stupid subject line -- it's a bit difficult to
explain what's going on here, but in summary:  Python's hurting my
brain.

I've got class Task defined thus:
---
import os
import sys
import time
class Task:
    def __init__(self, name):
        self.name = name
    def run(self):
        print "Task: Entering %s.run." % self.name
        self.__reallyrun()
    def __reallyrun(self):
        print "Task: Now in %s.__reallyrun" % self.name
        try:
            pid = os.fork()
        except OSError, e:
            print >>sys.stderr, "fork #1 failed: %d (%s)" % (e.errno, e.strerror)
            sys.exit(2)
        if (pid > 0):
            # Parent: fork returned the child's pid.
            print "Task: %s Succeeded in forking; pid is %d" % (self.name, pid)
            return
        # Child: fork returned 0.
        print "Task: %s is in forked version of reality." % self.name
---
and class TaskList defined thus:
---
from Task import *
class TaskList:    
    def __init__ (self):
        self.task = {}
    def add(self, name): 
        t = Task(name)
        self.task[name] = t        
    def run(self):
        print "There are %d tasks to run." % len(self.task)
        for k in self.task.keys():
            task = self.task[k]
            print "TaskList: Running task %s now." % k
            task.run()
---

And finally, I'm executing this code:
---
#! /usr/local/bin/python
from TaskList import *
t = TaskList()
t.add("walrus")
t.add("puma")
t.run()
---

In other words, TaskList is a container of Tasks (using the self.task
dictionary); Task.run calls self.__reallyrun, which forks and then
either prints a message and returns (if it's the parent process) or
prints a different message and returns (if it's the child).
Relatively easy; my calling code creates two tasks and then has
TaskList (t) run them.

Here's where it gets interesting.  Depending on the number of tasks
added and on where the output's going, this program executes each task
various times.

For example, simply executing this program in my terminal with one
task, I get:
---
There are 1 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma is in forked version of reality.
Task: puma Succeeded in forking; pid is 724
---
(this is correct).

But if I execute this program with output to a file (simple.py > foo),
the file contains
---
There are 1 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma is in forked version of reality.
There are 1 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma Succeeded in forking; pid is 735
---
Which is actually interesting on three fronts:  A) TaskList is running
puma twice; B) puma's actually only forking once for some reason; and
C) I'm getting different results because I'm redirecting to a file.
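
To tally this by script rather than by eye, here is a minimal sketch
that counts the "Task: Entering ..." lines in a captured log.  It is
written for a current Python 3 rather than the 2.2.2 I'm running, the
names count_entries and ONE_TASK_FILE_LOG are made up for this
illustration, and the sample text is simply the file output quoted
above.

```python
import re
from collections import Counter

# Matches the "Task: Entering <name>.run." lines printed by Task.run.
ENTER_RE = re.compile(r"^Task: Entering (\w+)\.run\.$", re.MULTILINE)


def count_entries(log_text):
    """Return a Counter mapping task name -> number of 'Entering' lines."""
    return Counter(ENTER_RE.findall(log_text))


# Sample input: the one-task output captured to a file, copied from the post.
ONE_TASK_FILE_LOG = """\
There are 1 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma is in forked version of reality.
There are 1 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma Succeeded in forking; pid is 735
"""

if __name__ == "__main__":
    for name, n in sorted(count_entries(ONE_TASK_FILE_LOG).items()):
        print("%s entered %d time(s)" % (name, n))
```

On the sample it reports puma as entered twice, matching point A.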

Now, increasing the number of tasks to two, I get:
To terminal:
---
There are 2 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma is in forked version of reality.
TaskList: Running task walrus now.
Task: Entering walrus.run.
Task: Now in walrus.__reallyrun
Task: walrus is in forked version of reality.
Task: puma Succeeded in forking; pid is 794
TaskList: Running task walrus now.
Task: Entering walrus.run.
Task: Now in walrus.__reallyrun
Task: walrus Succeeded in forking; pid is 795
Task: walrus is in forked version of reality.
Task: walrus Succeeded in forking; pid is 796
---
(So it runs puma once and walrus twice; both walrus executions fork
appropriately).

To file:
---
There are 2 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma is in forked version of reality.
TaskList: Running task walrus now.
Task: Entering walrus.run.
Task: Now in walrus.__reallyrun
Task: walrus is in forked version of reality.
There are 2 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma is in forked version of reality.
TaskList: Running task walrus now.
Task: Entering walrus.run.
Task: Now in walrus.__reallyrun
Task: walrus Succeeded in forking; pid is 809
There are 2 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma Succeeded in forking; pid is 808
TaskList: Running task walrus now.
Task: Entering walrus.run.
Task: Now in walrus.__reallyrun
Task: walrus is in forked version of reality.
There are 2 tasks to run.
TaskList: Running task puma now.
Task: Entering puma.run.
Task: Now in puma.__reallyrun
Task: puma Succeeded in forking; pid is 808
TaskList: Running task walrus now.
Task: Entering walrus.run.
Task: Now in walrus.__reallyrun
Task: walrus Succeeded in forking; pid is 810
---

It looks like TaskList.run is called multiple times (four,
specifically).  In fact, it looks like there's actually a formula
there -- with three tasks, outputting to terminal, it runs puma once,
walrus twice, and warthog four times.  Output to file, and it executes
each of them eight times; in other words:

Given output to terminal and tasks 1...x,
it will execute task N 2^(N-1) times (so the first task is executed
once, and the last task is executed 2^(x-1) times).
Given output to file and tasks 1...x,
it will execute *all* tasks 2^x times.
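
To make sure the formula really matches what I saw, here is a small
sketch that compares it against the counts reported in this post.  It
is Python 3 rather than 2.2.2, predicted_runs and OBSERVED are names
invented for this check, and all it does is the arithmetic; it says
nothing about why the counts come out this way.

```python
def predicted_runs(num_tasks, to_file):
    """Predicted run count per task, in run order, per the guessed formula."""
    if to_file:
        # Output to file: every task appears 2^x times.
        return [2 ** num_tasks] * num_tasks
    # Output to terminal: task N appears 2^(N-1) times.
    return [2 ** (n - 1) for n in range(1, num_tasks + 1)]


# Counts taken from this post, keyed by (number of tasks, output to file?).
# Lists are in the order the tasks ran: puma, walrus, warthog.
OBSERVED = {
    (1, False): [1],
    (1, True): [2],
    (2, False): [1, 2],
    (2, True): [4, 4],
    (3, False): [1, 2, 4],
    (3, True): [8, 8, 8],
}

if __name__ == "__main__":
    for (num_tasks, to_file), seen in sorted(OBSERVED.items()):
        expected = predicted_runs(num_tasks, to_file)
        where = "file" if to_file else "terminal"
        verdict = "matches" if expected == seen else "DIFFERS"
        print("%d task(s) to %s: observed %s, formula %s (%s)"
              % (num_tasks, where, seen, expected, verdict))
```

Every case I've tried so far matches, for whatever that's worth.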

And I have no clue why.  

This is with 2.2.2, by the way, on both Solaris 9 and OpenBSD 3.2.

Apologies for the overly-verbose post -- I just wanted to make sure I
included all the relevant information (and, unfortunately, some of my
conclusions, conjectures, guesses, and statements pulled out of thin
air).

Any ideas what's going on here?

-roy



