Finding the source of an exception in a python multiprocessing program

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Apr 24 18:37:28 EDT 2013


On 24 April 2013 20:25, William Ray Wing <wrw at mac.com> wrote:
> I run a bit of Python code that monitors my connection to the greater Internet.  It checks connectivity to the requested target IP addresses, logging both successes and failures, once every 15 seconds.  I see failures quite regularly, predictably on Sunday nights after midnight when various networks are undergoing maintenance.  I'm trying to use Python's multiprocessing library to run multiple copies in parallel to check connectivity to different parts of the country (the copies in no way interact with each other).
>
> On rare occasions (maybe once every couple of months) I get the following exception and traceback:
>
> Traceback (most recent call last):
>   File "./CM_Harness.py", line 12, in <module>
>     Foo = pool.map(monitor, targets)    # and hands off two targets
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 227, in map
>     return self.map_async(func, iterable, chunksize).get()
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 528, in get
>     raise self._value
> IndexError: list index out of range
>
> The code where the traceback occurs is:
>
> #!/usr/bin/env python
>
> """ Harness to call multiple parallel copies
>     of the basic monitor program
> """
>
> from multiprocessing import Pool
> from Connection_Monitor import monitor
>
> targets = ["8.8.8.8", "www.ncsa.edu"]
> pool = Pool(processes=2)            # start 2 worker processes
> Foo = pool.map(monitor, targets)    # and hands off two targets
>
>
> Line 12 in my code is simply the line that launches the underlying monitor code.  I'm assuming that the real error is occurring in the monitor program that is being launched, but I'm at a loss as to what to do to get a better handle on what's going wrong.  Since the routine connectivity failures I mentioned above are logged without any exception, I don't _think_ the exception is being triggered by that sort of failure.
>
> When I look at the pool module, the error is occurring in get(self, timeout=None) on the line after the final else:
>
>     def get(self, timeout=None):
>         self.wait(timeout)
>         if not self._ready:
>             raise TimeoutError
>         if self._success:
>             return self._value
>         else:
>             raise self._value
>
>
> Python 2.7.3 from python.org, running on Mac OS X 10.8.3

This looks to me like a bug in multiprocessing, but I'm not very
experienced with it. Perhaps it would be good to open an issue on the
tracker. It might not be solvable without an easier way of reproducing
it, though.
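
In the meantime, one way to get a better handle on where the IndexError
comes from is to catch it inside the worker. In Python 2.7 only the
exception object is sent back to the parent, so the traceback you see ends
in pool.py and says nothing about the code in the child process. The sketch
below formats the traceback in the worker and re-raises it as text. Note
that monitor() here is a hypothetical stand-in that fails on purpose (I
don't have Connection_Monitor), and monitor_with_traceback is a name I made
up for illustration.

```python
import traceback
from multiprocessing import Pool


def monitor(target):
    # Hypothetical stand-in for Connection_Monitor.monitor. It fails on
    # purpose for targets with fewer than four dot-separated parts, just
    # to produce an IndexError inside a worker process.
    parts = target.split(".")
    return parts[3]


def monitor_with_traceback(target):
    # Run the real function, but if it raises, capture the traceback in
    # the worker (where it still exists) and send it to the parent as
    # part of the exception message.
    try:
        return monitor(target)
    except Exception:
        raise RuntimeError(
            "monitor(%r) failed in worker:\n%s"
            % (target, traceback.format_exc())
        )


if __name__ == "__main__":
    targets = ["8.8.8.8", "www.ncsa.edu"]
    pool = Pool(processes=2)
    try:
        results = pool.map(monitor_with_traceback, targets)
    except RuntimeError as exc:
        # The message now names the target and the failing line in the
        # worker's code, not just pool.py.
        print(exc)
    finally:
        pool.close()
        pool.join()
```

In the real harness you would keep the import of monitor from
Connection_Monitor and only swap the function passed to pool.map. Writing
the formatted traceback to the monitor's own log file instead of re-raising
would work too, and might suit a long-running job better.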


Oscar


