[IPython-dev] Parallel computing segfault behavior

Patrick Fuller patrickfuller at gmail.com
Tue Jan 28 20:04:05 EST 2014


...the difference being that this would require starting a new engine on
each segfault

On Tuesday, January 28, 2014, Patrick Fuller <patrickfuller at gmail.com>
wrote:

> I guess my question is more along the lines of: should the cluster
> continue on to complete the queued jobs (as it would if the segfaults were
> instead python exceptions)?
>
> On Tuesday, January 28, 2014, MinRK <benjaminrk at gmail.com>
> wrote:
>
>> I get an EngineError when an engine dies running a task:
>>
>> http://nbviewer.ipython.org/gist/minrk/8679553
>>
>> I think this is the desired behavior.
>>
>>
>> On Tue, Jan 28, 2014 at 2:18 PM, Patrick Fuller <patrickfuller at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Has there been any discussion around how IPython parallel handles
>>> segfaulting?
>>>
>>> To make this question more specific, the following code will cause some
>>> workers to crash. All results will become unreadable (or at least
>>> un-iterable), and future runs require a restart of the cluster. Is this
>>> behavior intended, or is it just something that hasn't been discussed?
>>>
>>> from IPython.parallel import Client
>>> from random import random
>>>
>>> def segfaulty_function(random_number, chance=0.25):
>>>     if random_number < chance:
>>>         import ctypes
>>>         # Deliberately write past a one-byte buffer until the process segfaults.
>>>         i = ctypes.c_char('a')
>>>         j = ctypes.pointer(i)
>>>         c = 0
>>>         while True:
>>>             j[c] = 'a'
>>>             c += 1
>>>     else:
>>>         return random_number
>>>
>>> view = Client(profile="something-parallel-here").load_balanced_view()
>>> results = view.map(segfaulty_function, [random() for _ in range(100)])
>>> for i, result in enumerate(results):
>>>     print i, result
>>>
>>> Backstory: Recently I've been working with a large Monte Carlo library
>>> that segfaults for, like, no reason at all. It's due to some weird
>>> underlying random number issue and happens once every 5-10 thousand runs. I
>>> currently have each worker spin out a child process to isolate the
>>> occasional segfault, but this seems excessive. (I'm also trying to fix the
>>> source of the segfaults, but debugging is a slow process.)
>>>
>>> Thanks,
>>> Pat
>>>
>>> _______________________________________________
>>> IPython-dev mailing list
>>> IPython-dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>>
>>>
>>
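
The child-process isolation mentioned in the quoted backstory could look
roughly like the following. This is only a minimal sketch, not the code
actually used in the thread: run_isolated, WorkerCrashed and
crash_with_sigsegv are hypothetical names, it is written for Python 3, and
it assumes a POSIX system where the multiprocessing "fork" start method is
available.

```python
import multiprocessing
import os
import signal

# Assumption: POSIX only. With the "fork" start method the child inherits
# the function directly, so it does not need to be picklable.
_ctx = multiprocessing.get_context("fork")


class WorkerCrashed(Exception):
    """Hypothetical error: the child died before sending back a result."""


def _child(conn, func, args):
    # Runs in the child process: send back either the result or the error.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args):
    """Call func(*args) in a forked child so a crash cannot kill the caller."""
    parent_conn, child_conn = _ctx.Pipe(duplex=False)
    proc = _ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent keeps only the read end
    try:
        status, payload = parent_conn.recv()
    except EOFError:
        # The pipe closed with nothing sent: the child was killed.
        proc.join()
        raise WorkerCrashed("child exited with code %s" % proc.exitcode)
    finally:
        parent_conn.close()
    proc.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload


def crash_with_sigsegv():
    # Stand-in for the segfaulting library call: kill this process
    # with SIGSEGV.
    os.kill(os.getpid(), signal.SIGSEGV)
```

Wrapping the library call this way inside the function handed to view.map
should turn a segfault into an ordinary Python exception (WorkerCrashed
here), at the cost of one fork per task.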