[IPython-dev] Parallel computing segfault behavior

Patrick Fuller patrickfuller at gmail.com
Tue Jan 28 20:01:30 EST 2014


I guess my question is more along the lines of: should the cluster continue
on to complete the queued jobs (as it would if the segfaults were instead
Python exceptions)?

On Tuesday, January 28, 2014, MinRK <benjaminrk at gmail.com> wrote:

> I get an EngineError when an engine dies running a task:
>
> http://nbviewer.ipython.org/gist/minrk/8679553
>
> I think this is the desired behavior.
>
>
> On Tue, Jan 28, 2014 at 2:18 PM, Patrick Fuller <patrickfuller at gmail.com> wrote:
>
>> Hi,
>>
>> Has there been any discussion around how IPython parallel handles
>> segfaulting?
>>
>> To make this question more specific, the following code will cause some
>> workers to crash. All results will become unreadable (or at least
>> un-iterable), and future runs require a restart of the cluster. Is this
>> behavior intended, or is it just something that hasn't been discussed?
>>
>> from IPython.parallel import Client
>> from random import random
>>
>> def segfaulty_function(random_number, chance=0.25):
>>     if random_number < chance:
>>         import ctypes
>>         i = ctypes.c_char('a')
>>         j = ctypes.pointer(i)
>>         c = 0
>>         while True:
>>             j[c] = 'a'
>>             c += 1
>>         return j
>>     else:
>>         return random_number
>>
>> view = Client(profile="something-parallel-here").load_balanced_view()
>> results = view.map(segfaulty_function, [random() for _ in range(100)])
>> for i, result in enumerate(results):
>>     print i, result
>>
>> Backstory: Recently I've been working with a large Monte Carlo library
>> that segfaults for, like, no reason at all. It's due to some weird
>> underlying random number issue and happens once every 5-10 thousand runs. I
>> currently have each worker spin out a child process to isolate the
>> occasional segfault, but this seems excessive. (I'm also trying to fix the
>> source of the segfaults, but debugging is a slow process.)
>>
>> Thanks,
>> Pat
>>
>> _______________________________________________
>> IPython-dev mailing list
>> IPython-dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>
>>
>
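
For reference, the child-process workaround mentioned in the backstory above might look roughly like the sketch below. This is not the code from the thread: `run_isolated` and `flaky` are hypothetical names, the crash is simulated by sending SIGSEGV to the child instead of corrupting memory through ctypes, and the sketch assumes Python 3 on a POSIX system where the "fork" start method is available. The idea is that a task which dies from a signal shows up as a negative exit code in the parent instead of taking the worker down with it.

```python
import multiprocessing as mp
import os
import signal

# Assumption: POSIX only. "fork" avoids pickling the target function.
_ctx = mp.get_context("fork")


def _child(conn, func, args):
    """Run func in the child and send the outcome back through the pipe."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:  # ordinary Python errors are reported, not fatal
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args):
    """Run func(*args) in a child process and return (status, value).

    status is "ok", "error" (Python exception) or "crashed" (the child
    exited without reporting, e.g. it was killed by a signal).
    """
    parent_conn, child_conn = _ctx.Pipe(duplex=False)
    proc = _ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent keeps only the read end
    try:
        payload = parent_conn.recv()
    except EOFError:  # the child died before sending anything
        payload = None
    proc.join()
    parent_conn.close()
    if payload is None:
        # exitcode is -N when the child was killed by signal N
        return ("crashed", proc.exitcode)
    return payload


def flaky(random_number, chance=0.25):
    """Hypothetical stand-in for the segfaulting library call."""
    if random_number < chance:
        os.kill(os.getpid(), signal.SIGSEGV)  # simulate a segfault
    return random_number


if __name__ == "__main__":
    for value in (0.9, 0.1):
        print(value, run_isolated(flaky, value))
```

Forking a process per task adds overhead on every call, which is presumably why this feels excessive for a fault that occurs once in several thousand runs.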

