[IPython-dev] running tests using ipython parallel

Tue Mar 17 15:55:12 EDT 2009

Ondrej,

> I am playing with ipython parallel and it behaves much more robustly
> than the multiprocessing module from python2.6. In the multiprocessing
> module if I get an exception, the process basically gets stuck, ctrl-c
> doesn't help and I need to kill it using the "kill" command.

Yep, we have worked *very* hard to make sure that exceptions at least
get propagated back to the client.  If they don't, it is a bug, so
please let us know.

> In ipython it works pretty well. I have a couple of questions though:
>
> * which approach do you think would be the best to implement parallel
> testing? Basically you have a test suite (nosetest and py.test
> compatible) and currently in the sequential mode I just call
> "execfile", get all functions from the file and execute them. In
> parallel I do something along these lines:
>
>        from IPython.kernel import client
>        mec = client.MultiEngineClient()
>        mec.reset()
>        print "running %d jobs" % self._jobs
>        ids = mec.get_ids()
>        mec.execute("filename = '%s'" % filename)
>        mec.execute("gl = {'__file__': filename}")
>        mec.execute("execfile(filename, gl)")
>        i = 0
>        for f in funcs:
>            if i >= len(ids): i = 0
>            #mec.push({"f": f})
>            mec.push(dict(f=f))
>            #print mec.execute("gl['%s']" % f.__name__, targets=[ids[i]])
>            print mec.execute("f", targets=[ids[i]])
>            i += 1
>
> funcs list contains all the test functions that I can then execute
> using f(). Unfortunately this approach gives me an error at the line
> "mec.push(dict(f=f))":

Is f here an actual function?  If so, you need to use the method
"push_function" instead of push.  The reason for this is that
functions can't be pickled out of the box.  The push_function method
has extra logic that makes it work.
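
To make the pickling point concrete, here is a minimal sketch using
only the standard library.  It is not the push_function
implementation; the helper names (can_pickle, pack_function,
unpack_function) are made up for illustration.  pickle stores a
function as a reference to an importable name, so a lambda is
rejected, while shipping the bytecode by value with marshal works for
simple functions without closures:

```python
import marshal
import pickle
import types


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def pack_function(func):
    """Serialize a closure-free function by value as (name, bytecode)."""
    return func.__name__, marshal.dumps(func.__code__)


def unpack_function(packed, namespace):
    """Rebuild a function from pack_function() output inside namespace."""
    name, code_bytes = packed
    return types.FunctionType(marshal.loads(code_bytes), namespace, name)


square = lambda x: x ** 2           # has no importable name

packed = pack_function(square)      # a plain (str, bytes) tuple
engine_namespace = {}               # stands in for an engine's namespace
rebuilt = unpack_function(packed, engine_namespace)

print(can_pickle(square), can_pickle(packed), rebuilt(7))
```

This sketch drops default arguments and closures, and marshal data is
only valid between identical Python versions, so treat it as an
illustration of the idea rather than something to ship.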

Other tips:

* If the functions you are calling don't take very long, the latency in
your current approach will really get you.  A much better way would be
to define a function that tests everything in a package hierarchy
below a certain point.  That way you could have an engine test an
entire subpackage, and the latency will matter less.

* The loop you are writing is basically just doing what the map method does:

mec.map(lambda x: x**2, range(10))

It works just like Python's map, but is parallel.  The only difference
is that map takes either a function or a string that is exec'd.

* If you want to get rid of the code in strings "feature", just define
functions, push the function using push_function and then call the
function using execute.  We probably should also implement something
like this:

mec.call(f, *args, **kwargs)

If this would be useful to you, could you file a ticket for it?  In
the ticket, could you mention that we should use a cache to make sure
that functions are only pushed once?
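
A rough sketch of what such a call helper with a cache might look
like.  Everything here is hypothetical: StubClient is a local stand-in
that only mimics the method names used in this thread (push,
push_function, execute) plus a pull method, which is an assumption,
and the real client's signatures have not been checked against it.

```python
class StubClient(object):
    """Local stand-in for a multiengine client: one namespace, no network."""

    def __init__(self):
        self.namespace = {}
        self.pushed_functions = 0   # counts push_function calls

    def push_function(self, ns):
        self.pushed_functions += 1
        self.namespace.update(ns)

    def push(self, ns):
        self.namespace.update(ns)

    def execute(self, code):
        exec(code, self.namespace)

    def pull(self, key):
        return self.namespace[key]


def call(client, cache, func, *args, **kwargs):
    """Push func only on first use, then run it with the given arguments."""
    name = func.__name__
    if cache.get(name) is not func:     # first use, or the name was rebound
        client.push_function({name: func})
        cache[name] = func
    client.push({"_args": args, "_kwargs": kwargs})
    client.execute("_result = %s(*_args, **_kwargs)" % name)
    return client.pull("_result")


def add(a, b=0):
    return a + b


client = StubClient()
cache = {}
first = call(client, cache, add, 2, b=3)
second = call(client, cache, add, 10)
print(first, second, client.pushed_functions)
```

The cache is keyed by function name and checked by identity, so
redefining a function under the same name triggers a fresh push.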

>  File "/home/ondrej/repos/sympy/sympy/utilities/runtests.py", line
> 214, in test_file
>    mec.push(dict(f=f))
>  File "/var/lib/python-support/python2.6/IPython/kernel/multiengineclient.py",
> line 552, in push
>    targets=targets, block=block)
>  File "/var/lib/python-support/python2.6/IPython/kernel/multiengineclient.py",
> line 441, in _blockFromThread
>    result = blockingCallFromThread(function, *args, **kwargs)
>  File "/var/lib/python-support/python2.6/IPython/kernel/twistedutil.py",
> line 69, in blockingCallFromThread
>    return twisted.internet.threads.blockingCallFromThread(reactor, f, *a, **kw)
>  File "/usr/lib/python2.6/dist-packages/twisted/internet/threads.py",
> line 114, in blockingCallFromThread
>    result.raiseException()
>  File "/usr/lib/python2.6/dist-packages/twisted/python/failure.py",
> line 326, in raiseException
>    raise self.type, self.value, self.tb
> TypeError: expected string or Unicode object, NoneType found
>
>
> So I thought, ok, let's not push things in and execute everything at
> the engines.
>
> * what things could be safely pushed to engines? I know it can push
> some functions, but I didn't manage to get it actually working for the
> actual test functions (it works nice for simple functions from the
> tutorial). So the only option that seems likely to work is that I
> first implement a function that executes the "n"th test case from
> the test suite, and this function will not be transferred to the
> engines. The only thing that will be pushed is one string ("function
> name") and then a couple of integers to specify all the parameters.

Other than functions, anything that can be pickled can be pushed.  The
only big limitation is that classes need to be importable to be
pushed.  Thus, classes that are defined interactively can't be pushed.
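
The importability requirement comes from pickle itself, which records
a class by module and name rather than by value.  A small
standard-library sketch, where a class created inside a function
stands in for one the receiving side cannot import (make_local_class
and can_pickle are illustrative names):

```python
import pickle
from fractions import Fraction


def make_local_class():
    """Return a class that exists only inside this call (no importable name)."""
    class Point(object):
        def __init__(self, x):
            self.x = x
    return Point


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# An instance of an importable class pickles fine; the payload names
# the module and the class instead of carrying the class body.
payload = pickle.dumps(Fraction(1, 3))
names_in_payload = b"fractions" in payload and b"Fraction" in payload

importable_ok = can_pickle(Fraction(1, 3))
local_ok = can_pickle(make_local_class()(1))
print(importable_ok, local_ok, names_in_payload)
```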

Functions should be pushed using push_function.

But, from the speed perspective, you want to push as little as
possible, so just pushing strings is not a bad idea.  Why don't you
get something working first though and then we can figure out the
performance issues.
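
As a starting point for the "push only strings" idea, here is a
minimal sketch of a per-file runner.  The name run_test_file is
hypothetical, the code is written for Python 3 (exec plus compile
replaces the execfile call used earlier in this thread), and the demo
runs locally on a throwaway file.  On a cluster the assumption is that
each engine would define this function once and then receive only a
filename string.

```python
import os
import tempfile
import traceback


def run_test_file(path):
    """Execute the file at path and run every test_* callable it defines.

    Returns a list of (test name, "ok" or "fail", traceback text) tuples,
    which are plain strings and therefore cheap to send back.
    """
    namespace = {"__file__": path, "__name__": "__test__"}
    with open(path) as handle:
        source = handle.read()
    exec(compile(source, path, "exec"), namespace)

    results = []
    for name in sorted(namespace):
        obj = namespace[name]
        if not (name.startswith("test_") and callable(obj)):
            continue
        try:
            obj()
            results.append((name, "ok", ""))
        except Exception:
            results.append((name, "fail", traceback.format_exc()))
    return results


# Demo with a throwaway test file containing one passing and one failing test.
demo_source = (
    "def test_passes():\n"
    "    assert 1 + 1 == 2\n"
    "\n"
    "def test_fails():\n"
    "    assert 1 + 1 == 3\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write(demo_source)
try:
    summary = [(name, status) for name, status, _ in run_test_file(tmp.name)]
finally:
    os.remove(tmp.name)
print(summary)
```

Running a whole file per request also addresses the latency tip
above, since each round trip then covers many tests instead of one.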

One final note:  we are aware of some performance issues in the
current parallel ipython.  These issues mainly affect latency (small
amounts of work done in parallel) and pushing very large objects.  We
are working on these things.

Cheers,

Brian

>
>
> Ondrej