[IPython-dev] ZMQ Parallel IPython Performance preview

Tue Oct 26 02:17:00 EDT 2010

Min,

Thanks for running these benchmarks. Comments below.

> Re-run for throughput with data:
>
> submit 16 tasks of a given size, plot against size.
> new-style:
> def echo(a):
>     return a
> old-style:
> task = StringTask("b=a", push=dict(a=a), pull=['b'])
>
>
I really like the style of the new API - echo is exactly what it does!
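
For anyone reading along, here is a minimal sketch of how the two times
(sent and roundtrip) could be measured for echo tasks of a given size.  It is
not the benchmark script Min ran: ThreadPoolExecutor stands in for the
parallel client, the stdlib array type stands in for a numpy array, and
make_payload and time_echo are made-up helper names.

```python
import time
from array import array
from concurrent.futures import ThreadPoolExecutor


def echo(a):
    # Same task as in the benchmark: return the input unchanged.
    return a


def make_payload(size_bytes):
    # 8-byte doubles, so element count = size_bytes // 8.
    # Zeros instead of random data: only the size matters in this sketch.
    return array("d", bytes(size_bytes - size_bytes % 8))


def time_echo(submit, size_bytes, ntasks=16):
    """Return (sent, roundtrip) in seconds for ntasks echo tasks.

    `submit` is any callable shaped like Executor.submit (an assumption;
    the real client API is different).
    """
    payload = make_payload(size_bytes)
    start = time.perf_counter()
    futures = [submit(echo, payload) for _ in range(ntasks)]
    sent = time.perf_counter() - start        # last submit has returned
    results = [f.result() for f in futures]
    roundtrip = time.perf_counter() - start   # last result has arrived
    assert all(len(r) == len(payload) for r in results)
    return sent, roundtrip


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        for size in (1024, 1024 * 1024):
            sent, roundtrip = time_echo(pool.submit, size)
            print(size, sent, roundtrip)
```

A thread pool passes the payload by reference, so the numbers it prints say
nothing about ZMQ or Twisted; the point is only where the two clocks stop.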

> The input chosen was random numpy arrays (64-bit floats, so len(A)*8 ~= size
> in bytes).
>
> Notable points:
> * ZMQ submission remains flat, independent of size, due to non-copying
> sends
>

We hoped that this would be the case, but it is really non-trivial to
achieve and good to see confirmed.

> * size doesn't come into account until ~100kB, and clearly dominates both
> after 1MB
>     the turning point for Twisted is a little earlier than for ZMQ
> * at 4MB, Twisted is submitting < 2 tasks per sec, while ZMQ is submitting
> ~90
>

This is a fantastic point of comparison.  4 MB is a non-trivial amount of
data, and there is a huge difference between roughly 0.5 sec of overhead per
task (Twisted) and roughly 0.01 sec (zmq).  It means that with zmq, users can
get a parallel speedup on calculations that involve far fewer CPU cycles per
byte of data sent.
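
The arithmetic behind those overhead figures, as a short sketch.  The rates
(< 2 and ~90 tasks per sec at 4MB) are Min's numbers from above; the helper
name and the per-byte figure are my own illustration, and the break-even
reading is only a rough rule of thumb.

```python
def per_task_overhead(tasks_per_sec):
    # Seconds spent submitting one task at a given submission rate.
    return 1.0 / tasks_per_sec


SIZE_BYTES = 4 * 1024 * 1024      # 4MB payload

TWISTED_RATE = 2                  # reported as "< 2", so this is a best case
ZMQ_RATE = 90                     # reported as "~90"

twisted_overhead = per_task_overhead(TWISTED_RATE)   # 0.5 sec
zmq_overhead = per_task_overhead(ZMQ_RATE)           # about 0.011 sec

# Overhead per byte sent.  Rule of thumb (an assumption): a task only pays
# off in parallel if its compute time per byte is well above this.
twisted_ns_per_byte = twisted_overhead / SIZE_BYTES * 1e9
zmq_ns_per_byte = zmq_overhead / SIZE_BYTES * 1e9

if __name__ == "__main__":
    print("Twisted: %.3f sec/task, %.1f ns/byte" % (twisted_overhead, twisted_ns_per_byte))
    print("ZMQ:     %.3f sec/task, %.1f ns/byte" % (zmq_overhead, zmq_ns_per_byte))
    print("ratio:   %.0fx" % (ZMQ_RATE / TWISTED_RATE))
```

So at this size the per-byte cost of shipping data drops by a factor of about
45, which is what moves the break-even point for users.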

> * roundtrip, ZMQ is fairly consistently ~40x faster.
>
> memory usage:
> * Peak memory for the engines is 20% higher with ZMQ, because more than one
> task can now be waiting in the queue on the engine at a time.
>

Right, but this is good news, since it means the data is moving off the
controller faster.

> * Peak memory for the Controller including schedulers is 25% less than
> Twisted with pure ZMQ, and 20% less with the Python scheduler. Note that all
> results still reside in memory, since I haven't implemented the db backend
> yet.
>

I would think that holding all results in memory is the biggest memory cost
for the controller in the long run.  But we know how to fix that.

> * Peak memory for the Python scheduler is approximately the same as the
> engines
>

> * Peak memory for the zmq scheduler is about half that.
>
>
All very good news.  I think these plots can definitely make it into a paper
on this.

Cheers,

Brian

> -MinRK
>
> On Fri, Oct 22, 2010 at 09:52, MinRK <benjaminrk at gmail.com> wrote:
>
>> I'll get on the new tests, I already have a bandwidth one written, so I'm
>> running it now.  As for Twisted's throughput performance, it's at least
>> partly our fault.  Since the receiving is in Python, every time we try to
>> send there are incoming results getting in the way.  If we wrote it such
>> that sending prevented the receipt of results, I'm sure the Twisted code
>> would be faster for large numbers of messages.  With ZMQ, though, we don't
>> have to be receiving in Python to get the results to the client process, so
>> they arrive in ZMQ and await simple memcpy/deserialization.
>>
>> -MinRK
>>
>>
>> On Fri, Oct 22, 2010 at 09:27, Brian Granger <ellisonbg at gmail.com> wrote:
>>
>>> Min,
>>>
>>> Also, can you get memory consumption numbers for the controller and
>>> queues.  I want to see how much worse Twisted is in that respect.
>>>
>>> Cheers,
>>>
>>> Brian
>>>
>>> On Thu, Oct 21, 2010 at 11:53 PM, MinRK <benjaminrk at gmail.com> wrote:
>>>
>>>> I have my first performance numbers for throughput with the new parallel
>>>> code riding on ZeroMQ, and results are fairly promising.  Roundtrip time for
>>>> ~512 tiny tasks submitted as fast as they can is ~100x faster than with
>>>> Twisted.
>>>>
>>>> As a throughput test, I submitted a flood of many very small tasks that
>>>> should take ~no time:
>>>> new-style:
>>>> def wait(t=0):
>>>>     import time
>>>>     time.sleep(t)
>>>> submit:
>>>> client.apply(wait, args=(t,))
>>>>
>>>> Twisted:
>>>> task = StringTask("import time; time.sleep(%f)"%t)
>>>> submit:
>>>> client.run(task)
>>>>
>>>> Flooding the queue with these tasks with t=0, and then waiting for the
>>>> results, I tracked two times:
>>>> Sent: the time from the first submit until the last submit returns
>>>> Roundtrip: the time from the first submit to getting the last result
>>>>
>>>> Plotting these times vs number of messages, we see some decent numbers:
>>>> * The pure ZMQ scheduler is fastest, 10-100 times faster than Twisted
>>>> roundtrip
>>>> * The Python scheduler is ~3x slower roundtrip than pure ZMQ, but no
>>>> penalty to the submission rate
>>>> * Twisted performance falls off very quickly as the number of tasks
>>>> grows
>>>> * ZMQ performance is quite flat
>>>>
>>>> Legend:
>>>> zmq: the pure ZMQ Device is used for routing tasks
>>>> lru/weighted: the simplest/most complicated routing schemes respectively
>>>> in the Python ZMQ Scheduler (which supports dependencies)
>>>> twisted: the old IPython.kernel
>>>>
>>>> [image: roundtrip.png]
>>>> [image: sent.png]
>>>> Test system:
>>>> Core-i7 930, 4x2 cores (ht), 4-engine cluster all over tcp/loopback,
>>>> Ubuntu 10.04, Python 2.6.5
>>>>
>>>> -MinRK
>>>> http://github.com/minrk
>>>>
>>>
>>>
>>>
>>> --
>>> Brian E. Granger, Ph.D.
>>> Assistant Professor of Physics
>>> Cal Poly State University, San Luis Obispo
>>> bgranger at calpoly.edu
>>> ellisonbg at gmail.com
>>>
>>
>>
>

-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: roundtrip.png
Type: image/png
Size: 30731 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/ipython-dev/attachments/20101025/c05fe347/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sent.png
Type: image/png
Size: 31114 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/ipython-dev/attachments/20101025/c05fe347/attachment-0001.png>