[IPython-dev] Named Engines

Sun Jul 25 17:51:51 EDT 2010

On Wed, Jul 21, 2010 at 1:58 PM, MinRK <benjaminrk at gmail.com> wrote:

>
>
> On Wed, Jul 21, 2010 at 12:17, Brian Granger <ellisonbg at gmail.com> wrote:
>
>> On Wed, Jul 21, 2010 at 10:51 AM, MinRK <benjaminrk at gmail.com> wrote:
>> >
>> >
>> > On Wed, Jul 21, 2010 at 10:07, Brian Granger <ellisonbg at gmail.com> wrote:
>> >>
>> >> On Wed, Jul 21, 2010 at 2:35 AM, MinRK <benjaminrk at gmail.com> wrote:
>> >> > I now have my MonitoredQueue object on git, which is the three socket
>> >> > Queue device that can be the core of the lightweight ME and Task
>> >> > models (depending on whether it is XREP on both sides for ME, or
>> >> > XREP/XREQ for load balanced tasks).
>> >>
>> >> This sounds very cool.  What repo is this in?
>> >
>> > all on my pyzmq master: github.com/minrk/pyzmq
>> > The Devices are specified in the growing _zmq.pyx. Should I move them?  I
>> > don't have enough Cython experience (this is my first nontrivial Cython
>> > work) to know how to correctly move it to a new file still with all the
>> > right zmq imports.
>>
>> Yes, I think we do want to move them.  We should look at how mpi4py
>> splits things up.  My guess is that we want to have the declaration of
>> the 0MQ C API in a single file that other files can use.  Then have
>> files for the individual things like Socket, Message, Poller, Device,
>> etc.  That will make the code base much easier to work with.  But
>> splitting things like this in Cython is a bit subtle.  I have done it
>> before, but I will ask Lisandro Dalcin the best way to approach it.
>> For now, I would keep going with the single file approach (unless you
>> want to learn about how to split things using pxi and pxd files).
>>
>
> I'd be happy to help split it up if you find out the best way to go about
> it.
>
>

OK, I am a bit behind on things from being sick, but I may look into this
when I review+merge your branch.

>
>> >>
>> >> > The biggest difference in terms of design between Python in the
>> >> > Controller picking the destination and this new device is that the
>> >> > client code actually needs to know the XREQ identity of each engine,
>> >> > and all the switching logic lives in the client code (if not the
>> >> > user exposed code) instead of the controller - if the client says
>> >> > 'do x in [1,2,3]' they actually issue 3 sends, unlike before, when
>> >> > they issued 1 and the controller issued 3. This will increase
>> >> > traffic between the client and the controller, but dramatically
>> >> > reduce work done in the controller.
>> >>
>> >> But because 0MQ has such low latency it might be a win.  Each request
>> >> the controller gets will be smaller and easier to handle.  The idea of
>> >> allowing clients to specify the names is something I have thought
>> >> about before.  One question though:  what does 0MQ do when you try to
>> >> send on an XREP socket to an identity that doesn't exist?  Will the
>> >> client be able to know that the engine wasn't there?  That seems like
>> >> an important failure case.
>> >
>> > As far as I can tell, the XREP socket sends messages out to XREQ ids, and
>> > trusts that such an XREQ exists. If no such id is connected, the message
>> > is silently lost to the aether.  However, with the controller monitoring
>> > the queue, it knows when you have sent a message to an engine that is not
>> > _registered_, and can tell you about it. This should be sufficient, since
>> > presumably all the connected XREQ sockets should be registered engines.
>>
>> I guess I don't quite see how the monitoring is used yet, but it does
>> worry me that the message is silently lost.  So you think 0MQ should
>> raise on that?  I have a feeling that the identities were designed to be
>> a private API thing in 0MQ and we are challenging that.
>>
>
> I don't know what 0MQ should do, but I imagine the silent loss is based on
> thinking of XREP messages as always being replies. That way, a reply sent to
> a nonexistent key is interpreted as being a reply to a message whose
> requester is gone, and 0MQ presumes that nobody else would be interested in
> the result, and drops it. As far as 0MQ is concerned, you wouldn't want the
> following to happen:
> A makes a request of B
> A dies
> B replies to A
> B crashes because A didn't receive the reply
>
> nothing went wrong in B, so it shouldn't crash.
>
>  For us, the XREP messages are not replies on the engine side (they are
> replies on the client side). We are using the identities to treat the
> engine-facing XREP as a keyed multiplexer. The result is that if you send a
> message to nobody, nobody receives it. It's not that nobody knows about it -
> the controller can tell, because it sees every message as it goes by, and
> knows what the valid keys are, but the send itself will not fail.  In the
> client code, you can easily check if a key is valid with the controller, so
> I don't see a problem here.
>
>
OK
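
The controller-side check described above can be sketched in plain Python. This is a minimal illustration, not the MonitoredQueue code: the `QueueMonitor` class and its method names are hypothetical, and it assumes the first frame of each multipart message is the destination identity.

```python
class QueueMonitor:
    """Hypothetical monitor that sees every message passing through the queue."""

    def __init__(self, registered):
        # Identities of the engines the controller currently knows about.
        self.registered = set(registered)
        # (identity, payload) pairs that were addressed to unknown identities.
        self.undelivered = []

    def observe(self, frames):
        """Inspect one multipart message; return True if its target is registered.

        Assumption: frames[0] is the destination identity.
        """
        ident, payload = frames[0], frames[1:]
        if ident in self.registered:
            return True
        # The send itself would not fail, but the monitor can record the loss.
        self.undelivered.append((ident, payload))
        return False


monitor = QueueMonitor(registered=[b'odin', b'franklin474'])
ok = monitor.observe([b'odin', b'a=5'])
lost = monitor.observe([b'hello', b'mr. b'])
```

Here `ok` is true and `lost` is false, and the dropped message is kept in `monitor.undelivered` so a client could be told about it.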

> The only source of a problem I can think of comes from the fact that the
> client has a copy of the registration table, and presumably doesn't want to
> update it every time.  That way, an engine could go away between the
> client's updates of the registration, and some requests would vanish.  Note
> that the controller still does receive them, and the client can check with
> the controller on the status of requests that are taking too long.  The
> controller can use a PUB socket to notify of engines coming/going, which
> would mean the window for the client to not be up to date would be very
> small, and it wouldn't even be a big problem if it happened, since the client
> would be notified that its request won't be received.
>

I think this approach makes sense.  At some level the same issue exists
today for us in the twisted version.  If you do mec.get_ids(), that
information could become stale at any moment in time.  I think this is an
intrinsic limitation of the multiengine approach (MPI included).
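
To make the staleness window concrete, here is a rough sketch of a client-side copy of the registration table that is updated by join/leave notifications. The `EngineRegistry` class, the event names, and the engine names are all assumptions made for illustration; in a real client the events would presumably arrive on a SUB socket connected to the controller's PUB socket.

```python
class EngineRegistry:
    """Hypothetical client-side copy of the controller's registration table."""

    def __init__(self, identities=None):
        # Maps integer engine ID -> identity string.
        self.identities = dict(identities or {})

    def handle_notification(self, event, engine_id, ident=None):
        """Apply one join/leave event (event names are assumed)."""
        if event == 'register':
            self.identities[engine_id] = ident
        elif event == 'unregister':
            self.identities.pop(engine_id, None)
        else:
            raise ValueError("unknown event: %r" % (event,))

    def get_ids(self):
        """Return the engine IDs known right now."""
        return sorted(self.identities)


registry = EngineRegistry({0: 'odin', 1: 'franklin474'})
snapshot = registry.get_ids()                        # may go stale at any time
registry.handle_notification('unregister', 1)        # engine 1 goes away
registry.handle_notification('register', 2, 'thor')  # a new engine joins
current = registry.get_ids()
```

A `snapshot` taken before the notifications still lists engine 1, while `current` does not. That gap is the window in which a request could be addressed to an engine that has already left.
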

Cheers,

Brian

>
>
>>
>> > To test:
>> > import time
>> > import zmq
>> > ctx = zmq.Context()
>> > a = ctx.socket(zmq.XREP)
>> > a.bind('tcp://127.0.0.1:1234')
>> > b = ctx.socket(zmq.XREQ)
>> > b.setsockopt(zmq.IDENTITY, 'hello')
>> > # 'hello' is not connected yet, so this message is silently dropped
>> > a.send_multipart(['hello', 'mr. b'])
>> > time.sleep(.2)
>> > b.connect('tcp://127.0.0.1:1234')
>> > a.send_multipart(['hello', 'again'])
>> > b.recv()
>> > # 'again'
>> >
>> >>
>> >> > Since the engines' XREP IDs are known at the client level, and these
>> >> > are roughly any string, it brings up the question: should we have
>> >> > strictly integer ID engines, or should we allow engines to have names,
>> >> > like 'franklin1', corresponding directly to their XREP identity?
>> >>
>> >> The idea of having names is pretty cool.  Maybe default to numbers,
>> >> but allow named prefixes as well as raw names?
>> >
>> >
>> > This part is purely up to our user-facing side of the client code. It
>> > certainly doesn't affect how anything works inside. It's just a question
>> > of what a valid `targets' argument (or key for a dictionary interface)
>> > would be in the multiengine.
>>
>> Any string or list of strings?
>>
>
> Well, for now targets is any int or list of ints. I don't see any reason
> that you couldn't use a string anywhere an int would be used. It's perfectly
> unambiguous, since the two key sets are of a different type.
>
> you could do:
> execute('a=5', targets=[0,1,'odin', 'franklin474'])
> and the _build_targets method does:
>
> target_idents = []
> for t in targets:
>     if isinstance(t, int):
>         ident = identities[t]
>     elif isinstance(t, str) and t in identities.itervalues():
>         ident = t
>     else:
>         raise KeyError("bad target: %s"%t)
>     target_idents.append(ident)
> return target_idents
>
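For reference, a self-contained Python 3 sketch of the same lookup. It assumes `identities` is a dict mapping integer engine ID to identity string, which the snippet above implies but does not state; the function name and the engine names 'thor' and 'loki' are made up for the example.

```python
def build_targets(targets, identities):
    """Resolve a mixed list of engine IDs (int) and names (str) to identities.

    Assumption: `identities` maps integer engine ID -> identity string.
    """
    known = set(identities.values())
    target_idents = []
    for t in targets:
        if isinstance(t, int) and t in identities:
            # Integer key: look the identity up by engine ID.
            ident = identities[t]
        elif isinstance(t, str) and t in known:
            # String key: it must already be a registered identity.
            ident = t
        else:
            raise KeyError("bad target: %r" % (t,))
        target_idents.append(ident)
    return target_idents


identities = {0: 'thor', 1: 'loki', 2: 'odin', 3: 'franklin474'}
resolved = build_targets([0, 1, 'odin', 'franklin474'], identities)
```

Because the int and str key sets cannot collide, a mixed list resolves unambiguously, and an unknown ID or name raises KeyError before anything is sent.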
>
>
>> >>
>> >> > I think people might like using names, but I imagine it could get
>> >> > confusing.  It would be unambiguous in code, since we use integer IDs
>> >> > and XREP identities must be strings, so if someone keys on a string it
>> >> > must be the XREP id, and if they key on a number it must be by engine ID.
>> >>
>> >> Right.  I will have a look at the code.
>> >>
>> >> Cheers,
>> >>
>> >> Brian
>> >>
>> >> > -MinRK
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Brian E. Granger, Ph.D.
>> >> Assistant Professor of Physics
>> >> Cal Poly State University, San Luis Obispo
>> >> bgranger at calpoly.edu
>> >> ellisonbg at gmail.com
>> >
>> >
>>
>>
>>
>>
>
>

-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com