[IPython-dev] Ipython1 architecture...

Fri Apr 11 13:23:46 EDT 2008

> I'm starting to dig through the Ipython1 code... first thoughts are
>  there's been some serious thinking done here and I'm impressed.  I have
>  a lot of ideas driven by past-life experiences but tempered by
>  recent-life reality and am interested in beginning a dialog on very high
>  level architectural approaches.

Great!

>  In particular, as I'm sure is very clear to all, it's very important that
>  we keep things as orthogonal as possible.  For example, I'm very
>  interested in the general needs of parallel computing but also require
>  elements of more conventional peer-to-peer and client-server
>  architecture.  From what I see, there shouldn't be any problems in that
>  regard.

Do you mean that the parallel computing stuff is or should be
orthogonal to p2p/client/server stuff?

>  I'm guessing that there are principles relating to non-generic resources
>  (e.g. live data sources or heterogeneous compute resources) which might
>  need to be addressed... this might be a simple matter of labeling
>  convention and a couplea data structures... but some discussion might be
>  fruitful.
>
>  But, the first thing I came across in digging through the code was
>  reference to threads on the client side.  My concern is basically that
>  virtually all the code I write avoids threads.  Threads are fine, of
>  course, but are often the single biggest source of difficulty in a
>  project for reasons I don't need to go into.
>
>  So, I'm interested in how and why threading is used in the client side
>  of Ipython1.  Is it simply to provide support for non-blocking gui
>  activity (like using -q4thread on the ipython command line) or is there
>  some other reason.

Ah yes, glad you asked.  From my perspective, threads are used in two
broad ways:

1) As a model of parallelism/concurrency.  From this perspective,
threads are a way of handling concurrency in a shared memory context.
I personally think that threads are _not_ a good solution in this
arena.  They are simply too low level and difficult to get right.

2) As an implementation detail for building other models of
parallelism/concurrency.  For example, Erlang uses an actor based,
shared nothing, message passing approach to concurrency.  But,
underneath the hood Erlang uses threads to implement its runtime
system.  I am all for this type of usage of threads.
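
To make category 2) concrete, here is a minimal sketch in plain Python (not Erlang, and not ipython1 code) of an actor whose thread is purely an implementation detail.  The `Actor` class and its `send`/`stop` methods are hypothetical names invented for this illustration.  Callers only exchange messages through queues and never touch the actor's state or its thread.

```python
import queue
import threading


class Actor:
    """A tiny actor: private state, a mailbox, and one worker thread."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._total = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Post a message to the mailbox; the sender does not wait for it to be handled."""
        self._mailbox.put((message, reply_to))

    def stop(self):
        """Ask the actor to finish and wait for its thread to exit."""
        self._mailbox.put((self._STOP, None))
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, so no locks are needed on _total.
        while True:
            message, reply_to = self._mailbox.get()
            if message is self._STOP:
                break
            self._total += message
            if reply_to is not None:
                reply_to.put(self._total)


counter = Actor()
replies = queue.Queue()
for n in (1, 2, 3):
    counter.send(n, reply_to=replies)
totals = [replies.get(timeout=5) for _ in range(3)]
counter.stop()
```

The user of `counter` sees only messages going in and replies coming out; whether one thread, many threads, or another process sits behind the mailbox is hidden.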

In my mind, our usage of threads in ipython1 falls into category
2).  We are not advocating that users know or care about threads.
But, we do need them in our implementation.  Here is why:

Long ago, when we first started ipython1, we did not use Twisted.  In
fact, we used plain old blocking sockets for our client.  The reason
is that we have always wanted an API that your average scientist
could understand.  And in our experience (we are scientists),
scientists do not think in terms of event loops or deferreds.  They
think in very blocking terms.  Thus, we have always felt like our
design goals require a blocking (possibly polling) user-facing
interface.

Eventually, we discovered Twisted and wanted to use Twisted for our
client networking code.  But, we have a particularly odd set of
constraints:

1) We need our client code to run on normal python/ipython sessions
that are not event driven.

2) We want to use Twisted.

3) We need to have a truly blocking interface.

So, we asked on the Twisted lists, "how can we block on a deferred?"
There was much laughter and blanket statements like "you don't want to
do that" and "you can't do that."  In spite of those statements we
knew that in fact we did want to do that.  So we spent time playing
around with various approaches.  The end of these explorations was
this conclusion:

If you only have one thread, it is true, "you can't block on a
deferred."  But we also began to see that if there are multiple
threads around, you could do it.  Then along came
blockingCallFromThread, which does exactly that:  it lets a
non-reactor thread block until a deferred in the reactor thread
fires.  Another option is coroutines, and we haven't explored that
yet.  Coroutines are attractive, but only if they don't require a
custom python version (like stackless).
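
The conclusion above can be sketched with nothing but the standard library.  `MiniDeferred` and `block_on` are hypothetical stand-ins made up for this example; they are not Twisted's Deferred or blockingCallFromThread, and a timer thread plays the part of the reactor.  The point is only that the wait can finish because a second thread fires the result.

```python
import threading


class MiniDeferred:
    """Hypothetical stand-in for a deferred: a result that arrives later."""

    def __init__(self):
        self._callbacks = []
        self._lock = threading.Lock()
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        with self._lock:
            if not self._fired:
                self._callbacks.append(fn)
                return
        fn(self._result)  # already fired: deliver immediately

    def callback(self, result):
        with self._lock:
            self._fired = True
            self._result = result
            pending, self._callbacks = self._callbacks, []
        for fn in pending:
            fn(result)


def block_on(deferred, timeout=None):
    """Block the calling thread until `deferred` fires, then return its result.

    This only works if some *other* thread fires the deferred; with a single
    thread the wait below would never return.
    """
    done = threading.Event()
    box = []

    def on_result(result):
        box.append(result)
        done.set()

    deferred.add_callback(on_result)
    if not done.wait(timeout):
        raise TimeoutError("deferred did not fire in time")
    return box[0]


# A timer thread stands in for the event loop that eventually fires the deferred.
d = MiniDeferred()
threading.Timer(0.05, d.callback, args=("pong",)).start()
answer = block_on(d, timeout=5)
```

If the same thread were responsible for both waiting in `block_on` and calling `d.callback`, the wait would never end, which is the single-thread case described above.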

>  Given twisted, I'd posit that it's not really necessary to use threads to
>  support non-blocking command line behavior or anything else on the
>  client.  The reactor (or select in general) can get around this... but
>  might require a rewrite of pyreadline which is planned but would take
>  time... which is why the threading is used...

Ironically, writing non-blocking interfaces is relatively easy using
threads or twisted.  The challenge is putting a blocking layer on top
of a non-blocking one.  I should mention that we do have two types of
clients:

1) asynclient:  these clients don't use threads.  Rather they use
twisted and return deferreds.  These are very new and most people
don't know about them.  I really like them.

2) client: these are thin wrappers around the asynclient clients that
use blockingCallFromThread to block.
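
The relationship between the two kinds of client might look roughly like the sketch below.  It is an assumption-laden illustration, not the ipython1 code: asyncio stands in for Twisted, `asyncio.run_coroutine_threadsafe` plays the role that blockingCallFromThread plays in the description above, and the `AsyncClient`, `BlockingClient`, and `execute` names are invented here.

```python
import asyncio
import threading


class AsyncClient:
    """Hypothetical non-blocking client: every method is a coroutine."""

    async def execute(self, expression):
        await asyncio.sleep(0.01)  # pretend to wait on the network
        return f"result of {expression}"


class BlockingClient:
    """Thin wrapper: same method names, but each call blocks until the result is in."""

    def __init__(self):
        self._async = AsyncClient()
        self._loop = asyncio.new_event_loop()
        # The event loop lives in its own thread, so the caller's thread is free to block.
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def execute(self, expression, timeout=5):
        # Schedule the coroutine on the loop thread and wait here for its result.
        future = asyncio.run_coroutine_threadsafe(
            self._async.execute(expression), self._loop
        )
        return future.result(timeout)

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


client = BlockingClient()
value = client.execute("a = 10")
client.close()
```

All of the networking logic stays in the asynchronous class; the blocking class adds only the thread hand-off, so the two cannot drift apart in behavior.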

>  I'd probably go farther and say that threading might really only have a
>  proper place in the architecture for compute bound problems.  Given that
>  this is a parallel computing activity, that would make sense, but it
>  would really need to be thought through (and it looks like it might very
>  well have been very carefully thought through).  Fortunately, twisted
>  provides very nice integration between the "main" thread and compute
>  threads, so I'm sure that all is well... but...

I still think threads are useful in implementing higher level
concurrency constructs - even though I agree with you that I don't
think they are a good solution for many other things.

One thing that is really nice about twisted is that it does play well
with threads.  Our entire model is really message based (shared
nothing) and if python didn't have the GIL, we could do some very cool
single process things.

>  On the client side, it appears as though the reactor itself is spawned
>  in a thread.  Given that my gui code will be entirely reactor driven,
>  I'm probably fine... but how is the thread coordination planned?  From
>  previous posts it's clear that the use of the new twisted blocking thread
>  stuff is heavily used (can't recall the proper names right now).  Again,
>  great... but it would be good to know what the intent is and, more
>  importantly, if I'm gonna get bit.

The interaction between the twisted reactor's thread and GUI threads is
something we do need to think more about.  We would love help when we
get around to that.  We need to make sure that people don't get bit
because they choose to use our stuff.

>  From a more researchy / advanced / cool stuff perspective, I'd be
>  interested in any thinking that's been done WRT coroutines and the wild
>  stuff that can go on in that space.  Erlang (and I'm no Erlang expert)
>  has a very thorough architecture which provides all kinds of fanciness
>  including the persisting / migration of tasklets and levels of
>  architectural constructs answering questions I haven't yet asked (not
>  unlike twisted).  There's been some sniffing around by the stackless
>  folks to see what the right approach to nailing stackless and twisted
>  might be.  I see some of the concepts proposed for Ipython1 potentially
>  fitting nicely with things like tasklet migration.  Of course, there are
>  greenlets etc.

Coroutines:  I would love to see a demonstration of using coroutines
and twisted, but using stackless or another custom python version is
not an option.  Another option is greenlet and corotwine:

http://codespeak.net/py/dist/greenlet.html
https://launchpad.net/corotwine/

These run with the standard CPython.  I haven't played with them, but
would love it if someone wanted to scope this out and report back.
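
For a rough idea of what coroutine-style code over callbacks looks like on stock CPython, here is a sketch that uses plain generators rather than greenlet or corotwine (neither of which is shown here).  `run_coroutine`, `fake_remote_add`, and `workflow` are hypothetical names, and `concurrent.futures.Future` stands in for a deferred.

```python
from concurrent.futures import Future


def run_coroutine(gen_func, *args):
    """Drive a generator that yields Futures; return a Future for its final value."""
    result = Future()
    gen = gen_func(*args)

    def step(value=None, error=None):
        try:
            if error is not None:
                pending = gen.throw(error)  # re-raise the failure inside the generator
            else:
                pending = gen.send(value)  # resume with the value that just arrived
        except StopIteration as stop:
            result.set_result(stop.value)  # the generator returned
            return
        except Exception as exc:
            result.set_exception(exc)
            return
        pending.add_done_callback(resume)

    def resume(finished):
        error = finished.exception()
        if error is not None:
            step(error=error)
        else:
            step(finished.result())

    step()
    return result


def fake_remote_add(a, b):
    """Pretend remote call: returns a Future that already holds the answer."""
    future = Future()
    future.set_result(a + b)
    return future


def workflow():
    # Reads like blocking code, but each `yield` hands control back to the driver.
    x = yield fake_remote_add(1, 2)
    y = yield fake_remote_add(x, 10)
    return y


final = run_coroutine(workflow)
total = final.result(timeout=5)
```

The body of `workflow` is sequential in appearance while the driver resumes it from callbacks, which is the attraction of coroutines mentioned above, and nothing here needs a custom interpreter.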

Erlang:  I _really_ like Erlang.  I have been spending some of my free
time messing with Erlang and as far as I am concerned it is the best
language for concurrent/distributed programming.  What do I mean by
this?  It is what I would like Twisted to be.  Twisted is fantastic,
but I really need single process twisted apps to scale to multiple
cores.  An erlang app (at least in principle) can scale across both
threads and processes and that is really amazing.  I should also say
that I am amazed at how far Twisted goes in this direction.  It is
quite amazing!

But, I should say, I think Python is the best language for scientific,
mathematical and numerical computing.  Thus, I don't ever see a day
when I would recommend that a scientist use erlang to solve their
favorite diff-eq.

But, the combination of Erlang and Python is very interesting.  Here
is what I have been thinking.  There is now a Twisted implementation
of the Erlang node protocol:

https://launchpad.net/twotp

I have played around with this and it is very encouraging.  More work
needs to be done, but there is a great foundation.  This lets
Python+Twisted processes talk seamlessly to Erlang nodes.  You can
imagine the possibilities.  My basic vision would be to have an
implementation of ipython1, where the controller is an erlang node,
and the engines/clients are python.  The advantages would be:

1) Better performance in terms of the networking stuff.

2) The controller is currently a huge bottleneck for us.  It simply
has to do too much.  If we used erlang, it would be much easier to
scale the controller itself to multiple cores or even multiple hosts.

3) The fault tolerance of the controller would be easier to address
(OTP, Mnesia, etc.)

Downside of using Erlang:

1) Users would have to install Erlang and Python

Bottom line:  there are lots of really interesting directions to go
in.  We just need good people with vision and ideas.

Cheers,

Brian

>  and it's on the above that orthogonality need be maintained.
>
>  I could digress into domain specific nastiness... but will spare the
>  larger group... I guess that I hope for some increased radiation on the
>  Ipython1 front.
>
>  -glenn
>
>  --
>  Glenn H. Tarbox, PhD    | Don't worry about people stealing your ideas. If your ideas
>  206-494-0819            | are any good, you'll have to ram them down people's throats
>  glenn at tarbox.org (gtalk) + ghtdak on aim/freenode