[IPython-dev] pyspark and IPython

Fernando Perez fperez.net at gmail.com
Fri Aug 30 12:11:38 EDT 2013


Hey guys,

On Thu, Aug 29, 2013 at 2:58 PM, Brian Granger <ellisonbg at gmail.com> wrote:
> Sorry I wasn't clear in my question.  I am very aware of how amazing
> Spark and Shark are.  I do think you are right that they are looking
> very promising right now.  What I don't see is what IPython can offer
> in working with them.  Given their architecture, I don't see how for
> example you could run spark jobs from the IPython Notebook
> interactively.  Is that the type of thing you are thinking about?  Or

Sorry, I'm at ampcamp again and can't reply in as much detail as I'd
like, but the pyspark architecture can indeed be used interactively
from the notebook, and in fact it works much, much better than the
default pyspark shell.  Here's my quick port of the pyspark tutorial:

http://nbviewer.ipython.org/6384491/Data%20Exploration%20Using%20Spark.ipynb
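For the curious, the key step is just creating a SparkContext by hand
in the first cell.  A rough sketch (the master URL and app name below
are placeholders, and it assumes pyspark is already importable in the
notebook kernel):

    from pyspark import SparkContext

    # Placeholder master URL; on a real cluster this would point at the
    # Spark master, e.g. spark://<master-hostname>:7077.
    sc = SparkContext("spark://master:7077", "IPythonNotebookDemo")

    # Trivial sanity check: run a small job on the cluster and sum the
    # results back on the driver.
    print(sc.parallelize(range(1000)).reduce(lambda a, b: a + b))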

which I ran yesterday on an AMP cluster that had been configured
according to my little tutorial:

http://nbviewer.ipython.org/6384491/IPythonNotebookPySparkHowTo.ipynb
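The setup in that howto boils down to making pyspark importable from
the notebook kernel.  One way to do that (a sketch, not the exact file
from the tutorial) is a small startup script in the notebook's IPython
profile, e.g. ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py,
assuming SPARK_HOME is set in the environment:

    import glob
    import os
    import sys

    # Assumes SPARK_HOME points at the Spark installation on the cluster.
    spark_home = os.environ['SPARK_HOME']
    sys.path.insert(0, os.path.join(spark_home, 'python'))

    # pyspark also needs the bundled Py4J library on the path to talk to
    # the JVM; the exact filename varies by Spark version.
    for py4j in glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j*')):
        sys.path.insert(0, py4j)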


I'm already talking to the AMPLab folks about how to make this
integration work seamlessly out of the box with their AMIs; it should
be absolutely trivial to do once we have a couple of hours to spend on
it.

Once we ship 1.1 (so the super() bug is fixed and we don't have to go
patching things manually), I'll sit down with them and finish this up.

The deeper question of IPython.parallel/Spark
integration/competition/complementarity is much harder to answer, and
honestly I'm not sure yet what the answer is.  A good part of the
reason I'm here is precisely to think about that.


Cheers,

f


