[CentralOH] 2017-09-25 會議 Scribbles 落書/惡文?: Erik Welch Jupyter Lab Dask

jep200404 at columbus.rr.com jep200404 at columbus.rr.com
Thu Sep 28 16:21:57 EDT 2017


Thanks to Christoph Baker and Pillar for their generous prosperity.
They gave us plenty of pizza, salad, cookies, and beverages.
There was also hummus for the first time.

techelevator

data analysis

has been coming for about a year and a half
likes doing everything in python

terminal size

    zach at cohpy:~$ echo $LINES $COLUMNS
    26 100
    zach at cohpy:~$ grep 'LINES\|COLUMNS' ~/.bashrc
    # update the values of LINES and COLUMNS.
    export LINES
    export COLUMNS
    zach at cohpy:~$ python3
    Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.environ['LINES']
    '26'
    >>> os.environ['COLUMNS']
    '100'
    >>> 
    zach at cohpy:~$ 
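
The session above reads LINES and COLUMNS from the environment, which only works when the shell exports them. A minimal sketch of an alternative, using the standard library's shutil.get_terminal_size (it consults those same variables first, then asks the terminal, then falls back to a default). The helper name terminal_size and the 80x24 default are illustrative choices, not something shown at the meeting.

```python
import shutil


def terminal_size(default=(80, 24)):
    """Return (columns, lines) for the current terminal.

    Falls back to `default` when no terminal is attached,
    for example when output is piped to a file.
    """
    size = shutil.get_terminal_size(fallback=default)
    return size.columns, size.lines


columns, lines = terminal_size()
print(columns, lines)
```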

    xdpyinfo | less

one of last year's challenges (the graphical one) included discussion of a
portable way of determining screen size in pixels

has background in physics
works in dublin now
python for ai and neat little projects
want something done quickly and don't know how to do it in bash

about 32 folks tonight

python booth at ohio linux festival 2017-09-30

    need to have collection of python books at olf

2017-10-30 monthly meeting:

    Michael Handler and Leonard
    embedded train track sensor
    and did real-time analysis

2017-10-14 columbuscodecamp.com

Christoph Baker
folks should apply to pillar job interview process

Erik Welch

https://github.com/eriknw
    cytoolz
    dask-patternsearch
    eqpy
    benchtoolz
    metafunc
https://twitter.com/eriknwelch
works on pytoolz and cytoolz (faster)
dask https://dask.pydata.org/en/latest/

dask
python3
python2
django
pandas
jupyter notebook

conda
dask
numba
bokeh
anaconda
holoviews

came for the language; stayed for the community

JupyterLab

    human-centered interactive 
    much coolness

has a bias for making things fast

holoviews

jupytercon (was last month in nyc)
https://github.com/OpenGeoscience/geonotebook
    A Jupyter notebook extension for geospatial visualization and analysis

nbextensions

    zenmode
        pretty background
        more full screen
        hides header
    code pretty
        little hammer icon
    hide header
    variable inspector
    ExecuteTime
        automatically shows execution time for each cell
    localhost:8888/tree?#nbextensions_configurator

%qtconsole
    partially works for me
    have to restart kernel after qt window exits

    https://jupyter.org/qtconsole/stable/

core jupyter team uses conda to build and share packages

!conda list
!conda install -c conda-forge jupyter jupyter_contrib_nbextensions

jupyter-contrib-nbextensions.readthedocs.io/en/latest/

import seaborn as sns
df = sns.load_dataset('iris')
df  # show prettier dataframe

# cool two-dimensional plots
sns.jointplot(x='sepal_length', y='sepal_width', data=df)
sns.jointplot(x='sepal_length', y='petal_length', data=df)
sns.jointplot(x='sepal_length', y='petal_length', data=df, kind='kde')

from __future__ import print_function, division
from sympy import *
init_printing()
ell_min, ell, ell_max = symbols('ell_min,ell,ell_max', integer=True)
summ = summation((2*ell + 1), (ell, ell_min, ell_max))
summ
ell_min
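
A quick sanity check for the symbolic sum, assuming the intended summand is 2*ell + 1. Under that assumption the sum from ell_min to ell_max has the closed form (ell_max + 1)**2 - ell_min**2, which the brute-force loop below confirms for small non-negative integers. The function names are illustrative only.

```python
def sum_of_terms(ell_min, ell_max):
    """Add up 2*ell + 1 for ell_min <= ell <= ell_max, term by term."""
    return sum(2 * ell + 1 for ell in range(ell_min, ell_max + 1))


def closed_form(ell_min, ell_max):
    """Closed form of the same sum (a telescoping difference of squares)."""
    return (ell_max + 1) ** 2 - ell_min ** 2


# Compare the two for every small range.
for low in range(0, 6):
    for high in range(low, 10):
        assert sum_of_terms(low, high) == closed_form(low, high)

print(sum_of_terms(0, 3))  # 1 + 3 + 5 + 7 = 16
```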

github.com/OpenGeoscience/geonotebook
    scroll down for eye candy

jupyterhub
    a "thing explainer" overview

EM GeoSci
    electromagnetics
https://em.geosci.xyz/index.html

erik likes gnuplot
%gnuplot
https://github.com/has2k1/gnuplot_kernel/blob/master/examples/gnuplot-kernel.ipynb
http://nbviewer.jupyter.org/github/has2k1/gnuplot_kernel/blob/master/examples/gnuplot-kernel.ipynb

https://github.com/Calysto/notebook-extensions
    notebook widgets

from ipyleaflet import Map

Map(center=[39.975, -82.998], zoom=11)

from pythreejs import *
Renderer(camera=c, scene=scene, controls=[OrbitControls(controlling=c)])

%time funky_func()

%timeit funky_func()

%%time
%%timeit
funky_func()
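
Outside a notebook, the standard library's timeit module gives a rough equivalent of %timeit. A minimal sketch, assuming a stand-in funky_func (the real one from the talk was not recorded); the loop and repeat counts are arbitrary.

```python
import timeit


def funky_func():
    """Stand-in workload, hypothetical."""
    return sum(i * i for i in range(1000))


# Run the call 100 times per measurement and take 5 measurements,
# then report the fastest one, which is the least disturbed by other load.
number = 100
timings = timeit.repeat(funky_func, number=number, repeat=5)
best_per_call = min(timings) / number
print("best of 5: {:.1f} microseconds per call".format(best_per_call * 1e6))
```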

%%prun -s cumulative -l 5
funky_func()
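
The %%prun options above (sort by cumulative time, limit to 5 rows) map onto the standard library's cProfile and pstats. A sketch of the same report outside a notebook; funky_func and helper are hypothetical stand-ins.

```python
import cProfile
import io
import pstats


def helper(n):
    """Hypothetical inner function so the report has more than one row."""
    return sum(i * i for i in range(n))


def funky_func():
    """Hypothetical workload."""
    return [helper(2000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.runcall(funky_func)

# Sort by cumulative time and print only the top 5 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```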

%load_ext snakeviz

%snakeviz funky_func()

    sunburst versus icicle

%load_ext line_profiler

%lprun -f funky_func funky_func()

%load_ext memory_profiler

%memit?

%memit funky_func()

# %mprun -f funky_func funky_func()  # needs to be from a file
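
When memory_profiler is not installed, the standard library's tracemalloc gives a related measurement. This is a sketch, not a replacement: tracemalloc counts memory allocated by Python objects, while %memit reports process memory, so the numbers are not directly comparable. funky_func is a hypothetical stand-in.

```python
import tracemalloc


def funky_func():
    """Hypothetical workload that allocates a sizeable list."""
    return [i * 2 for i in range(100000)]


tracemalloc.start()
result = funky_func()
current, peak = tracemalloc.get_traced_memory()  # both in bytes
tracemalloc.stop()

print("peak: {:.2f} MiB".format(peak / 2 ** 20))
```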

github.com/has2k1/gnuplot_kernel/blob/master/examples/gnuplot-kernel.ipynb
Unofficial Jupyter Notebook Extensions
https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/

jupyter.org/widgets.html

folium python data

webgl works on his machine now

%time funky_func()
%timeit funky_func()
%%time
%%timeit
funky_func()

%%prun -s cumulative -l 5
funky_func()

%load_ext snakeviz

%snakeviz funky_func()

%load_ext line_profiler

%lprun -f funky_func funky_func()

%load_ext memory_profiler

%memit funky_func()

%memit?

view -> cell toolbar -> edit metadata

tag cells (named cells)

scroll to tag in n seconds
goto('bottom', time=10)

goto('one', then={'tags': 'two', 'then': {'tags': 'three'}})

class GotoException(Exception):

run cell magic (with javascript code)

goto('x', 'y')

can break loop across multiple cells

    # another cell
    it = iter(range(10))  # next() needs an iterator, not a range
    val = next(it)

    # another cell
    loopbegin tag

    # another cell
    try:
        val = next(it)
    except StopIteration:
        goto('loopend')
    else:
        goto('loopbegin')

    # another cell
    loopend tag
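
For comparison, a sketch of the same iterator protocol in a single cell, with ordinary control flow standing in for the goto calls. It needs iter(), because a range object itself does not support next(). The list named seen exists only to show what the loop visited.

```python
it = iter(range(3))
seen = []

while True:
    try:
        val = next(it)       # fetch the next value, as the looping cell does
    except StopIteration:
        break                # corresponds to goto('loopend')
    seen.append(val)         # loop body, then back to the top ('loopbegin')

print(seen)
```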

goto(tagname)

$ jupyter lab

next generation of jupyter notebook
    alpha

new user interface for jupyter notebook
allowing users to arrange multiple jupyter notebooks,
text editors, terminals, output areas, etc.
on a single page with multiple panels and tabs
in one integrated application.

    tiled panes
    a pane can have multiple tabs

various jupyter notebooks
various python 
various terminal windows (bash)
various text editor windows

run code interactively outside of a notebook in a "Code Console"
and connect one to a text file

right click on a markdown file and "open with..." a live markdown viewer
    very handy!!!

double click on csv files to view them as a nicely formatted table

drag and drop notebook cells within a notebook or between notebooks

multicursor support (apple)

large rewrite

    flexibility and speed

sns.kdeplot?
sns.plotting_context

inspector in another tab: cool!

does not bog down when viewing large amount of data

    lazy loading huge table
    https://phosphorjs.github.io/examples/datagrid/

all done over a single connection with ssh forwarding

xonsh (pronounced like conch)

editor -> key map -> vim (or default, emacs, or sublime text)

openwith -> markdown preview

live rendering of markdown
how about for rst?

editor language support: only those beginning with a-c fit on screen

can rebind just about anything

---------------------------------------------------------------

dask: parallel python

multiprocessing
concurrent

import dask
import dask.array as da

a = da.random.random((2000, 2000), chunks=1000)

a.visualize()
(a + a.T).visualize()
(a + (a.T + 1)).visualize()
dask.visualize(
    (a + (a.T + 1)).sum(axis=0),
    (a + (a.T + 1)).sum(axis=1),
    (a + (a.T + 1)).sum(),
)
dask.compute(
    (a + (a.T + 1)).sum(axis=0),
    (a + (a.T + 1)).sum(axis=1),
    (a + (a.T + 1)).sum(),
)
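
chunks=1000 splits the 2000 x 2000 array into four 1000 x 1000 blocks, and each block's work becomes a separate task that can run in parallel. A pure-Python sketch of that idea on a tiny 4 x 4 matrix with 2 x 2 blocks. This is only an illustration of blocked computation, not how dask is implemented; split_blocks is a made-up helper.

```python
def split_blocks(matrix, chunk):
    """Yield chunk x chunk blocks of a list-of-lists matrix, row-major."""
    for r in range(0, len(matrix), chunk):
        for c in range(0, len(matrix[0]), chunk):
            yield [row[c:c + chunk] for row in matrix[r:r + chunk]]


# 4 x 4 matrix holding 0..15
matrix = [[r * 4 + c for c in range(4)] for r in range(4)]

blocks = list(split_blocks(matrix, chunk=2))

# Each partial sum is independent of the others, so the four
# could be computed by four different workers.
partial_sums = [sum(map(sum, block)) for block in blocks]
total = sum(partial_sums)

print(len(blocks), partial_sums, total)
```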

from dask.diagnostics import Profiler, ProgressBar

ProgressBar().register()

with Profiler() as prof:
    dask.compute(
        (a + (a.T + 1)).sum(axis=0),
        (a + (a.T + 1)).sum(axis=1),
        (a + (a.T + 1)).sum(),
    )

prof.visualize()

bokeh is a plotting library
bokeh is pronounced like bo-kay
wp:Bokeh

wp: prefix means Wikipedia
To get good answers, consider following the advice in the links below.
http://catb.org/~esr/faqs/smart-questions.html
http://web.archive.org/web/20090627155454/www.greenend.org.uk/rjk/2000/06/14/quoting.html

bokeh.pydata.org/en/latest/

from dask.distributed import Client
client = Client()
client

    compare to celery with rabbitmq

127.0.0.1:8787/profile
    dask stuff

ref = dask.compute(
    (a + (a.T + 1)).sum(axis=0),
    (a + (a.T + 1)).sum(axis=1),
    (a + (a.T + 1)).sum()
)

MacBook Pro (Retina, 15-inch, Mid 2014)
2.5 GHz intel core i7
16GB 1600MHz DDR3
nvidia geforce gt 750M 2048 MB
intel iris pro 1536 MB

from concurrent import futures

ex = futures.ThreadPoolExecutor()

import time
def slow_inc(x):
    time.sleep(1)
    return x + 1

future = ex.submit(slow_inc, 1)
future.result()
results = ex.map(slow_inc, range(100))
# rvs = [x.result() for x in results]
results = ex.map(slow_inc, range(100))
list(results)
f = client.submit(slow_inc, 1)
f.done()
g = client.submit(slow_inc, f)
g
g.result()
client.map
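
A runnable standard-library version of the futures part of the notes above, with a shorter sleep so it finishes quickly. One difference worth noting: the notes pass a dask future straight into client.submit, while concurrent.futures needs the concrete value, so this sketch calls .result() first. The max_workers value is an arbitrary choice.

```python
import time
from concurrent import futures


def slow_inc(x):
    time.sleep(0.05)  # pretend to do slow work
    return x + 1


with futures.ThreadPoolExecutor(max_workers=10) as ex:
    future = ex.submit(slow_inc, 1)

    # Chain a second call on the first result.
    chained = ex.submit(slow_inc, future.result())

    # map() yields results (not futures), in input order.
    results = list(ex.map(slow_inc, range(10)))

print(future.result(), chained.result(), results)
```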

python is growing

    https://stackoverflow.blog/2017/09/06/incredible-growth-python/
    https://stackoverflow.blog/2017/09/14/python-growing-quickly/

python's scientific stack

    astropy
    biopython
    dipy
    nipy
    sunpy
    scikit-learn
    statsmodels
    sympy
    networkx
    scikit-image
    pymc3
    xarray
    bokeh
    matplotlib
    pandas
    scipy
    dask
    ipython
    numpy
    jupyter
    cython
    numba
    python

python still limited by gil and harder to scale

    limited to a single thread
    limited to in-memory data

dask

    designed to parallelize the python ecosystem
        handles complex algorithms
        co-developed with pandas/sklearn/jupyter teams
        familiar apis for python users
    scales
        scales from multicore to 1000-node clusters
        resilient, responsive, and real-time

    parallelizes numpy, pandas, sklearn
        satisfies subset of these apis
        uses these libraries internally
        co-developed with these teams

    task scheduler supports custom algorithms
        parallelize existing code
        build novel real-time systems
        arbitrary task graphs with data dependencies
        same scalability
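
A toy illustration of "task graphs with data dependencies": a dict maps each key to either a plain value or a tuple of (function, arguments), where an argument that names another key is a dependency. The get function below is a made-up recursive evaluator for illustration only. It runs serially and is not dask's scheduler.

```python
from operator import add, mul


def get(graph, key, cache=None):
    """Evaluate `key`, first evaluating any keys its task depends on."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]

    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        resolved = [
            get(graph, arg, cache) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        value = func(*resolved)
    else:
        value = task  # a plain value, no work to do

    cache[key] = value
    return value


graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),   # depends on x and y
    "w": (mul, "z", 10),    # depends on z
}

print(get(graph, "w"))
```

Because "x" and "y" do not depend on each other, a real scheduler would be free to run them at the same time; the dependency structure is what makes that decision possible.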

    demo

        high level: scaling pandas
            same pandas look and feel
            uses pandas under the hood
            scales nicely onto many machines
        low level: arbitrary task scheduling
            parallelize normal python code
            build custom algorithms
            react real-time

        demo developed with
            dask-kubernetes
            google compute engine
            github.com/dask/dask-kubernetes

        Standard Dask Demo
        https://youtube.com/watch?v=ods97a5Pzw0

    why do people choose dask

        familiar with python
            drop-in numpy/pandas/sklearn apis
            native memory environment
            easy debugging and diagnostics
        have complex problems
            parallelize existing code without extensive rewrites
            sophisticated algorithms and systems
            real-time response to small-data
        scales up and down
            scales to 1000-node clusters
            also runs cheaply on a laptop

import pandas as pd
import dask.dataframe as dd

http://thefaradayproject.com/

continuum analytics changed their name to anaconda

negative decimal (radix is -10)
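
A small sketch of negative decimal (base -10) conversion. Digits stay in 0..9, and place values alternate in sign: 1, -10, 100, -1000, and so on, so no minus sign is needed for negative numbers. The function names are illustrative.

```python
def to_negadecimal(n):
    """Return the base -10 digit string for the integer n."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, remainder = divmod(n, -10)
        if remainder < 0:
            # Python's remainder takes the divisor's sign; shift it into 0..9.
            n += 1
            remainder += 10
        digits.append(str(remainder))
    return "".join(reversed(digits))


def from_negadecimal(text):
    """Return the integer represented by a base -10 digit string."""
    value = 0
    for digit in text:
        value = value * -10 + int(digit)
    return value


print(to_negadecimal(-1), to_negadecimal(10))
```

For example, -1 becomes 19 because 1 * (-10) + 9 = -1.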

