[CentralOH] 2017-09-25 Meeting Scribbles (Graffiti/Bad Writing?): Erik Welch Jupyter Lab Dask
jep200404 at columbus.rr.com
Thu Sep 28 16:21:57 EDT 2017
Thanks to Christoph Baker and Pillar for their generous prosperity.
They gave us plenty of pizza, salad, cookies, and beverages.
There was also hummus for the first time.
techelevator
data analysis
has been coming for about a year and a half
likes doing everything in python
terminal size
zach@cohpy:~$ echo $LINES $COLUMNS
26 100
zach@cohpy:~$ grep 'LINES\|COLUMNS' ~/.bashrc
# update the values of LINES and COLUMNS.
export LINES
export COLUMNS
zach@cohpy:~$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['LINES']
'26'
>>> os.environ['COLUMNS']
'100'
>>>
zach@cohpy:~$
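The LINES and COLUMNS check above can also be done without exporting shell variables. A minimal sketch, assuming Python 3.3 or later; terminal_size is a hypothetical helper name, not something shown at the meeting. shutil.get_terminal_size checks the COLUMNS and LINES environment variables first, then asks the terminal, then uses the fallback.

```python
import shutil


def terminal_size(fallback=(80, 24)):
    """Return (lines, columns) for the current terminal.

    Hypothetical helper: wraps shutil.get_terminal_size, which takes
    its fallback as (columns, lines).
    """
    size = shutil.get_terminal_size(fallback=fallback)
    return size.lines, size.columns


lines, columns = terminal_size()
print(lines, columns)
```

This only gives the size in character cells; it says nothing about screen size in pixels.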
xdpyinfo | less
one of last year's challenges (the graphical one) included discussion of
a portable way of determining screen size in pixels
has background in physics
work in dublin now
python for ai and neat little projects
want something done quickly and don't know how to do it in bash
about 32 folks tonight
python booth at ohio linux festival 2017-09-30
need to have collection of python books at olf
2017-10-30 monthly meeting:
Michael Handler and Leonard
embedded train track sensor
and did real-time analysis
2017-10-14 columbuscodecamp.com
Christoph Baker
folks should apply to pillar job interview process
Erik Welch
https://github.com/eriknw
cytoolz
dask-patternsearch
eqpy
benchtoolz
metafunc
https://twitter.com/eriknwelch
works on pytoolz and cytoolz (faster)
dask https://dask.pydata.org/en/latest/
dask
python3
python2
django
pandas
jupyter notebook
conda
dask
numba
bokeh
anaconda
holoviews
came for the language; stayed for the community
JupyterLab
human-centered interactive
much coolness
has a bias for making things fast
holoviews
jupytercon (was last month in nyc)
https://github.com/OpenGeoscience/geonotebook
A Jupyter notebook extension for geospatial visualization and analysis
nbextensions
ExecuteTime
zenmode
pretty background
more full screen
hides header
code pretty
little hammer icon
hide header
variable inspector
ExecuteTime
automatically shows execution time for each cell
localhost:8888/tree?#nbextensions_configurator
%qtconsole
partially works for me
have to restart kernel after qt window exits
https://jupyter.org/qtconsole/stable/
core jupyter team uses conda to build and share packages
!conda list
!conda install -c conda-forge jupyter jupyter_contrib_nbextensions
jupyter-contrib-nbextensions.readthedocs.io/en/latest/
import seaborn as sns
df = sns.load_dataset('iris')
df # show prettier dataframe
# cool two-dimensional plots
sns.jointplot(df.sepal_length, df.sepal_width)
sns.jointplot(df.sepal_length, df.petal_length)
sns.jointplot(df.sepal_length, df.petal_length, kind='kde')
from __future__ import print_function, division
from sympy import *
init_printing()
ell_min, ell, ell_max = symbols('ell_min,ell,ell_max', integer=True)
summ = summation((2*ell + 1), (ell, ell_min, ell_max))
summ
ell_min
github.com/OpenGeoscience/geonotebook
scroll down for eye candy
jupyterhub
a "thing explainer" overview
EM GeoSci
electromagnetics
https://em.geosci.xyz/index.html
erik likes gnuplot
%gnuplot
https://github.com/has2k1/gnuplot_kernel/blob/master/examples/gnuplot-kernel.ipynb
http://nbviewer.jupyter.org/github/has2k1/gnuplot_kernel/blob/master/examples/gnuplot-kernel.ipynb
https://github.com/Calysto/notebook-extensions
notebook widgets
from ipyleaflet import Map
Map(center=[39.975, -82.998], zoom=11)
from pythreejs import *
Renderer(camera=c, scene=scene, controls=[OrbitControls(controlling=c)])
%time funky_func()
%timeit funky_func()
%%time
%%timeit
funky_func()
%%prun -s cumulative -l 5
funky_func()
%load_ext snakeviz
%snakeviz funky_func()
sunburst versus icicle
%load_ext line_profiler
%lprun -f funky_func funky_func()
%load_ext memory_profiler
%memit?
%memit funky_func()
# %mprun -f funky_func funky_func() # needs to be from a file
github.com/has2k1/gnuplot_kernel/blob/master/examples/gnuplot-kernel.ipynb
Unofficial Jupyter Notebook Extensions
https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/
jupyter.org/widgets.html
folium python data
webgl works on his machine now
%time funky_func()
%timeit funky_func()
%%time
%%timeit
funky_func()
%%prun -s cumulative -l 5
funky_func()
%load_ext snakeviz
%snakeviz funky_func()
%load_ext line_profiler
%lprun -f funky_func funky_func()
%load_ext memory_profiler
%memit funky_func()
%memit?
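Outside a notebook, the timing and profiling magics above have rough standard-library counterparts. A minimal sketch: funky_func here is a hypothetical stand-in (the talk's function was not recorded), timeit.repeat plays the role of %timeit, and cProfile with pstats plays the role of %%prun -s cumulative -l 5.

```python
import cProfile
import io
import pstats
import timeit


def funky_func(n=20000):
    """Hypothetical stand-in for the function profiled in the talk."""
    return sum(i * i for i in range(n))


# Rough counterpart of %timeit: best of 3 runs, 10 calls per run.
best = min(timeit.repeat(funky_func, number=10, repeat=3)) / 10
print('best time per call: %.6f s' % best)

# Rough counterpart of %%prun -s cumulative -l 5.
profiler = cProfile.Profile()
profiler.enable()
funky_func()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)  # keep the top 5 entries
report = buffer.getvalue()
print(report)
```

snakeviz, line_profiler, and memory_profiler are separate packages and have no direct standard-library counterpart.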
view -> cell toolbar -> edit metadata
tag cells (named cells)
scroll to tag in n seconds
goto('bottom', time=10)
goto('one', then={'tags': 'two', 'then': {'tags': 'three'}})
class GotoException(Exception):
run cell magic (with javascript code)
goto('x', 'y')
can break loop across multiple cells
# another cell
it = iter(range(10))
val = next(it)
# another cell
loopbegin tag
# another cell
try:
    val = next(it)
except Exception:
    goto('loopend')
else:
    goto('loopbegin')
# another cell
loopend tag
goto(tagname)
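The cross-cell loop above is the iterator protocol spelled out by hand: fetch a value with next(), run the body, and leave the loop when the iterator is exhausted. A minimal sketch of the same control flow in one ordinary cell, with no goto; the seen list is a hypothetical stand-in for whatever the loop body does.

```python
it = iter(range(10))   # range() is iterable; iter() makes it an iterator
seen = []              # hypothetical stand-in for the loop body's work

while True:
    try:
        val = next(it)         # same role as the "try: val = next(it)" cell
    except StopIteration:      # iterator exhausted: the goto('loopend') case
        break
    seen.append(val)           # loop body: the goto('loopbegin') case

print(seen)
```

Catching StopIteration rather than Exception keeps real errors in the loop body from silently ending the loop.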
$ jupyter lab
next generation of jupyter notebook
alpha
new user interface for jupyter notebook
allowing users to arrange multiple jupyter notebooks,
text editors, terminals, output areas, etc.
on a single page with multiple panels and tabs
in one integrated application.
tiled panes
a pane can have multiple tabs
various jupyter notebooks
various python
various terminal windows (bash)
various text editor windows
run code interactively outside of a notebook in a "Code Console"
and connect one to a text file
right click on a markdown file and "open with..." a live markdown viewer
very handy!!!
double click on csv files to view them as a nicely formatted table
drag and drop notebook cells within a notebook or between notebooks
multicursor support (apple)
large rewrite
flexibility and speed
sns.kdeplot?
sns.plotting_context
inspector in another tab: cool!
does not bog down when viewing large amount of data
lazy loading huge table
https://phosphorjs.github.io/examples/datagrid/
all done over a single connection with ssh forwarding
xonsh (pronounced konsh)
editor -> key map -> vim (or default, emacs, or sublime text)
openwith -> markdown preview
live rendering of markdown
how about for rst?
editor language support: only those beginning with a-c fit on screen
can rebind just about anything
---------------------------------------------------------------
dask: parallel python
multiprocessing
concurrent
import dask
import dask.array as da
a = da.random.random((2000, 2000), chunks=1000)
a.visualize()
(a + a.T).visualize()
(a + (a.T + 1)).visualize()
dask.visualize(
    (a + (a.T + 1)).sum(axis=0),
    (a + (a.T + 1)).sum(axis=1),
    (a + (a.T + 1)).sum(),
)
dask.compute(
    (a + (a.T + 1)).sum(axis=0),
    (a + (a.T + 1)).sum(axis=1),
    (a + (a.T + 1)).sum(),
)
from dask.diagnostics import Profiler, ProgressBar
ProgressBar().register()
with Profiler() as prof:
    dask.compute(
        (a + (a.T + 1)).sum(axis=0),
        (a + (a.T + 1)).sum(axis=1),
        (a + (a.T + 1)).sum(),
    )
prof.visualize()
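With chunks=1000, the 2000 x 2000 array above is split into four 1000 x 1000 blocks, and a reduction like sum() becomes per-block sums followed by a combine step. A minimal standard-library sketch of that blocked pattern, not Dask itself: SIZE and CHUNK are scaled-down stand-ins, and block_sum is a hypothetical helper.

```python
import random
from concurrent.futures import ThreadPoolExecutor

SIZE, CHUNK = 200, 100   # scaled-down stand-ins for 2000 and 1000

random.seed(0)
matrix = [[random.random() for _ in range(SIZE)] for _ in range(SIZE)]


def block_sum(corner):
    """Sum one CHUNK x CHUNK block, given its top-left (row, col)."""
    r0, c0 = corner
    return sum(sum(row[c0:c0 + CHUNK]) for row in matrix[r0:r0 + CHUNK])


# One task per block: 2 x 2 = 4 blocks here.
blocks = [(r, c)
          for r in range(0, SIZE, CHUNK)
          for c in range(0, SIZE, CHUNK)]

with ThreadPoolExecutor() as pool:
    partial_sums = list(pool.map(block_sum, blocks))

total = sum(partial_sums)   # combine step
print(len(blocks), total)
```

Pure-Python arithmetic holds the GIL, so threads here show the structure rather than a speedup.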
bokeh is a plotting library
bokeh is pronounced like bo-kay
wp:Bokeh
wp: prefix means Wikipedia
bokeh.pydata.org/en/latest/
from dask.distributed import Client
client = Client()
client
compare to celery with rabbitmq
127.0.0.1:8787/profile
dask stuff
ref = dask.compute(
    (a + (a.T + 1)).sum(axis=0),
    (a + (a.T + 1)).sum(axis=1),
    (a + (a.T + 1)).sum()
)
MacBook Pro (Retina, 15-inch, Mid 2014)
2.5 GHz intel core i7
16GB 1600MHz DDR3
nvidia geforce gt 750M 2048 MB
intel iris pro 1536 MB
from concurrent import futures
ex = futures.ThreadPoolExecutor()
import time
def slow_inc(x):
    time.sleep(1)
    return x + 1
future = ex.submit(slow_inc, 1)
future.result()
results = ex.map(slow_inc, range(100))
# rvs = [x.result() for x in results]
results = ex.map(slow_inc, range(100))
list(results)
f = client.submit(slow_inc, 1)
f.done()
g = client.submit(slow_inc, f)
g
g.result()
client.map
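In the dask.distributed calls above, a future (f) is passed straight into another submit, and the dependency is tracked for you. The standard-library executors do not unwrap futures passed as arguments, so chaining has to be spelled out. A minimal sketch of that difference, assuming a shortened delay (0.1 s instead of 1 s) to keep it quick:

```python
import time
from concurrent import futures


def slow_inc(x, delay=0.1):
    """Same idea as the talk's slow_inc, with a shorter sleep."""
    time.sleep(delay)
    return x + 1


with futures.ThreadPoolExecutor(max_workers=4) as ex:
    f = ex.submit(slow_inc, 1)

    # Stdlib futures are not resolved when passed as arguments,
    # so the dependent task waits on f.result() itself.
    g = ex.submit(lambda: slow_inc(f.result()))
    chained = g.result()

    # map() runs the calls concurrently and yields results in input order.
    mapped = list(ex.map(slow_inc, range(4)))

print(chained, mapped)
```

Blocking on f.result() inside a worker ties up a thread while it waits, which is one reason a scheduler that understands dependencies is convenient.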
python is growing
https://stackoverflow.blog/2017/09/06/incredible-growth-python/
https://stackoverflow.blog/2017/09/14/python-growing-quickly/
python's scientific stack
astropy
biopython
dipy
nipy
sunpy
scikit-learn
statsmodels
sympy
networkx
scikit-image
pymc3
xarray
bokeh
matplotlib
pandas
scipy
dask
ipython
numpy
jupyter
cython
numba
python
python still limited by gil and harder to scale
limited to a single thread
limited to in-memory data
dask
designed to parallelize the python ecosystem
handles complex algorithms
co-developed with pandas/sklearn/jupyter teams
familiar apis for python users
scales
scales from multicore to 1000-node clusters
resilient, responsive, and real-time
parallelizes numpy, pandas, sklearn
satisfies subset of these apis
uses these libraries internally
co-developed with these teams
task scheduler supports custom algorithms
parallelize existing code
build novel real-time systems
arbitrary task graphs with data dependencies
same scalability
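"Arbitrary task graphs with data dependencies" can be pictured as a dict that maps keys to either plain values or tuples of (function, arguments), where an argument may name another key. A toy sketch of evaluating such a graph, assuming that dict layout; get here is a hypothetical single-threaded evaluator, not Dask's scheduler.

```python
from operator import add


def inc(x):
    return x + 1


# Keys map to a literal value or to a task tuple (callable, *args).
graph = {
    'x': 1,
    'y': (inc, 'x'),       # y depends on x
    'z': (add, 'y', 10),   # z depends on y
}


def get(graph, key, cache=None):
    """Evaluate one key, computing its dependencies first (toy version)."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [
            get(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        value = func(*resolved)
    else:
        value = task
    cache[key] = value
    return value


print(get(graph, 'z'))
```

A real scheduler would also run independent tasks in parallel and release intermediate results once nothing else depends on them.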
demo
high level: scaling pandas
same pandas look and feel
uses pandas under the hood
scales nicely onto many machines
low level: arbitrary task scheduling
parallelize normal python code
build custom algorithms
react real-time
demo developed with
dask-kubernetes
google compute engine
github.com/dask/dask-kubernetes
Standard Dask Demo
https://youtube.com/watch?v=ods97a5Pzw0
why do people choose dask
familiar with python
drop-in numpy/pandas/sklearn apis
native memory environment
easy debugging and diagnostics
have complex problems
parallelize existing code without extensive rewrites
sophisticated algorithms and systems
real-time response to small-data
scales up and down
scales to 1000-node clusters
also runs cheaply on a laptop
import pandas as pd
import dask.dataframe as dd
http://thefaradayproject.com/
continuum analytics changed their name to anaconda
negative decimal (radix is -10)
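With radix -10, digit positions are worth 1, -10, 100, -1000, and so on, so negative numbers need no sign. A minimal sketch of converting to and from that representation; to_negadecimal and from_negadecimal are hypothetical names, and the digits used are the ordinary 0 through 9.

```python
def to_negadecimal(n):
    """Return the base -10 digit string for integer n."""
    if n == 0:
        return '0'
    digits = []
    while n != 0:
        n, r = divmod(n, -10)
        if r < 0:              # Python's remainder takes the divisor's sign;
            n, r = n + 1, r + 10   # shift it into the range 0..9
        digits.append(str(r))
    return ''.join(reversed(digits))


def from_negadecimal(s):
    """Return the integer for a base -10 digit string."""
    value = 0
    for ch in s:
        value = value * -10 + int(ch)
    return value


print(to_negadecimal(15), to_negadecimal(-1))
```

For example, 15 comes out as 195, because 1*100 + 9*(-10) + 5 = 15.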