[CentralOH] 2017-06-26 會議 Scribbles 落書/惡文?: Jeff Klukas Machine Learning scikit-learn; CMS v ALICE
jep200404 at columbus.rr.com
Tue Jun 27 15:12:39 EDT 2017
Thanks again to Pillar, Chris Baker, and Jeff for their generous hospitality.
They gave us free pizza, salad, cookies, and beverages.
We had around 46 people, probably the most at a COhPy meeting ever,
and we ran out of pizza and cookies for the first time.
Jason Green ran the meeting. He's great in front of a crowd.
Machine Learning (using scikit-learn)
by Jeff Klukas, Data Engineer at Simple.
dry run for PyOhio
came from academia
six years at University of Wisconsin
doing experimental particle physics
worked on the Large Hadron Collider
CMS detector kit at CERN
wp:Compact Muon Solenoid
(compare with Andrew Kubera's ALICE)
wp: prefix means Wikipedia
To get good answers, consider following the advice in the links below.
http://catb.org/~esr/faqs/smart-questions.html
http://web.archive.org/web/20090627155454/www.greenend.org.uk/rjk/2000/06/14/quoting.html
petabyte is 1000 (or 1024) terabytes
big data is not amenable to machine learning
wp:Simple (bank)
wp:scikit-learn
https://twitter.com/JeffKlukas
https://github.com/jklukas/
http://jeff.klukas.net/
Works remotely for Simple which is based in Portland, Oregon.
Slides
https://www.dropbox.com/s/hhlgb97yjabw1iv/COhPy-Klukas.pdf?dl=0
most slides were viewable from back of room with ambient light on screen
low resolution
good contrast but could be better
black text would be better than gray text
live demo had too much content on screen
should be viewable from back of room
24x80 would have best viewability
scikit-learn is built on top of NumPy and SciPy; pandas is also built on top of NumPy
hope to classify texts from customers (triage)
especially urgent ones to reduce losses from fraud
split between data scientists and folks who apply found patterns
separate learning from serving
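A rough sketch of what "separate learning from serving" could look like, not necessarily how Simple does it. The toy data, the serve() function, and the choice of pickle are all assumptions made for illustration. The learning side fits and serializes a model; the serving side only loads it and predicts.

```python
import pickle

from sklearn.linear_model import LogisticRegression

# --- learning side: fit a model offline and serialize it ---
# Made-up toy data: one numeric feature, two classes.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# In practice the bytes would be written to a file or other shared storage.
model_bytes = pickle.dumps(model)


# --- serving side: load the frozen model, never retrain ---
def serve(blob, features):
    """Hypothetical serving function: deserialize the model and predict."""
    loaded = pickle.loads(blob)
    return loaded.predict(features).tolist()


predictions = serve(model_bytes, [[0.5], [9.5]])
print(predictions)
```

Only unpickle data from a trusted source, since loading a pickle can execute arbitrary code.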
http://scikit-learn.org/stable/
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
supervised learning
classification
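A minimal sketch of supervised classification in scikit-learn, assuming the bundled iris dataset and a k-nearest-neighbors classifier (neither was necessarily used in the talk). The pattern is: fit on labeled examples, then predict labels for samples.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target  # features and known labels

# Supervised learning: the classifier sees both features and labels.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# Ask for the label of one sample (here, the first training row).
predicted = clf.predict(X[:1])
predicted_name = iris.target_names[predicted[0]]
print(predicted_name)
```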
lemma is dictionary form of a word e.g. goes -> go
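A toy illustration of lemmatization, assuming a tiny hand-written lookup table (LEMMAS and lemmatize() are hypothetical names). Real lemmatizers use large dictionaries and part-of-speech information; this only shows the idea of mapping inflected forms to a dictionary form.

```python
# Hypothetical, hand-written table of inflected form -> lemma.
LEMMAS = {
    "goes": "go",
    "went": "go",
    "gone": "go",
    "mice": "mouse",
}


def lemmatize(word):
    """Return the dictionary form if known, else the lowercased word."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


tokens = "She goes where the mice went".split()
lemmas = [lemmatize(token) for token in tokens]
print(lemmas)
```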
sklearn.pipeline
how relevant would their pipeline syntax be to other stuff in python?
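A sketch of the sklearn.pipeline syntax applied to text triage. The messages, labels, and the choice of TfidfVectorizer plus LogisticRegression are made up for illustration and are not taken from the talk. A Pipeline is a list of (name, step) pairs; fit and predict run the steps in order.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up customer messages with hand-assigned labels.
messages = [
    "my card was stolen and there are charges I did not make",
    "someone stole my card and is using it",
    "fraud on my account, I did not make this purchase",
    "unauthorized charges after my card was stolen",
    "how do I change my mailing address",
    "what are your customer support hours",
    "how do I set up direct deposit",
    "where can I change my address and phone number",
]
labels = ["urgent"] * 4 + ["routine"] * 4

# Each step is a (name, estimator) pair; the last step is the classifier.
triage = Pipeline([
    ("tfidf", TfidfVectorizer()),    # text -> weighted word counts
    ("clf", LogisticRegression()),   # word weights -> label
])
triage.fit(messages, labels)

prediction = triage.predict(["my card was stolen"])
print(prediction[0])
```

Eight messages are far too few for a real model; the point is only the shape of the Pipeline call.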
split data into training and testing
overfitting (noted as "overmatch")
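A sketch of why the data gets split, assuming synthetic data from make_classification and an unrestricted decision tree (both chosen here for illustration). The tree memorizes the training set, so the training score looks perfect while the held-out test score is lower; that gap is what is usually called overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels randomly flipped (noise).
X, y = make_classification(
    n_samples=400,
    n_features=20,
    n_informative=5,
    flip_y=0.2,
    random_state=0,
)

# Hold out a quarter of the rows; the model never sees them during fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No depth limit, so the tree can memorize the training rows, noise included.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_score = tree.score(X_train, y_train)
test_score = tree.score(X_test, y_test)
print("train accuracy:", train_score)
print("test accuracy:", test_score)
```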
five data scientists
six data engineers
train on data that is only a couple megabytes
wp:haggis
2017-05-22 addendum
The Secret of Building 42
wp:Joseph Desch
TED talk: federal prosecutor from San Francisco on bitcoin sleuthing
wp:The Door into Summer
wp:Robert A. Heinlein
buy versus build
wp:Office Space
shaving pennies