[CentralOH] 2017-06-26 Meeting Scribbles (Scribbles/Bad Writing?): Jeff Klukas Machine Learning scikit-learn; CMS v ALICE

jep200404 at columbus.rr.com
Tue Jun 27 15:12:39 EDT 2017


Thanks again to Pillar, Chris Baker, and Jeff for their generous hospitality.
They gave us free pizza, salad, cookies, and beverages.
We had around 46 people, probably the most at a COhPy meeting ever,
and we ran out of pizza and cookies for the first time.

Jason Green ran the meeting. He's great in front of a crowd.

Machine Learning (using scikit-learn)
by Jeff Klukas, Data Engineer at Simple.
dry run for PyOhio

came from academia
six years at University of Wisconsin
doing experimental particle physics
worked on large hadron collider
CMS detector kit at CERN
wp:Compact Muon Solenoid
    (compare with Andrew Kubera's ALICE)

wp: prefix means Wikipedia
To get good answers, consider following the advice in the links below.
http://catb.org/~esr/faqs/smart-questions.html
http://web.archive.org/web/20090627155454/www.greenend.org.uk/rjk/2000/06/14/quoting.html

petabyte is 1000 (or 1024) terabytes
big data is not amenable to machine learning
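
The "1000 (or 1024)" in the petabyte note is the decimal (SI) versus binary
reading of the prefix. A small sketch of the arithmetic follows; the constant
names are illustrative only and were not part of the talk.

```python
# Decimal (SI) versus binary readings of "terabyte" and "petabyte".
TB_SI = 1000**4   # SI terabyte, in bytes
PB_SI = 1000**5   # SI petabyte, in bytes
TIB = 1024**4     # binary "terabyte" (tebibyte), in bytes
PIB = 1024**5     # binary "petabyte" (pebibyte), in bytes

si_ratio = PB_SI // TB_SI      # terabytes per petabyte, SI reading
binary_ratio = PIB // TIB      # terabytes per petabyte, binary reading

# How much larger the binary petabyte is than the SI one, in percent.
gap_percent = (PIB - PB_SI) / PB_SI * 100

print(si_ratio, binary_ratio, round(gap_percent, 1))
```

The gap between the two readings grows with each prefix step, so it matters
more at petabyte scale than at kilobyte scale.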

wp:Simple (bank)
wp:scikit-learn
https://twitter.com/JeffKlukas
https://github.com/jklukas/
http://jeff.klukas.net/
Works remotely for Simple which is based in Portland, Oregon.
Slides

    https://www.dropbox.com/s/hhlgb97yjabw1iv/COhPy-Klukas.pdf?dl=0

    most slides were viewable from back of room with ambient light on screen

        low resolution
        good contrast but could be better

            black text would be better than gray text

    live demo had too much content on screen

        should be viewable from back of room
        24x80 would have best viewability

scikit-learn is built on top of numpy and scipy; pandas is also built on top of numpy
hope to classify texts from customers (triage)
especially urgent ones to reduce losses from fraud

split between data scientists and folks who apply found patterns

separate learning from serving
http://scikit-learn.org/stable/
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
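
One way to read "separate learning from serving" is that a model is fit
offline, saved, and later loaded by a different process that only predicts.
The sketch below is an assumption about how that could look, not the setup
described in the talk: the messages and labels are made up, and pickle stands
in for whatever storage is really used.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# --- learning side (offline) ---
# Hypothetical customer messages with hand-assigned triage labels.
texts = [
    "my card was stolen",
    "I see a charge I did not make",
    "how do I update my address",
    "what are your support hours",
]
labels = ["urgent", "urgent", "routine", "routine"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Serialize the fitted model; in practice this would go to a file or a store.
blob = pickle.dumps(model)

# --- serving side (online) ---
# Only the serialized model is needed here, not the training data.
served = pickle.loads(blob)
prediction = served.predict(["someone used my card"])
print(prediction[0])
```

Only unpickle data from a trusted source, and load it with the same
scikit-learn version that wrote it.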

supervised learning
    classification

lemma is dictionary form of a word e.g. goes -> go
sklearn.pipeline
    how relevant would their pipeline syntax be to other stuff in python?
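
A rough sketch of how lemmas and sklearn.pipeline could fit together for the
text triage idea. Everything specific here is an assumption for illustration:
the LEMMAS table and lemma_analyzer function are hypothetical toys (a real
lemmatizer would come from an NLP library), and the messages and labels are
made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy lemma table, e.g. goes -> go.
LEMMAS = {"goes": "go", "went": "go", "cards": "card",
          "stolen": "steal", "stole": "steal"}


def lemma_analyzer(text):
    """Lowercase, split on whitespace, and map each word to its lemma."""
    return [LEMMAS.get(word, word) for word in text.lower().split()]


texts = [
    "my card was stolen",
    "someone stole my cards",
    "how do I change my address",
    "my statement goes to the wrong address",
]
labels = ["urgent", "urgent", "routine", "routine"]

# Each step is a (name, estimator) pair; fit() runs the steps in order.
pipeline = Pipeline([
    ("vectorize", CountVectorizer(analyzer=lemma_analyzer)),
    ("classify", LogisticRegression()),
])
pipeline.fit(texts, labels)

predicted = pipeline.predict(["my cards went missing"])
print(predicted[0])
```

The pipeline object behaves like a single estimator, so the same fit and
predict calls work whether it holds one step or several.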

split data into training and testing
overfitting
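
A minimal sketch of why the data gets split: a model scored on the data it
was trained on can look perfect while doing worse on held-out data, which is
the usual sign of overfitting. The synthetic dataset and the choice of an
unconstrained decision tree are assumptions made for illustration, not what
was shown in the talk.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels flipped at random, so there is noise
# that a flexible model can memorize.
X, y = make_classification(n_samples=200, n_features=10, flip_y=0.2,
                           random_state=0)

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A tree with no depth limit can memorize the training rows.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_accuracy = tree.score(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)
print("train:", train_accuracy, "test:", test_accuracy)
```

A large gap between the two scores suggests the model learned the noise in
the training rows instead of a pattern that generalizes.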

five data scientists
six data engineers

train on data that is only a couple megabytes

wp:haggis

2017-05-22 addendum

The Secret of Building 42
wp:Joseph Desch

TED talk: federal prosecutor from San Francisco on bitcoin sleuthing

wp:The Door into Summer
wp:Robert A. Heinlein
buy versus build

wp:Office Space
shaving pennies

