[CentralOH] 2016-05-23 Meeting Scribbles (Graffiti/Bad Writing?): counting words, foo in set, foo in list, generators galore, re.split(), re.findall(), aws lambda, requests, islice, salatin, asyncio uv

jep200404 at columbus.rr.com
Thu Jun 2 20:01:37 EDT 2016


Thanks again to Pillar and Chris Baker for hosting us at Pillar's Forge.

There's a heck of a lot more Python code in this month's scribbles than usual.
There is much to learn from it.

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

PyOhio 2016

    T-Shirts are cool. Buy one.
    Eric and Jan modeled for pictures.

    Please submit talk proposals for PyOhio
    beginning speakers are very welcome and encouraged to submit a talk
        1/4 to 1/3 of talks are reserved for people
        who have never talked at a conference before
    Deadline has been extended to June 3

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

March Temperature Prediction Challenge

https://github.com/cohpy/challenge-201603-temps/blob/master/predictions.txt

Andrew Kubera won the temperature guessing challenge.
He won a hat.
What happened to the other prizes, such as the HP cooler and HP hoodie?

The Daily Growler https://www.thedailygrowler.com/

challenge-201603-temps/answer5/cohpy_weather_sommerville.py

    Python3 versus Python2

        Has code to complain and quit if it is not run by python2,
        but that code has a syntax error when run by python3.

        Eric Floehr showed how to easily convert python2 code to python3 code
        with the 2to3 program.

        2to3 is part of python3

            converts most of python2 code to python3 code

                individual files, or whole directories

            2to3 -w -n ;# convert without backups: this is dangerous.

    Cute name: 'Nulluary'
        How about using None instead?

    indents by two

all challenge submitters (not just this challenge) got python stickers

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

found a 12' HDMI cable this time. (about half as long as last month)

First time to cohpy monthly meeting:
    Scott
    John

Compare eoo.py, foo.py, and goo.py below.
What do you like? Why?

    cohpy at forge:~/20160523$ cat eoo.py
    #!/usr/bin/env python2

    import sys

    if sys.version_info.major == 2:
        pass
    else:
        sys.stderr.write(
            'You are using Python version %s. This script only works with 2\n' %
            sys.version_info.major)
        sys.exit(1)
    cohpy at forge:~/20160523$ python2 eoo.py
    cohpy at forge:~/20160523$ python3 eoo.py
    You are using Python version 3. This script only works with 2
    cohpy at forge:~/20160523$

    # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

    cohpy at forge:~/20160523$ cat foo.py
    #!/usr/bin/env python2

    import sys

    if sys.version_info.major != 2:
        sys.stdout = sys.stderr
        print (
            'You are using Python version %s. This script only works with 2' %
            sys.version_info.major)
        sys.exit(1)
    cohpy at forge:~/20160523$ python2 foo.py
    cohpy at forge:~/20160523$ python3 foo.py
    You are using Python version 3. This script only works with 2
    cohpy at forge:~/20160523$

    # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

    cohpy at forge:~/20160523$ cat goo.py
    #!/usr/bin/env python2

    import sys

    assert sys.version_info.major == 2, (
        'You are using Python version %s. This script only works with 2.' %
        sys.version_info.major)
    cohpy at forge:~/20160523$ python2 goo.py
    cohpy at forge:~/20160523$ python3 goo.py
    Traceback (most recent call last):
      File "goo.py", line 7, in <module>
        sys.version_info.major)
    AssertionError: You are using Python version 3. This script only works with 2.
    cohpy at forge:~/20160523$
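
One thing to weigh when comparing the three: assert statements are skipped
when Python runs with -O, so the check in goo.py disappears under
"python -O". Below is a rough sketch of an explicit check that survives -O
and parses under both Python 2 and Python 3. The names require_major and
main are made up for illustration; they are not from any of the three files.

```python
import sys


def require_major(required, actual=None):
    """Return an error message if the major version is wrong, else None.

    require_major is a hypothetical helper, not code from the meeting.
    """
    if actual is None:
        actual = sys.version_info[0]
    if actual != required:
        return ('You are using Python version %s. '
                'This script only works with %s.' % (actual, required))
    return None


def main(required):
    message = require_major(required)
    if message is not None:
        # sys.exit(string) writes the string to stderr and exits with 1.
        sys.exit(message)


if __name__ == '__main__':
    # Demo only: requiring the running version always passes.
    main(sys.version_info[0])
```

A real script would call main(2) (or main(3)) near the top instead of the
demo call.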

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

Aschinger Blvd is near Brazenhead

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

word counting challenge

seven entries
code is available at https://github.com/cohpy/challenge-201604-words

http://www.gutenberg.org/cache/epub/84/pg84.txt
    Will lock you out after only a few requests.

eric's

    uses virtualenvwrapper
    mkvirtualenv challenge-201604 -p /usr/bin/python3

    import textwrap
    os.get_terminal_size()
        more portable than os.environ['COLUMNS']

        stty
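
A small sketch of wrapping output to the terminal width. It swaps in
shutil.get_terminal_size(), the higher-level wrapper around
os.get_terminal_size() that the Python 3 docs suggest for normal use,
because it also copes with output that is piped. The function name
wrap_to_terminal is made up for this example; this is not Eric's code.

```python
import shutil
import textwrap


def wrap_to_terminal(text, columns=None):
    """Wrap text to the terminal width (hypothetical helper)."""
    if columns is None:
        # Honors the COLUMNS environment variable, then asks the terminal,
        # then falls back to 80x24 (for example when output is piped).
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    return textwrap.fill(text, width=columns)


wrapped = wrap_to_terminal('word ' * 30, columns=40)
print(wrapped)
```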

    did not define what counts as a word

    pytest
    @pytest.mark.parametrize is fantastic

    75071 words

    the 4195
    and 2977
    i 2850
    of 2641
    to 2094
    my 1776
    a 1391
    in 1128
    was 1021
    that 1017

joe friedrich's

    range(0, n) -> range(n)

    Is word a valid one-letter word? Compare the following:

        word == 'A' or word == 'a' or word == 'I'

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        # tuples (and lists) are OK
        VALID_ONE_LETTER_WORDS = ('A', 'a', 'I')
        word in VALID_ONE_LETTER_WORDS

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        # Sets are fast, even when the set is large.
        VALID_ONE_LETTER_WORDS = {'A', 'a', 'I'}
        word in VALID_ONE_LETTER_WORDS

    counting: compare the following:

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        new_dictionary = {}
        for word in ...:
            if word in new_dictionary:
                new_dictionary[word] += 1
            else:
                new_dictionary[word] = 1

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        new_dictionary = {}
        for word in ...:
            if word not in new_dictionary:
                new_dictionary[word] = 0
            new_dictionary[word] += 1

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        new_dictionary = {}
        for word in ...:
            new_dictionary[word] = new_dictionary.get(word, 0) + 1

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        from collections import defaultdict

        new_dictionary = defaultdict(int)
        for word in ...:
            new_dictionary[word] += 1

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        from collections import Counter

        word_counts = Counter(
            word for word in from_words_list
            if len(word) > 1 or word in VALID_ONE_LETTER_WORDS)

        word_counts.most_common(n)

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        refactor the above to:

        from collections import Counter

        def is_valid_word(word):
            return len(word) > 1 or word in VALID_ONE_LETTER_WORDS

        word_counts = Counter(
            word for word in from_words_list
            if is_valid_word(word))

        word_counts.most_common(n)
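
To see that the approaches above agree, here is a runnable sketch on a
made-up word list (the sample sentence is an assumption, not challenge
data). It runs the dict.get(), defaultdict, and Counter variants side by
side.

```python
from collections import Counter, defaultdict

words = 'the cat and the hat and the bat'.split()  # made-up sample data

counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1

counts_default = defaultdict(int)
for word in words:
    counts_default[word] += 1

counts_counter = Counter(words)

# Convert to plain dicts so the comparison is only about the counts.
same = counts_get == dict(counts_default) == dict(counts_counter)
print(same)
print(counts_counter.most_common(2))
```

Counter does the counting and the ranking in one object, which is why the
later examples lean on it.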

        # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

        How would you use pandas?

    Dante's Inferno had different headers.

    Python's libraries are fabulous.
    With experience, you'll get a feel for when to think:

        You know, I'll bet somebody else has already done this
        and has some slick library that does what I want.

    It is amazing what you will find on PyPI.
    Search PyPI at https://pypi.python.org/pypi.

cw andrews'

    class oriented programming (not object oriented programming)
    What's the benefit of class oriented programming
    over object oriented programming?

    pretty plots!

    loads unix dictionary :-)!!! into a list :-(!!! (not a set)

    checking to see if each word was in the unix dictionary was slow
        (90 seconds)
        Is that because the unix dictionary was stored in a list
        instead of a set?

        Study:

            https://nbviewer.jupyter.org/github/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
            https://github.com/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb

    not_str_gens???

    re.split()
    re.findall()
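
A quick sketch of how re.split() and re.findall() can both pull words out
of a line. The sample line and the pattern (lowercase letters plus
apostrophes) are assumptions for illustration; the notes do not record how
the entry actually used them.

```python
import re

line = "It's alive! Alive, I tell you."  # made-up sample text

# re.split() cuts at the separators. A separator at the very end leaves
# an empty string behind, so the result needs filtering.
split_words = [w for w in re.split(r"[^a-z']+", line.lower()) if w]

# re.findall() describes the words themselves, so no filtering is needed.
found_words = re.findall(r"[a-z']+", line.lower())

print(split_words)
print(found_words)
```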

    Study Brian's nested generators:

        https://nbviewer.jupyter.org/github/cohpy/challenge-201604-words/blob/master/bcostlow/Challenge.ipynb
        https://github.com/cohpy/challenge-201604-words/blob/master/bcostlow/Challenge.ipynb

eric miller (ezeeetm?)

    used aws lambda

        java
        javascript
        python (2.7 only)
            virtualenv with almost everything installed
            boto (critter in the amazon river)
                much better support for 2.7 than 3

        can trick aws lambda into running R by wrapping it from python.

        no __name__ == '__main__'
            use function instead
        no command line arguments
            pass json blob

        stdout goes to cloud watch logs
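
A minimal sketch of the shape this implies: a plain function that takes the
JSON blob (already decoded to a dict) plus a context object, instead of a
__main__ block reading command line arguments. The function name handler
and the 'text' key are assumptions for illustration, not Eric's code.

```python
import json


def handler(event, context):
    """Entry point that Lambda would call; the name is hypothetical.

    event: the decoded JSON blob passed in place of command line arguments.
    context: runtime information supplied by Lambda (unused here).
    """
    text = event.get('text', '')          # 'text' is an assumed key
    n_words = len(text.split())
    print('counted %d words' % n_words)   # stdout would land in the logs
    return {'n_words': n_words}


# Local stand-in for an invocation: build the blob and call the function.
result = handler(json.loads('{"text": "to be or not to be"}'), None)
```

Because the entry point is just a function, it can be exercised locally
like this before anything is uploaded.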

        they tricked lambda into running R (wrapped in python)

        A URL for a mirror that doesn't balk when asked too many times is
        mighty handy. Thanks Eric!

        http://www.gutenberg.lib.md.us/8/84/84.zip
        http://www.gutenberg.lib.md.us/8/84/84.txt
        Does UrlFactory work for ids in the teens?

        http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages
        http://www.gutenberg.org/robot/harvest?filetypes%5B%5D=txt

    requests library is awesome
    its creator, Kenneth Reitz, is awesome too and has created other libraries

    s3 bucket
    mongodb

    16 hours to crunch all Gutenberg texts
    100 lanes wide

brian costlow

    used generator to skip over gutenberg header
    Combined simple nested generators with Counter from collections.
        prettiest code that I saw
            (there may be prettier code that I did not look at)

    need to study push_counter()
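
This is not Brian's code (his notebook is linked above), just a rough
sketch of the general idea: one generator skips the Gutenberg header,
another yields words, and Counter consumes the chain. The marker strings,
the word pattern, and the names body_lines and words are assumptions.

```python
import re
from collections import Counter

START_MARKER = '*** START OF'   # assumed end-of-header marker
END_MARKER = '*** END OF'       # assumed start-of-footer marker


def body_lines(lines):
    """Yield only the lines between the start and end markers."""
    lines = iter(lines)
    for line in lines:              # consume the header
        if line.startswith(START_MARKER):
            break
    for line in lines:              # continue on the same iterator
        if line.startswith(END_MARKER):
            return
        yield line


def words(lines):
    """Yield lowercase words, one at a time, from an iterable of lines."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word


sample = [
    'Project Gutenberg header text',
    '*** START OF THIS PROJECT GUTENBERG EBOOK ***',
    'The monster and the maker',
    '*** END OF THIS PROJECT GUTENBERG EBOOK ***',
    'license text',
]
word_counts = Counter(words(body_lines(sample)))
print(word_counts.most_common(1))
```

Nothing is held in memory as a list; each line flows through the chain one
at a time, and an open file object could stand in for the sample list.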

jan milosh

    Wow!: Combines words that were split across lines.

    Should be easy to convert to use generators instead of lists.

    Pythonic tests!
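
A rough sketch of what a generator version of rejoining split words might
look like. It is not Jan's code. It assumes that a trailing '-' always
marks a word split across lines, which will misfire on lines that end in a
genuine dash. The name join_split_words is made up.

```python
def join_split_words(lines):
    """Yield lines with words that were hyphenated across line ends rejoined.

    Hypothetical helper; assumes a trailing '-' always marks a split word.
    """
    carry = ''                       # first half of a split word, if any
    for line in lines:
        line = carry + line.strip()
        carry = ''
        if line.endswith('-'):
            # Hold back the last partial word for the next line.
            line, _, carry = line[:-1].rpartition(' ')
        if line:
            yield line
    if carry:
        yield carry


joined = list(join_split_words(['an implemen-', 'tation of it']))
print(joined)
```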

The May challenge is about generators

    https://github.com/cohpy/challenge-201605-generators/blob/master/README.md

    two dimensional walk
    nested generators
    sound generator

    pull request
        directory with github user name.

    Study:

        Nested generators
            https://mail.python.org/pipermail/centraloh/2013-June/001718.html

        very simple examples of "yield from"
            https://mail.python.org/pipermail/centraloh/2016-May/002816.html

                replace
                    list(x[0] for x in zip(foo(), range(10)))
                with
                    from itertools import islice
                    list(islice(foo(), 10))
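
For anyone who has not seen "yield from" yet (Python 3.3 and later), a
tiny sketch: it hands control to another iterable until that one runs out.
The generator names here are made up for the example.

```python
from itertools import islice


def evens():
    """Endless even numbers: 0, 2, 4, ..."""
    n = 0
    while True:
        yield n
        n += 2


def evens_then_letters():
    # "yield from" delegates to another iterable until it is exhausted.
    yield from islice(evens(), 3)
    yield from 'ab'


items = list(evens_then_letters())
print(items)
```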

brian costlow

    generators

        http://nbviewer.jupyter.org/github/cohpy/challenge-201604-words/blob/master/bcostlow/COhPy%20Generator%20Talk.ipynb
        https://github.com/cohpy/challenge-201604-words/blob/master/bcostlow/COhPy%20Generator%20Talk.ipynb

    A generator's return value is carried in StopIteration's value attribute.
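
A sketch of how a generator's return value travels (Python 3.3 and later):
it rides on the StopIteration exception, and "yield from" hands it back as
the value of the expression. The generator names are made up.

```python
def two_then_done():
    yield 1
    yield 2
    return 'done'        # becomes StopIteration.value


gen = two_then_done()
received = [next(gen), next(gen)]
try:
    next(gen)
except StopIteration as stop:
    return_value = stop.value


def outer():
    # "yield from" evaluates to the inner generator's return value.
    result = yield from two_then_done()
    yield result


outer_items = list(outer())
print(received, return_value, outer_items)
```

A plain for loop swallows the StopIteration, so the return value is only
visible through next() or "yield from".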

    asyncio

    http://dabeaz.com/generators/

    Highly recommended presentations about generators:

        Generator Tricks for Systems Programmers
        http://www.dabeaz.com/generators-uk/

        A Curious Course on Coroutines and Concurrency
        http://www.dabeaz.com/coroutines/

        Generators: The Final Frontier
        http://www.dabeaz.com/finalgenerator/

    check out https://docs.python.org/2/library/collections.html#collections-abstract-base-classes

    six
    https://pypi.python.org/pypi/six

    light bulbs should be going off for asyncio

    Avoid reinventing the wheel.
    Check out itertools and collections.

    Although you should be duck typing as much as possible in Python,
    when you write your own containers, build them from the abstract base
    classes in collections.

        docs.python.org/2/library/collections.html#collections-abstract-base-classes
        https://docs.python.org/3/library/collections.html
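
A sketch of what building on an abstract base class buys you. In Python 3
the ABCs live in collections.abc. The Countdown class is made up for
illustration: it defines only __len__ and __getitem__, and Sequence
supplies iteration, "in", reversed(), index(), and count().

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical read-only sequence: start, start - 1, ..., 1."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        # Kept simple: no negative indexes and no slices in this sketch.
        if not 0 <= index < self._start:
            raise IndexError(index)
        return self._start - index


c = Countdown(3)
print(list(c), 2 in c, list(reversed(c)), c.index(1), c.count(2))
```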

    github.com/bcostlow

new challenge, submit:

    polish stuff I've already done

        break out of nested loops by using a generator (Raymond Hettinger)
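
A minimal sketch of the idea, as I understand it: move the nested loops
into a generator so the caller has a single loop and one break gets out of
everything. The grid and the name cells are made up for the example.

```python
def cells(grid):
    """Flatten the nested loops into one stream of (row, column, value)."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            yield r, c, value


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

found = None
for r, c, value in cells(grid):
    if value == 5:
        found = (r, c)
        break            # one break leaves what used to be two loops

print(found)
```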

        http://nbviewer.jupyter.org/url/colug.net/python/dojo/20160429/dojo-20160429-2016-Mar-COhPy_Challenge_Rough-20160513-1144.ipynb

            pretty: cell #52
            nasty: last code cell

        clean up old nested even fib from 2013 for python3,
            (find project euler notebooks)
                find the euler where I tweaked for speed
            s/gen/iterable/
            https://mail.python.org/pipermail/centraloh/2013-June/001718.html

        yield from https://mail.python.org/pipermail/centraloh/2016-May/002816.html

    pass pylint

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

What is a cleaner way of limiting an iterable to the first n items
than list(x[0] for x in zip(foo(), range(10)))?
https://mail.python.org/pipermail/centraloh/2016-May/002816.html

    islice from itertools!

    islice(foo(), n)
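
Besides being cleaner, islice is also gentler on the source iterator. A
sketch using itertools.count() as a stand-in for foo(): with the iterator
as zip's first argument, zip pulls one extra item before it notices that
range() has run out, while islice stops after exactly n items.

```python
from itertools import count, islice

source = count()                                 # 0, 1, 2, ...
via_zip = [x for x, _ in zip(source, range(3))]
after_zip = next(source)                         # zip swallowed the 3

source = count()
via_islice = list(islice(source, 3))
after_islice = next(source)                      # nothing was lost

print(via_zip, after_zip)
print(via_islice, after_islice)
```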

foo in sets versus lists

    Searching for stuff in sets is smoking fast compared to searching lists.

        http://nbviewer.jupyter.org/github/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
        https://github.com/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
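
For a quick feel without opening the notebook, here is a rough timing
sketch. The sizes and repeat counts are arbitrary choices, and the numbers
will vary by machine. Searching for a missing item is the worst case for a
list, since every element gets compared.

```python
import timeit

n = 10000
as_list = list(range(n))
as_set = set(as_list)
missing = -1                      # worst case: the list scans every item

list_seconds = timeit.timeit(lambda: missing in as_list, number=200)
set_seconds = timeit.timeit(lambda: missing in as_set, number=200)
print('list: %.6f s, set: %.6f s' % (list_seconds, set_seconds))
```

The list lookup time grows with the length of the list, while the set
lookup stays roughly flat because it hashes straight to the answer.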

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

Adjourned to Brazenhead on Fifth Ave

http://www.hdrestaurants.com/brazenhead/5thavenue/

https://en.wikipedia.org/wiki/Memorial_Day_(2012_film)

wp:Polyomino
wp:Polyface Farm
wp:Joel Salatin

wp: prefix means Wikipedia

To get good answers, consider following the advice in the links below.
http://catb.org/~esr/faqs/smart-questions.html
http://web.archive.org/web/20090627155454/www.greenend.org.uk/rjk/2000/06/14/quoting.html

asyncio uv

https://news.ycombinator.com/threads?id=akubera

