[CentralOH] 2016-05-23 會議 Scribbles 落書/惡文?: counting words, foo in set, foo in list, generators galore, re.split(), re.findall(), aws lambda, requests, islice, salatin, asyncio uv
jep200404 at columbus.rr.com
Thu Jun 2 20:01:37 EDT 2016
Thanks again to Pillar and Chris Baker for hosting us at Pillar's Forge.
There's a heck of a lot more Python code in this month's scribbles than usual.
There is much to learn from.
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
PyOhio 2016
T-Shirts are cool. Buy one.
Eric and Jan modeled for pictures.
Please submit talk proposals for PyOhio
beginning speakers are very welcome and encouraged to submit a talk
1/4 to 1/3 of talks are reserved for people
who have never talked at a conference before
Deadline has been extended to June 3
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
March Temperature Prediction Challenge
https://github.com/cohpy/challenge-201603-temps/blob/master/predictions.txt
Andrew Kubera won the temperature guessing challenge.
He won a hat.
What happened to the other prizes, such as the HP cooler and HP hoodie?
The Daily Growler https://www.thedailygrowler.com/
challenge-201603-temps/answer5/cohpy_weather_sommerville.py
Python3 versus Python2
Has code to complain and quit if it is not run by python2,
but that code has a syntax error when run by python3.
Eric Floehr showed how to easily convert python2 code to python3 code
with the 2to3 program.
2to3 is part of python3
converts most of python2 code to python3 code
individual files, or whole directories
2to3 -w -n ;# convert without backups: this is dangerous.
Cute name: 'Nulluary'
How about using None instead?
indents by two
all challenge submitters (not just this challenge) got python stickers
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
found a 12' HDMI cable this time. (about half as long as last month)
First time to cohpy monthly meeting:
Scott
John
Compare eoo.py, foo.py, and goo.py below.
What do you like? Why?
cohpy at forge:~/20160523$ cat eoo.py
#!/usr/bin/env python2
import sys
if sys.version_info.major == 2:
    pass
else:
    sys.stderr.write(
        'You are using Python version %s. This script only works with 2\n' %
        sys.version_info.major)
    sys.exit(1)
cohpy at forge:~/20160523$ python2 eoo.py
cohpy at forge:~/20160523$ python3 eoo.py
You are using Python version 3. This script only works with 2
cohpy at forge:~/20160523$
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
cohpy at forge:~/20160523$ cat foo.py
#!/usr/bin/env python2
import sys
if sys.version_info.major != 2:
    sys.stdout = sys.stderr
    print (
        'You are using Python version %s. This script only works with 2' %
        sys.version_info.major)
    sys.exit(1)
cohpy at forge:~/20160523$ python2 foo.py
cohpy at forge:~/20160523$ python3 foo.py
You are using Python version 3. This script only works with 2
cohpy at forge:~/20160523$
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
cohpy at forge:~/20160523$ cat goo.py
#!/usr/bin/env python2
import sys
assert sys.version_info.major == 2, (
    'You are using Python version %s. This script only works with 2.' %
    sys.version_info.major)
cohpy at forge:~/20160523$ python2 goo.py
cohpy at forge:~/20160523$ python3 goo.py
Traceback (most recent call last):
File "goo.py", line 7, in <module>
sys.version_info.major)
AssertionError: You are using Python version 3. This script only works with 2.
cohpy at forge:~/20160523$
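One more variant to compare, as a minimal sketch rather than anything shown at the meeting. It keeps the check in a small function (version_complaint and check_version are made-up names) and hands the message to sys.exit(), which prints a string argument to stderr and exits with status 1. Unlike goo.py, it still complains when run with python -O, which strips assert statements. It sticks to syntax that both python2 and python3 can parse.

```python
import sys


def version_complaint(major, required=2):
    """Return a complaint if major is not the required version, else None.

    version_complaint is a hypothetical helper, not part of eoo.py,
    foo.py, or goo.py.
    """
    if major == required:
        return None
    return ('You are using Python version %s. '
            'This script only works with %s.' % (major, required))


def check_version():
    """Quit with a message on stderr if the interpreter is the wrong one."""
    complaint = version_complaint(sys.version_info.major)
    if complaint:
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit(complaint)


# A python2-only script would call check_version() near the top.
# Here the helper is exercised directly so the sketch runs anywhere.
print(version_complaint(2))
print(version_complaint(3))
```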
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
Aschinger Blvd is near Brazenhead
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
word counting challenge
seven entries
code is available at https://github.com/cohpy/challenge-201604-words
http://www.gutenberg.org/cache/epub/84/pg84.txt
Will lock you out after only a few requests.
eric's
uses virtualenvwrapper
mkvirtualenv challenge-201604 -p /usr/bin/python3
import textwrap
os.get_terminal_size()
more portable than os.environ['COLUMNS']
stty
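A minimal sketch of wrapping output to the terminal width. It swaps in shutil.get_terminal_size(), which is not necessarily what Eric used: it honors COLUMNS, then asks os.get_terminal_size(), and falls back to a default when output is not a terminal (os.get_terminal_size() by itself raises OSError in that case). wrap_to_terminal is a made-up name.

```python
import shutil
import textwrap


def wrap_to_terminal(text, width=None):
    """Wrap text to width columns, defaulting to the terminal width."""
    if width is None:
        # Falls back to 80 columns when no terminal size is available.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return textwrap.fill(text, width=width)


text = 'the quick brown fox jumps over the lazy dog ' * 3
print(wrap_to_terminal(text, width=30))
```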
did not define word
pytest
@pytest.mark.parametrize is fantastic
75071 words
the 4195
and 2977
i 2850
of 2641
to 2094
my 1776
a 1391
in 1128
was 1021
that 1017
joe friedrich's
range(0, n) -> range(n)
Is word a valid one-letter word? Compare the following:
word == 'A' or word == 'a' or word == 'I'
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# tuples (and lists) are OK
VALID_ONE_LETTER_WORDS = ('A', 'a', 'I')
word in VALID_ONE_LETTER_WORDS
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Sets are fast, even when the set is large.
VALID_ONE_LETTER_WORDS = set(('A', 'a', 'I'))
word in VALID_ONE_LETTER_WORDS
counting: compare the following:
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
new_dictionary = {}
for word in ...:
    if word in new_dictionary:
        new_dictionary[word] += 1
    else:
        new_dictionary[word] = 1
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
new_dictionary = {}
for word in ...:
    if word not in new_dictionary:
        new_dictionary[word] = 0
    new_dictionary[word] += 1
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
new_dictionary = {}
for word in ...:
    new_dictionary[word] = new_dictionary.get(word, 0) + 1
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
from collections import defaultdict
new_dictionary = defaultdict(int)
for word in ...:
    new_dictionary[word] += 1
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
from collections import Counter
word_counts = Counter(
    word for word in from_words_list
    if len(word) > 1 or word in VALID_ONE_LETTER_WORDS)
word_counts.most_common(n)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
refactor the above to:
from collections import Counter
def is_valid_word(word):
    return len(word) > 1 or word in VALID_ONE_LETTER_WORDS
word_counts = Counter(
    word for word in from_words_list
    if is_valid_word(word))
word_counts.most_common(n)
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
How would you use pandas?
Dante's Inferno had different headers.
Python's libraries are fabulous.
You'll get a feel for:
You know, I'll bet somebody else has already done this
and has some slick library that does what I want.
It is amazing what you will find on PyPI.
Search PyPI at https://pypi.python.org/pypi.
cw andrews'
class oriented programming (not object oriented programming)
What's the benefit of class oriented programming
over object oriented programming?
pretty plots!
loads unix dictionary :-)!!! into a list :-(!!! (not a set)
checking to see if each word was in the unix dictionary was slow
(90 seconds)
Is that because the unix dictionary was stored in a list
instead of a set?
Study:
https://nbviewer.jupyter.org/github/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
https://github.com/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
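A rough sketch of the kind of measurement the notebook makes (this is not the notebook's code). The word list is a made-up stand-in for the unix dictionary, and the timings depend on the machine. The point is only that `in` on a list compares item by item, while `in` on a set is a hash lookup.

```python
from timeit import timeit

# Made-up stand-in for the unix dictionary.
words_list = ['word%d' % i for i in range(100000)]
words_set = set(words_list)

# Worst case for the list: the word is absent, so every item is compared.
target = 'frankenstein'

list_seconds = timeit(lambda: target in words_list, number=100)
set_seconds = timeit(lambda: target in words_set, number=100)

print('list: %.6f seconds for 100 lookups' % list_seconds)
print('set:  %.6f seconds for 100 lookups' % set_seconds)
```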
not_str_gens???
re.split()
re.findall()
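A small sketch of how the two differ when pulling words out of a line. The patterns are illustrative guesses at what counts as a word, not the ones used in any submission.

```python
import re

line = "Hello, world! It's me."

# re.split() cuts at the separators. A separator at the end of the line
# leaves an empty string behind, and the apostrophe splits "It's" in two.
split_words = re.split(r'\W+', line)

# re.findall() describes the words themselves instead of the gaps.
found_words = re.findall(r"[A-Za-z']+", line)

print(split_words)  # ['Hello', 'world', 'It', 's', 'me', '']
print(found_words)  # ['Hello', 'world', "It's", 'me']
```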
Study Brian's nested generators:
https://nbviewer.jupyter.org/github/cohpy/challenge-201604-words/blob/master/bcostlow/Challenge.ipynb
https://github.com/cohpy/challenge-201604-words/blob/master/bcostlow/Challenge.ipynb
eric miller (ezeeetm?)
used aws lambda
java
javascript
python (2.7 only)
virtualenv with almost everything installed
boto (critter in the amazon river)
much better support for 2.7 than 3
can trick aws lambda into running R by wrapping it from python.
no __name__ == '__main__'
use function instead
no command line arguments
pass json blob
stdout goes to cloud watch logs
they tricked lambda into running R (wrapped in python)
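A minimal sketch of what "use function instead" looks like, assuming the usual Python handler signature of (event, context). The function name and the 'text' key in the JSON blob are made up for illustration; this is not Eric's code.

```python
from collections import Counter


def lambda_handler(event, context):
    """Count words in event['text'] and return the ten most common.

    event is the JSON blob, already decoded into a dict.
    The 'text' key is a hypothetical choice for this sketch.
    """
    counts = Counter(event['text'].lower().split())
    # On AWS Lambda, print() output ends up in the CloudWatch logs.
    print('counted %d distinct words' % len(counts))
    return {'top': counts.most_common(10)}


# Local try-out: there is no command line, so pass the blob directly.
result = lambda_handler({'text': 'the cat and the hat'}, None)
print(result)
```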
URLs for a mirror that doesn't balk when asked too many times are mighty
handy. Thanks Eric!
http://www.gutenberg.lib.md.us/8/84/84.zip
http://www.gutenberg.lib.md.us/8/84/84.txt
Does UrlFactory work for ids in the teens?
http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages
http://www.gutenberg.org/robot/harvest?filetypes%5B%5D=txt
requests library is awesome
the guy who created it is awesome, has created other libraries
Kenneth Reitz
s3 bucket
mongodb
16 hours to crunch all Gutenberg texts
100 lanes wide
brian costlow
used generator to skip over gutenberg header
Combined simple nested generators with Counter from collections.
prettiest code that I saw
(there may be prettier code that I did not look at)
need to study push_counter()
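A sketch of the general idea, not Brian's actual code (study his notebook for that). One generator skips the Gutenberg header and stops at the footer, another turns lines into words, and Counter consumes the nested pair. The marker strings and the word pattern are assumptions; real Gutenberg texts vary, as the Dante's Inferno headers showed.

```python
import re
from collections import Counter

# Assumed markers; actual Gutenberg files do not all use the same text.
START_MARKER = '*** START OF'
END_MARKER = '*** END OF'


def body_lines(lines):
    """Yield only the lines between the start and end markers."""
    in_body = False
    for line in lines:
        if line.startswith(END_MARKER):
            return
        if in_body:
            yield line
        elif line.startswith(START_MARKER):
            in_body = True


def words(lines):
    """Yield lowercase words, one at a time, from an iterable of lines."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word


def count_words(lines):
    return Counter(words(body_lines(lines)))


sample = [
    'Header that should be ignored, the the the',
    '*** START OF THIS PROJECT GUTENBERG EBOOK ***',
    'The cat and the hat.',
    '*** END OF THIS PROJECT GUTENBERG EBOOK ***',
    'License text that should be ignored, the the the',
]
print(count_words(sample).most_common(1))  # [('the', 2)]
```

Since an open file is an iterable of lines, count_words(open(filename)) works without reading the whole book into memory.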
jan milosh
Wow!: Combines words that were split across lines.
Should be easy to convert to use generators instead of lists.
Pythonic tests!
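A rough sketch of joining words that were split across lines, written as a generator. This is not Jan's code, and join_split_words is a made-up name. It assumes a split word ends its line with a hyphen, so it will also glue together genuinely hyphenated words that happen to break at a line end.

```python
def join_split_words(lines):
    """Yield lines with words hyphenated across line ends rejoined."""
    carry = ''
    for line in lines:
        line = carry + line.strip()
        carry = ''
        if line.endswith('-'):
            # Hold back the partial word and prepend it to the next line.
            line, _, carry = line[:-1].rpartition(' ')
        if line:
            yield line
    if carry:
        yield carry


lines = ['the quick bro-', 'wn fox jumps']
print(list(join_split_words(lines)))  # ['the quick', 'brown fox jumps']
```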
May challenge is about generators
https://github.com/cohpy/challenge-201605-generators/blob/master/README.md
two dimensional walk
nested generators
sound generator
pull request
directory with github user name.
Study:
Nested generators
https://mail.python.org/pipermail/centraloh/2013-June/001718.html
very simple examples of "yield from"
https://mail.python.org/pipermail/centraloh/2016-May/002816.html
replace
list(x[0] for x in zip(foo(), range(10)))
with
from itertools import islice
list(islice(foo(), 10))
brian costlow
generators
http://nbviewer.jupyter.org/github/cohpy/challenge-201604-words/blob/master/bcostlow/COhPy Generator Talk.ipynb
https://github.com/cohpy/challenge-201604-words/blob/master/bcostlow/COhPy Generator Talk.ipynb
A generator's return value is carried in the value attribute of the StopIteration exception.
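A small sketch of where a generator's return value goes, for Python 3.3 or later. The generator names are made up. Calling next() on a finished generator raises StopIteration with the return value attached, and "yield from" hands that same value back as the result of the expression.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1
    return 'liftoff'


def mission():
    # yield from passes along 3, 2, 1, then evaluates to the return value.
    result = yield from countdown(3)
    yield result


gen = countdown(1)
first = next(gen)  # 1
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value  # 'liftoff'

print(first, returned)
print(list(mission()))  # [3, 2, 1, 'liftoff']
```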
asyncio
http://dabeaz.com/generators/
Highly recommended presentations about generators:
Generator Tricks for Systems Programmers
http://www.dabeaz.com/generators-uk/
A Curious Course on Coroutines and Concurrency
http://www.dabeaz.com/coroutines/
Generators: The Final Frontier
http://www.dabeaz.com/finalgenerator/
check out https://docs.python.org/2/library/collections.html#collections-abstract-base-classes
six
https://pypi.python.org/pypi/six
light bulbs should be going off for asyncio
Avoid reinventing the wheel.
Check out itertools and collections.
Although you should be duck typing as much as possible in Python,
build from abstract base classes from collections.
docs.python.org/2/library/collections.html#collections-abstract-base-classes
https://docs.python.org/3/library/collections.html
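A minimal sketch of building from an abstract base class. Deck is a made-up example class. In Python 3 the ABCs live in collections.abc (in Python 2 they are directly in collections). Defining only __getitem__ and __len__ is enough for Sequence to supply __contains__, __iter__, __reversed__, index(), and count().

```python
from collections.abc import Sequence


class Deck(Sequence):
    """A read-only sequence of cards; a hypothetical example class."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __getitem__(self, index):
        return self._cards[index]

    def __len__(self):
        return len(self._cards)


deck = Deck(['ace', 'king', 'queen'])
print('king' in deck)        # True; __contains__ comes from Sequence
print(list(reversed(deck)))  # ['queen', 'king', 'ace']
print(deck.index('queen'))   # 2
```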
github.com/bcostlow
new challenge, submit:
polish stuff I've already done
break out of nested loops by using a generator (Raymond Hettinger)
http://nbviewer.jupyter.org/url/colug.net/python/dojo/20160429/dojo-20160429-2016-Mar-COhPy_Challenge_Rough-20160513-1144.ipynb
pretty: cell #52
nasty: last code cell
clean up old nested even fib from 2013 for python3,
(find project euler notebooks)
find the euler where I tweaked for speed
s/gen/iterable/
https://mail.python.org/pipermail/centraloh/2013-June/001718.html
yield from https://mail.python.org/pipermail/centraloh/2016-May/002816.html
pass pylint
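For the "break out of nested loops by using a generator" item in the list above, a minimal sketch of the idea (not Raymond Hettinger's or the notebook's code; cells is a made-up name). The nested loops move into a generator, so the caller has a single loop and one plain break gets out of everything.

```python
def cells(grid):
    """Flatten the nested loops: yield (row, column, value) for each cell."""
    for row_number, row in enumerate(grid):
        for column_number, value in enumerate(row):
            yield row_number, column_number, value


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

found = None
for row_number, column_number, value in cells(grid):
    if value == 5:
        found = (row_number, column_number)
        break  # One break leaves both of the original loops.

print(found)  # (1, 1)
```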
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
What is a cleaner way of limiting an iterable to the first n items
than list(x[0] for x in zip(foo(), range(10)))?
https://mail.python.org/pipermail/centraloh/2016-May/002816.html
islice from itertools!
islice(foo(), n)
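A short sketch comparing the two, with a made-up infinite generator standing in for foo(). Besides being easier to read, islice has a practical edge when the generator is used again afterwards: zip pulls one extra item from the generator before it notices that range is exhausted, and that item is lost.

```python
from itertools import count, islice


def squares():
    """Endless generator of 0, 1, 4, 9, ... standing in for foo()."""
    for n in count():
        yield n * n


first_squares = list(islice(squares(), 5))
print(first_squares)  # [0, 1, 4, 9, 16]

gen = squares()
list(x[0] for x in zip(gen, range(3)))  # [0, 1, 4]
after_zip = next(gen)
print(after_zip)  # 16, because zip already consumed the 9

gen = squares()
list(islice(gen, 3))  # [0, 1, 4]
after_islice = next(gen)
print(after_islice)  # 9
```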
foo in sets versus lists
Searching for stuff in sets is smoking fast compared to searching lists.
http://nbviewer.jupyter.org/github/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
https://github.com/james-prior/cohpy/blob/master/20160523/cohpy-20160523-speed-of-searching-sets-and-lists.ipynb
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
Adjourned to Brazenhead on Fifth Ave
http://www.hdrestaurants.com/brazenhead/5thavenue/
https://en.wikipedia.org/wiki/Memorial_Day_(2012_film)
wp:Polyomino
wp:Polyface Farm
wp:Joel Salatin
wp: prefix means Wikipedia
To get good answers, consider following the advice in the links below.
http://catb.org/~esr/faqs/smart-questions.html
http://web.archive.org/web/20090627155454/www.greenend.org.uk/rjk/2000/06/14/quoting.html
asyncio uv
https://news.ycombinator.com/threads?id=akubera