[CentralOH] 2014-01-17_道場_Scribbles_ 落書/惡文? Much more

Erik Welch erik.n.welch at gmail.com
Tue Jan 21 20:11:41 CET 2014


On Tue, Jan 21, 2014 at 1:22 PM, Chris Folsom <jcfolsom at pureperfect.com>
wrote:

>
> If you have more than one line of code, you're probably doing it wrong.


Many people have learned to be more modest by avoiding statements like
this.  You never know for certain when you may be wrong.  In this case, you
misunderstood the problem being considered and what the final outcome is
supposed to be.  It is not simply counting the number of words; the goal is
to count the number of occurrences of each word.  There are more nuances to
this than you might initially expect.  There are also many different
approaches, and each has its own merits.  In general, there often isn't a
single optimal solution, because "optimal" can mean many different things,
and optimality can also be different in different contexts.
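
To make the distinction concrete, here is a minimal sketch of two of the
approaches mentioned further down in this thread: the dict.get loop and
collections.Counter.  The sample sentence and the function names
count_with_get and count_with_counter are made up for illustration.
Splitting on whitespace alone is an assumption; punctuation and case are
among the nuances it ignores.

```python
from collections import Counter


def count_with_get(words_list):
    """Count occurrences of each word with a plain dict and dict.get."""
    counts = {}
    for word in words_list:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_with_counter(words_list):
    """Count occurrences of each word with collections.Counter."""
    return Counter(words_list)


# Hypothetical sample text; str.split() with no arguments splits on
# whitespace only.
words_list = "the cat and the hat and the bat".split()

by_get = count_with_get(words_list)
by_counter = count_with_counter(words_list)

print(len(words_list))            # total number of words
print(by_get)                     # occurrences of each word
print(by_counter.most_common(1))  # the single most frequent word
```

The first print is the "how many words" answer; the other two are the
"how many of each word" answer, which is what the dojo example was after.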

Anyway, thanks for sharing the regex example!

p.s. I tried to run your code in Python to count the total number of words
in a string.  It gave me an incorrect answer.
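
For anyone who wants to reproduce this, here is a sketch of what seems to
go wrong, assuming the pseudo-code pattern is passed to re.split unchanged.
Two documented behaviours of re.split matter here: text matched by a
capturing group is returned as part of the result list, and the trailing
"*" lets the pattern match the empty string, which is handled differently
across Python versions.  The alternative pattern and the name count_words
are my own and not from the original post.

```python
import re

my_string = "blah.blah?blah! blah blah"

# The pattern from the pseudo-code, passed to re.split unchanged.
naive_pattern = r"(\.|\;|\:|\!|\?|\s)*"
naive_pieces = re.split(naive_pattern, my_string)
# The captured separators show up in the list, so its length is not
# the number of words.
print(len(naive_pieces), naive_pieces)


def count_words(text):
    """Count runs of characters that are not whitespace or . ; : ! ?"""
    return len(re.findall(r"[^.;:!?\s]+", text))


print(count_words(my_string))
```

Using re.findall on the complement of the separator set avoids both
problems, since there is no capturing group and "+" never matches an empty
string.  This is only one possible fix.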


On Tue, Jan 21, 2014 at 1:22 PM, Chris Folsom <jcfolsom at pureperfect.com> wrote:

>
> If it's for word counting, you're overcomplicating it. In
> language-neutral pseudo-code:
>
> var myString = "blah.blah?blah! blah blah"
>
> var regExpForWhiteSpaceAndPunctuation = "(\.|\;|\:|\!|\?|\s)*"
>
> var[] wordsInMyString = myString.split(regExpForWhiteSpaceAndPunctuation)
>
> println "Word count: " + wordsInMyString.length
>
> or alternatively:
>
> "some random string".split("(\.|\;|\:|\!|\?|\s)*").length
>
> That ought to be the optimal solution in any language. No substitution or
> additional string creation is necessary. The Python equivalent is
> re.split(pattern, string, maxsplit=0, flags=0)
> So
>
> len(re.split(regExpForWhiteSpaceAndPunctuation, myString))
>
>
> If you have more than one line of code, you're probably doing it wrong.
>
>  -------- Original Message --------
> Subject: Re: [CentralOH] 2014-01-17_道場_Scribbles_ 落書/惡文?
> Much more
> From: Erik Welch <erik.n.welch at gmail.com>
> Date: Tue, January 21, 2014 12:38 pm
> To: "Mailing list for Central Ohio Python User Group (COhPy)"
> <centraloh at python.org>
>
> I have a few comments on the word counting example.  First, it is possible
> to view the referenced IPython notebooks online from
> http://nbviewer.ipython.org/ .  See the notebook here:
>
>
> http://nbviewer.ipython.org/url/colug.net/python/dojo/20140117/word-count-example-rev2.ipynb?create=1
>
> Second, there have been somewhat recent blog posts using word counting as
> an example.  The first compares a verbose solution with simple terms to
> concise solutions with complex terms:
>
> http://matthewrocklin.com/blog/work/2013/11/15/Functional-Wordcount/
>
> A newer blog post looks at the performance of Python and other languages
> for a specific text processing task:
>
> http://matthewrocklin.com/blog/work/2014/01/13/Text-Benchmarks/
>
> Finally, to learn more about functional data analysis in Python, here is a
> video tutorial from the same author as the above posts (and it uses the
> `toolz` library, although the talk isn't strictly about `toolz`):
>
> http://vimeo.com/80096814
>
> Cheers,
> Erik
>
>
> On Mon, Jan 20, 2014 at 6:59 PM, iynaix <iynaix at gmail.com> wrote:
>
>> Quick aside:
>>
>> If you're on Python 2.7 and above, you can do the following:
>>
>>     from collections import Counter
>>     counts = Counter(words_list)
>>
>> Counter is a dictionary-like object that has nice utilities such as being
>> able to add or subtract counters from one another, and most_common(), which
>> is very useful. See the link below for the official docs:
>>
>> http://docs.python.org/2/library/collections.html#collections.Counter
>>
>> Cheers,
>> XY
>>
>>
>> On Tue, Jan 21, 2014 at 5:09 AM, <jep200404 at columbus.rr.com> wrote:
>>
>>> The most interesting thing[1] was interesting because I should have
>>> known it.
>>> It used the dictionary get method to count words.
>>>
>>>     counts = {}
>>>     for word in words_list:
>>>         counts[word] = counts.get(word, 0) + 1
>>>
>>> We had to look it up in Learning Python by Mark Lutz.
>>> (p 210 in a printing of 4th edition.)
>>>
>>> It reminded me of the introductory examples starting on page 19 of
>>> pfda[2].
>>>
>>> Someone just passed the CISSP exam, so now can play with Python.
>>> If he passed, his employer would pay for it (about $600).
>>> If he failed, his employer would not pay for it.
>>> It was a four hour multiple choice exam.
>>> One gets a pass/fail grade. 70% is passing.
>>> One is not told how well one did.
>>>
>>> http://scott.a16z.com/2014/01/17/success-at-work-failure-at-home/
>>> http://bhorowitz.com/2014/01/02/can-do-vs-cant-do-cultures/
>>>
>>> wp:Newton's method
>>> need to post my euler to github; Euler #80 11.7 ms
>>> https://github.com/fandi-peng/Project_Euler/raw/master/code/euler80.py
>>>
>>> someone was messing with vincent 2013-05-09
>>> https://mail.python.org/pipermail/centraloh/2013-May/001670.html
>>>
>>> wp:Linear algebra
>>> wp:Linear programming
>>>
>>> wp:Weibull distribution
>>>
>>> Bunnie doing open source hardware
>>> wp:Andrew Huang
>>> http://www.eetimes.com/author.asp?section_id=69&doc_id=1320638
>>>
>>> http://dangerousprototypes.com/2012/08/23/workshop-video-36-beers-in-bunnies-workshop/
>>>
>>> http://dangerousprototypes.com/2012/04/19/video-hua-qiang-bei-market-in-shenzhen-china/
>>>
>>>
>>> http://www.zdnet.com/uks-security-branch-says-ubuntu-most-secure-end-user-os-7000025312/
>>>
>>> nbconvert crashes converting .ipynb to html or pdf
>>>     simple ones work
>>>     complex ones, such as with Latex, crash
>>>
>>> ghosts cannot be shown in China
>>>     shall not promote superstitious stuff in China
>>>
>>> http://172.17.153.149:8000/Word_Count_Example.ipynb
>>> counts = dict()
>>> for word in words_list:
>>>     counts[word] = counts.get(word, 0) + 1
>>>
>>> wp:Chromium (web browser)
>>> wp:SRWare Iron
>>>
>>> There are towns in China that are built as movie sets to promote the
>>> making of movies there. Do the residents become extras?
>>> wp:Extra (acting)
>>>
>>>
>>> https://github.com/ipython/ipython/tree/master/examples/notebooks#a-collection-of-notebooks-for-using-ipython-effectively
>>>
>>> http://nbviewer.ipython.org/github/ipython/ipython/blob/master/examples/notebooks/Part%204%20-%20Markdown%20Cells.ipynb
>>>
>>> wp:CISSP
>>>
>>> CYNIC, n. A blackguard whose faulty vision sees things as they are, not
>>> as they ought to be. Hence the custom among the Scythians of plucking
>>> out a cynic's eyes to improve his vision.
>>>
>>> PITH, n. See Dorothy Parker
>>> Tallulah Bankhead
>>> What do Tata and Ford have in common?
>>>
>>> The Devil's Dictionary
>>>
>>> pypy is often faster than cpython
>>> speed.pypy.org
>>>
>>> open-source food?
>>> http://www.wildfermentation.com/
>>>
>>> http://www.npr.org/2012/06/13/154914381/fermentation-when-food-goes-bad-but-stays-good
>>>
>>> wp:Charles Csuri
>>> http://oncampus.osu.edu/v29n18/thisissue_6.html
>>>
>>> https://duckduckgo.com/html/?q=charles%20csuri%20wosu%20beyond%20boundaries
>>> wp:Lava lamp
>>>
>>> Polymorphism (computer science), the ability in computer programming to
>>> present the same interface for differing underlying forms (data types).
>>> Operator overloading can be an example of polymorphism.
>>> wp:Polymorphism (computer science)
>>> wp:Operator overloading
>>> Python supports polymorphism
>>>
>>> On Fri, 17 Jan 2014 19:48:27 -0500, Fandi Peng <fandi.814 at gmail.com>
>>> wrote:
>>>
>>> >     https://github.com/brandon-rhodes/astronomy-notebooks
>>>
>>> That's a great introduction to (i)python and is getting some maintenance
>>> attention, including the addition of a pandas notebook.
>>>
>>> There are probably more notes and ipython notebooks coming for
>>> this dojo.
>>>
>>> [1] See
>>> http://colug.net/python/dojo/20140117/word-count-example-rev2.ipynb
>>>
>>> [2] Python for Data Analysis by Wes McKinney
>>>     this book just keeps coming back up
>>>     http://shop.oreilly.com/product/0636920023784.do
>>>     http://blog.wesmckinney.com/
>>> _______________________________________________
>>> CentralOH mailing list
>>> CentralOH at python.org
>>> https://mail.python.org/mailman/listinfo/centraloh
>>>