From JoyceUlysses.txt -- words occurring exactly once

Thomas Passin list1 at tompassin.net
Wed Jun 5 07:10:19 EDT 2024


On 6/5/2024 12:33 AM, dn via Python-list wrote:
> On 31/05/24 14:26, HenHanna via Python-list wrote:
>> On 5/30/2024 2:18 PM, dn wrote:
>>> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>>>
>>>> Given a text file of a novel (JoyceUlysses.txt) ...
>>>>
>>>> could someone give me a pretty fast (and simple) Python program 
>>>> that'd give me a list of all words occurring exactly once?
>>>>
>>>>                -- Also, a list of words occurring once, twice or 3 times
>>>>
>>>>
>>>>
>>>> re: hyphenated words        (you can treat them any way you like)
>>>>
>>>>         but ideally, i'd treat  [editor-in-chief]
>>>>                                 [go-ahead]  [pen-knife]
>>>>                                 [know-how]  [far-fetched] ...
>>>>         as one unit.
>>
>>
>>>
>>> Split into words - defined as you will.
>>> Use Counter.
>>>
>>> Show some (of your) code and we'll be happy to critique...
>>
>>
>> hard to decide what to do with hyphens
>>                 and apostrophes
>>               (I'd,  he's,  can't, haven't,  A's  and  B's)
>>
>>
>> 2-step-Process
>>
>>            1. make a file listing all words (one word per line)
>>
>>            2.  then, doing the counting.  using
>>                                from collections import Counter
> 
> 
> Apologies for lateness - only just able to come back to this.
> 
> This issue is not Python, and is not solved by code!
> 
> If you/your teacher can't define a "word", the code, any code, will 
> almost certainly be wrong!
> 
> 
> One of the interesting aspects of our work is that we can write all 
> manner of tests to try to ensure that the code is correct: unit tests, 
> integration tests, system tests, acceptance tests, eye-tests, ...
> 
> However, there is no such thing as a test (or proof) that statements of 
> requirements are complete or correct!
> (nor for any other previous stages of the full project life-cycle)
> 
> As coders we need to learn to require clear specifications and not 
> attempt to read-between-the-lines, use our initiative, or otherwise 'not 
> bother the ...'. When there is ambiguity, we should go back to the 
> user/client/boss and seek clarification. They are the 
> domain/subject-matter experts...
> 
> I'm reminded of a cartoon, possibly from some IBM source, first seen in 
> black-and-white but here in living-color: 
> https://www.monolithic.org/blogs/presidents-sphere/what-the-customer-really-wants

That one's been kicking around for years ... good job in finding a link 
for it!

> That has been the sad history of programming and dev.projects - wherein 
> we are blamed for every short-coming, because no-one else understands 
> the nuances of development projects.

Of course, we see this lack of clarity all the time in questions to the 
list.  I often wonder how these askers can possibly come up with 
acceptable code if they don't realize they don't truly know what it's 
supposed to do.
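
To make that concrete for the original question, here is a minimal
sketch.  Everything in it rests on one assumed definition of a "word":
a run of ASCII letters, optionally joined by single internal hyphens or
straight apostrophes, compared case-insensitively.  That definition is
my assumption, not anything the OP specified, and the names WORD_RE,
word_counts, words_occurring and counts_for_file are just illustrative.
Change the pattern and you get a different answer, which is exactly the
problem being discussed.

```python
import re
from collections import Counter

# ASSUMED definition of a "word": ASCII letters, optionally joined by
# single internal hyphens or straight apostrophes, so that
# editor-in-chief, can't and A's each count as one unit.
# Accented letters and typographic apostrophes are NOT handled.
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def word_counts(text):
    """Count lower-cased words in text under the assumed definition."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))


def words_occurring(counts, low=1, high=1):
    """Return the sorted words whose count lies between low and high."""
    return sorted(w for w, n in counts.items() if low <= n <= high)


def counts_for_file(path, encoding="utf-8"):
    """Read a whole text file and count its words (encoding assumed)."""
    with open(path, encoding=encoding) as f:
        return word_counts(f.read())


# Small demonstration on an in-line sample instead of the novel.
sample = "The editor-in-chief can't go. The editor can go; he's the chief."
counts = word_counts(sample)
print(words_occurring(counts))        # words occurring exactly once
print(words_occurring(counts, 1, 3))  # words occurring 1 to 3 times
```

For the actual file, something like
words_occurring(counts_for_file("JoyceUlysses.txt")) would give the
once-only list, again only under the assumed definition.  Whether
"editor" and "editor-in-chief" should be separate entries is a
requirements decision, not something the code can settle.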

> If we don't insist on clarity, are we our own worst enemy?
> 
> 


