From JoyceUlysses.txt -- words occurring exactly once

dn PythonList at DancesWithMice.info
Wed Jun 5 00:33:15 EDT 2024


On 31/05/24 14:26, HenHanna via Python-list wrote:
> On 5/30/2024 2:18 PM, dn wrote:
>> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>>
>>> Given a text file of a novel (JoyceUlysses.txt) ...
>>>
>>> could someone give me a pretty fast (and simple) Python program 
>>> that'd give me a list of all words occurring exactly once?
>>>
>>>                -- Also, a list of words occurring once, twice or 3 times
>>>
>>>
>>>
>>> re: hyphenated words        (you can treat it anyway you like)
>>>
>>>         but ideally, i'd treat  [editor-in-chief]
>>>                                 [go-ahead]  [pen-knife]
>>>                                 [know-how]  [far-fetched] ...
>>>         as one unit.
> 
> 
>>
>> Split into words - defined as you will.
>> Use Counter.
>>
>> Show some (of your) code and we'll be happy to critique...
> 
> 
> hard to decide what to do with hyphens
>                 and apostrophes
>               (I'd,  he's,  can't, haven't,  A's  and  B's)
> 
> 
> 2-step-Process
> 
>            1. make a file listing all words (one word per line)
> 
>            2.  then, doing the counting.  using
>                                from collections import Counter


Apologies for lateness - only just able to come back to this.

This is not a Python issue, and it will not be solved by code!

If you/your teacher can't define a "word", the code, any code, will 
almost certainly be wrong!
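
To make that concrete, here is a minimal sketch (not a definitive
answer) of the "split into words, use Counter" approach under two
different definitions of a "word". Both regular expressions, the
function name words_occurring and the sample sentence are assumptions
made up for illustration; neither definition comes from the original
question.

```python
import re
from collections import Counter

# Two hypothetical definitions of a "word" (assumptions, not a spec):
# A: runs of letters only, so hyphens and apostrophes split words.
LETTERS_ONLY = re.compile(r"[a-z]+")
# B: letters joined by single internal hyphens or apostrophes,
#    so "editor-in-chief" and "can't" each count as one unit.
WITH_JOINERS = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def words_occurring(text, pattern, max_count=1):
    """Return the sorted words that occur at most max_count times."""
    counts = Counter(pattern.findall(text.lower()))
    return sorted(w for w, n in counts.items() if n <= max_count)


sample = "The editor-in-chief can't go. The editor said: go-ahead, chief!"

print(words_occurring(sample, LETTERS_ONLY))
# ['ahead', 'can', 'in', 'said', 't']
print(words_occurring(sample, WITH_JOINERS))
# ["can't", 'chief', 'editor', 'editor-in-chief', 'go', 'go-ahead', 'said']
```

Same text, same Counter, two different "right" answers. For the novel
itself, the text would come from something like
open("JoyceUlysses.txt", encoding="utf-8").read() (the encoding is a
guess), and max_count=3 gives the "once, twice or 3 times" list.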


One of the interesting aspects of our work is that we can write all 
manner of tests to try to ensure that the code is correct: unit tests, 
integration tests, system tests, acceptance tests, eye-tests, ...

However, there is no such thing as a test (or proof) that a statement 
of requirements is complete or correct!
(nor is there one for any of the earlier stages of the full project 
life-cycle)
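
A small sketch of that gap, assuming a hypothetical function once_only
that treats whatever str.split() yields as a "word" (that choice is
an illustration, not a recommendation). Both tests pass, yet neither
can say whether the behaviour is what the user actually wanted.

```python
import unittest
from collections import Counter


def once_only(text):
    """Words occurring exactly once; a 'word' is any whitespace-separated chunk."""
    counts = Counter(text.lower().split())
    return sorted(w for w, n in counts.items() if n == 1)


class OnceOnlyTest(unittest.TestCase):
    def test_hyphenated_word_is_one_unit(self):
        # Passes: "pen-knife" is kept whole and "pen" occurs twice.
        self.assertEqual(once_only("pen-knife pen pen"), ["pen-knife"])

    def test_punctuation_sticks_to_words(self):
        # Also passes: "knife," and "knife" count as different words.
        # Whether that is *wanted* is a requirements question,
        # and no test can answer it.
        self.assertEqual(once_only("knife, knife"), ["knife", "knife,"])
```

Saved to a file, this can be run with python -m unittest. The tests
confirm that the code matches the coder's assumptions, nothing more.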

As coders we need to learn to require clear specifications, and not 
attempt to read between the lines, use our initiative, or otherwise 'not 
bother the ...'. When there is ambiguity, we should go back to the 
user/client/boss and seek clarification. They are the 
domain/subject-matter experts...

I'm reminded of a cartoon, possibly from some IBM source, first seen in 
black-and-white but here in living-color: 
https://www.monolithic.org/blogs/presidents-sphere/what-the-customer-really-wants

That has been the sad history of programming and development projects, 
wherein we are blamed for every shortcoming because no-one else 
understands their nuances.

If we don't insist on clarity, are we our own worst enemy?


-- 
Regards,
=dn

