From JoyceUlysses.txt -- words occurring exactly once
dn
PythonList at DancesWithMice.info
Wed Jun 5 00:33:15 EDT 2024
On 31/05/24 14:26, HenHanna via Python-list wrote:
> On 5/30/2024 2:18 PM, dn wrote:
>> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>>
>>> Given a text file of a novel (JoyceUlysses.txt) ...
>>>
>>> could someone give me a pretty fast (and simple) Python program
>>> that'd give me a list of all words occurring exactly once?
>>>
>>> -- Also, a list of words occurring once, twice or 3 times
>>>
>>>
>>>
>>> re: hyphenated words (you can treat them any way you like)
>>>
>>> but ideally, i'd treat [editor-in-chief]
>>> [go-ahead] [pen-knife]
>>> [know-how] [far-fetched] ...
>>> as one unit.
>
>
>>
>> Split into words - defined as you will.
>> Use Counter.
>>
>> Show some (of your) code and we'll be happy to critique...
>
>
> hard to decide what to do with hyphens
> and apostrophes
> (I'd, he's, can't, haven't, A's and B's)
>
>
> 2-step-Process
>
> 1. make a file listing all words (one word per line)
>
> 2. then, doing the counting. using
> from collections import Counter

Apologies for lateness - only just able to come back to this.

This issue is not a Python problem, and is not solved by code!

If you/your teacher can't define a "word", the code, any code, will
almost certainly be wrong!
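
To make that concrete, here is a minimal sketch, assuming two made-up
definitions of a "word" (the names SIMPLE, COMPOUND and words_by_count
are purely illustrative, and only ASCII hyphens and apostrophes are
considered). The same Counter-based code gives different answers
depending on which definition is chosen:

```python
import re
from collections import Counter

# Two hypothetical definitions of a "word"; neither comes from the
# original request, which is exactly the problem.
SIMPLE = re.compile(r"[a-z]+")                    # runs of letters only
COMPOUND = re.compile(r"[a-z]+(?:[-'][a-z]+)*")   # keeps internal - and '


def words_by_count(text, pattern, low=1, high=1):
    """Return the sorted words whose frequency lies in low..high inclusive."""
    counts = Counter(pattern.findall(text.lower()))
    return sorted(w for w, n in counts.items() if low <= n <= high)


sample = "The editor-in-chief can't find the pen-knife. The chief can find a pen."

once_simple = words_by_count(sample, SIMPLE)
once_compound = words_by_count(sample, COMPOUND)

print(once_simple)    # ['a', 'editor', 'in', 'knife', 't']
print(once_compound)  # ['a', 'can', "can't", 'chief', 'editor-in-chief', 'pen', 'pen-knife']
```

With the letters-only rule, "editor", "in", "knife" and even a stray "t"
are reported as words occurring once; with the compound rule they do not
exist as separate words at all. Neither list is wrong until someone says
which definition is required. Passing low=1, high=3 gives the "once,
twice or 3 times" variant.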

One of the interesting aspects of our work is that we can write all
manner of tests to try to ensure that the code is correct: unit tests,
integration tests, system tests, acceptance tests, eye-tests, ...

However, there is no such thing as a test (or proof) that statements of
requirements are complete or correct!
(nor for any other previous stages of the full project life-cycle)

As coders we need to learn to require clear specifications and not
attempt to read-between-the-lines, use our initiative, or otherwise 'not
bother the ...'. When there is ambiguity, we should go back to the
user/client/boss and seek clarification. They are the
domain/subject-matter experts...

I'm reminded of a cartoon, possibly from some IBM source, first seen in
black-and-white but here in living color:
https://www.monolithic.org/blogs/presidents-sphere/what-the-customer-really-wants

That has been the sad history of programming and dev projects - wherein
we are blamed for every shortcoming, because no-one else understands
the nuances of development work.

If we don't insist on clarity, are we our own worst enemy?

--
Regards,
=dn