From JoyceUlysses.txt -- words occurring exactly once
Thomas Passin
list1 at tompassin.net
Wed Jun 5 07:10:19 EDT 2024
On 6/5/2024 12:33 AM, dn via Python-list wrote:
> On 31/05/24 14:26, HenHanna via Python-list wrote:
>> On 5/30/2024 2:18 PM, dn wrote:
>>> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>>>
>>>> Given a text file of a novel (JoyceUlysses.txt) ...
>>>>
>>>> could someone give me a pretty fast (and simple) Python program
>>>> that'd give me a list of all words occurring exactly once?
>>>>
>>>> -- Also, a list of words occurring once, twice or 3
>>>> times
>>>>
>>>>
>>>>
>>>> re: hyphenated words (you can treat them any way you like)
>>>>
>>>> but ideally, I'd treat [editor-in-chief]
>>>> [go-ahead] [pen-knife]
>>>> [know-how] [far-fetched] ...
>>>> as one unit.
>>
>>
>>>
>>> Split into words - defined as you will.
>>> Use Counter.
>>>
>>> Show some (of your) code and we'll be happy to critique...
>>
>>
>> hard to decide what to do with hyphens
>> and apostrophes
>> (I'd, he's, can't, haven't, A's and B's)
>>
>>
>> 2-step-Process
>>
>> 1. make a file listing all words (one word per line)
>>
>> 2. then do the counting, using
>> from collections import Counter
>
>
> Apologies for lateness - only just able to come back to this.
>
> This issue is not Python, and is not solved by code!
>
> If you/your teacher can't define a "word", the code, any code, will
> almost certainly be wrong!
>
>
> One of the interesting aspects of our work is that we can write all
> manner of tests to try to ensure that the code is correct: unit tests,
> integration tests, system tests, acceptance tests, eye-tests, ...
>
> However, there is no such thing as a test (or proof) that statements of
> requirements are complete or correct!
> (nor for any other previous stages of the full project life-cycle)
>
> As coders we need to learn to require clear specifications and not
> attempt to read-between-the-lines, use our initiative, or otherwise 'not
> bother the ...'. When there is ambiguity, we should go back to the
> user/client/boss and seek clarification. They are the
> domain/subject-matter experts...
>
> I'm reminded of a cartoon, possibly from some IBM source, first seen in
> black-and-white but here in living-color:
> https://www.monolithic.org/blogs/presidents-sphere/what-the-customer-really-wants
That one's been kicking around for years ... good job finding a link
for it!
> That has been the sad history of programming and dev.projects - wherein
> we are blamed for every short-coming, because no-one else understands
> the nuances of development projects.
Of course, we see this lack of clarity all the time in questions to the
list. I often wonder how these askers can possibly come up with
acceptable code if they don't realize they don't truly know what it's
supposed to do.
> If we don't insist on clarity, are we our own worst enemy?
>
>
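To make the point concrete, here is a minimal sketch of the "split into words, then use Counter" approach suggested earlier in the thread. The two regular expressions are my own assumptions about what a "word" might be, not anything the original poster specified, and the helper name words_by_count is hypothetical. The only purpose is to show that the answer depends on which definition you pick.

```python
import re
from collections import Counter

# Two hypothetical definitions of "word"; neither comes from the thread.
# Definition A: runs of letters, keeping internal hyphens and apostrophes,
# so "pen-knife" and "can't" each count as one unit.
WORD_KEEP = re.compile(r"[a-z]+(?:['-][a-z]+)*")
# Definition B: runs of letters only, so "pen-knife" becomes "pen" and "knife".
WORD_SPLIT = re.compile(r"[a-z]+")


def words_by_count(text, pattern, max_count=1):
    """Return the sorted words that occur between 1 and max_count times."""
    counts = Counter(pattern.findall(text.lower()))
    return sorted(w for w, n in counts.items() if n <= max_count)


sample = "The pen-knife can't cut the pen. A knife can."

print(words_by_count(sample, WORD_KEEP))
# ['a', 'can', "can't", 'cut', 'knife', 'pen', 'pen-knife']
print(words_by_count(sample, WORD_SPLIT))
# ['a', 'cut', 't']
```

Same text, same counting code, two quite different lists of "words occurring exactly once". For the novel itself you would read JoyceUlysses.txt into a string first (the file's encoding is something you would have to check), and pass max_count=3 to get the words occurring once, twice or three times. Note that neither pattern handles typographic apostrophes or accented letters, which is one more thing the specification would need to settle.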