From JoyceUlysses.txt -- words occurring exactly once

Thomas Passin list1 at tompassin.net
Fri May 31 17:27:00 EDT 2024


On 5/30/2024 4:03 PM, HenHanna via Python-list wrote:
> 
> Given a text file of a novel (JoyceUlysses.txt) ...
> 
> could someone give me a pretty fast (and simple) Python program that'd 
> give me a list of all words occurring exactly once?
> 
>                -- Also, a list of words occurring once, twice or 3 times
> 
> 
> 
> re: hyphenated words        (you can treat it anyway you like)
> 
>         but ideally, i'd treat  [editor-in-chief]
>                                 [go-ahead]  [pen-knife]
>                                 [know-how]  [far-fetched] ...
>         as one unit.

You will probably get a thousand different suggestions, but here's a 
fairly direct and readable one in Python:

s1 = 'Is this word is the only word repeated in this string'

counts = {}
for w in s1.lower().split():
    counts[w] = counts.get(w, 0) + 1
print(sorted(counts.items()))
# [('in', 1), ('is', 2), ('only', 1), ('repeated', 1), ('string', 1),
#  ('the', 1), ('this', 2), ('word', 2)]
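
To get from per-word counts to the lists the original question asks for, one option is to filter the dictionary by count. This is only a sketch on the same sample string; the names `once` and `up_to_three` are made up here for illustration:

```python
s1 = 'Is this word is the only word repeated in this string'

# Count every lowercased, whitespace-separated word.
counts = {}
for w in s1.lower().split():
    counts[w] = counts.get(w, 0) + 1

# Words that occur exactly once.
once = sorted(w for w, n in counts.items() if n == 1)

# Words that occur once, twice or three times.
up_to_three = sorted(w for w, n in counts.items() if n <= 3)

print(once)
print(up_to_three)
```

On this short string every word occurs at most twice, so the second list simply contains all eight distinct words.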

Of course you can adjust the definition of what constitutes a word, 
handle punctuation and so on, and tinker with the output format to suit 
yourself.  You would replace s1.lower().split() with, e.g., 
my_custom_word_splitter(s1).
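
As one possible shape for such a splitter, here is a sketch that uses a regular expression to keep hyphenated words like editor-in-chief together and to drop surrounding punctuation. The pattern, the helper name `words_occurring_at_most` and the sample sentence are my own assumptions, not anything from the original question. The pattern only matches ASCII letters, so accented words would need a wider character class.

```python
import re
from collections import Counter

# Assumed definition of a word: runs of ASCII letters, optionally joined
# by single internal hyphens or apostrophes ("go-ahead", "wasn't").
WORD_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def my_custom_word_splitter(text):
    """Return the lowercased words found in text, hyphenated words intact."""
    return WORD_RE.findall(text.lower())


def words_occurring_at_most(text, limit=1):
    """Return a sorted list of the words that occur no more than limit times."""
    counts = Counter(my_custom_word_splitter(text))
    return sorted(w for w, n in counts.items() if n <= limit)


sample = "The editor-in-chief gave the go-ahead; the pen-knife wasn't far-fetched."
print(words_occurring_at_most(sample))           # words occurring exactly once
print(words_occurring_at_most(sample, limit=3))  # once, twice or three times
```

For the novel itself you would read the file into a string first, e.g. open('JoyceUlysses.txt', encoding='utf-8').read(), and pass that in place of sample. The utf-8 encoding is a guess about the file, so adjust it if needed.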



