From JoyceUlysses.txt -- words occurring exactly once
HenHanna
HenHanna at devnull.tb
Thu May 30 22:26:37 EDT 2024
On 5/30/2024 2:18 PM, dn wrote:
> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd
>> give me a list of all words occurring exactly once?
>>
>> -- Also, a list of words occurring once, twice or 3 times
>>
>>
>>
>> re: hyphenated words (you can treat them any way you like)
>>
>> but ideally, I'd treat [editor-in-chief]
>> [go-ahead] [pen-knife]
>> [know-how] [far-fetched] ...
>> as one unit.
>
> Split into words - defined as you will.
> Use Counter.
>
> Show some (of your) code and we'll be happy to critique...
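dn's "split, then Counter" suggestion can be sketched in a few lines. This is a minimal illustration only: the sample string is invented for the example, and plain `str.split()` stands in for whatever word definition you settle on.

```python
from collections import Counter

# Hypothetical sample text; in practice this would be the novel's contents.
sample = "the cat saw the dog and the dog saw a bird"

# "Split into words": here simply whitespace-separated, lowercased tokens.
counts = Counter(sample.lower().split())

# Words occurring exactly once (Counter keeps first-seen order).
once = [word for word, n in counts.items() if n == 1]
print(once)  # ['cat', 'and', 'a', 'bird']
```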
It's hard to decide what to do with hyphens
and apostrophes
(I'd, he's, can't, haven't, A's and B's).
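One possible policy, sketched here as an assumption rather than the way to do it: treat a word as a run of letters that may be joined by single internal hyphens or apostrophes (straight ' or curly ’). That keeps editor-in-chief, go-ahead, can't and A's as single units, while a double dash between words still splits them. The regex and the `tokenize` helper are hypothetical names chosen for this sketch.

```python
import re

# Assumed word definition: letters (Unicode-aware via [^\W\d_]), optionally
# joined by one internal hyphen or apostrophe between letter runs.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")

def tokenize(text):
    """Return lowercased words, keeping hyphenated and contracted forms whole."""
    return WORD_RE.findall(text.lower())

print(tokenize("The editor-in-chief said: I'd go-ahead, can't he? A's and B's."))
```

Leading and trailing quotes or hyphens are simply left out of the match, so no separate stripping step is needed with this approach.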
2-step process:
1. Make a file listing all words (one word per line).
2. Then do the counting, using
   from collections import Counter
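Step 2 might look something like this, assuming step 1 has produced a file with one word per line (the default name Out.txt matches the code below; UTF-8 encoding is an assumption). The helper names `count_words` and `words_with_count` are hypothetical.

```python
from collections import Counter

def count_words(path="Out.txt"):
    """Count words in a file that holds one word per line (step 1's output)."""
    with open(path, encoding="utf-8") as f:
        # Skip blank lines so they are not counted as a word.
        return Counter(line.strip() for line in f if line.strip())

def words_with_count(counts, n):
    """Sorted list of the words that occur exactly n times."""
    return sorted(w for w, c in counts.items() if c == n)
```

For example, `words_with_count(count_words(), 1)` gives the words occurring exactly once; calling it with n = 2 and n = 3 covers the "once, twice or 3 times" lists.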
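Related code (for step 1) that I'd used before:

# Characters trimmed from both ends of each token (duplicate ',' removed).
PUNCT = ";:,'\"[]()*&^%$#@!./<>?_-+="

# encoding="utf-8" is an assumption about the input file.
with open("JoyceUlysses.txt", "r", encoding="utf-8") as Rfile, \
     open("Out.txt", "w", encoding="utf-8") as fo:
    for line in Rfile:
        for w in line.split():      # split() never yields empty strings
            w = w.strip(PUNCT)      # same as rstrip(PUNCT) then lstrip(PUNCT)
            if w:                   # skip tokens that were pure punctuation
                fo.write(w.lower() + "\n")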