From JoyceUlysses.txt -- words occurring exactly once

HenHanna HenHanna at devnull.tb
Thu May 30 22:26:37 EDT 2024


On 5/30/2024 2:18 PM, dn wrote:
> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd 
>> give me a list of all words occurring exactly once?
>>
>>                -- Also, a list of words occurring once, twice or 3 times
>>
>>
>>
>> re: hyphenated words        (you can treat it anyway you like)
>>
>>         but ideally, i'd treat  [editor-in-chief]
>>                                 [go-ahead]  [pen-knife]
>>                                 [know-how]  [far-fetched] ...
>>         as one unit.


> 
> Split into words - defined as you will.
> Use Counter.
> 
> Show some (of your) code and we'll be happy to critique...


It's hard to decide what to do with hyphens
                and apostrophes
              (I'd,  he's,  can't,  haven't,  A's  and  B's).
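
One possible way to handle both cases is a regular expression that keeps a hyphen or apostrophe only when it sits between letters. This is just a sketch: the names WORD_RE and tokenize are made up for illustration, and the pattern assumes plain ASCII letters and straight apostrophes ('), so accented letters and curly quotes would need a wider pattern.

```python
import re

# Hypothetical pattern: runs of ASCII letters, optionally joined by a single
# internal hyphen or apostrophe (editor-in-chief, can't, a's).
WORD_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")

def tokenize(text):
    """Return lowercase words, keeping internal hyphens and apostrophes."""
    return WORD_RE.findall(text.lower())

print(tokenize("The editor-in-chief said: I'd go -- can't you? A's and B's."))
# ['the', 'editor-in-chief', 'said', "i'd", 'go', "can't", 'you', "a's", 'and', "b's"]
```

With this pattern a bare "--" or a leading/trailing quote mark is dropped, while [pen-knife] and [far-fetched] each stay one unit.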


2-step process:

           1. Make a file listing all words (one word per line).

           2. Then do the counting, using
                               from collections import Counter


Related code  (for step 1)  that I'd used before:

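  # Step 1: write every word of the novel to Out.txt, one word per line.
  # Characters stripped from both ends of each token:
  PUNCT = ";:,'\"[]()*&^%$#@!./<>?_-+="

  # Opening both files in one 'with' makes sure the input is closed too.
  with open("JoyceUlysses.txt", 'r') as Rfile, open('Out.txt', 'w') as fo:
      for line in Rfile:
          for w in line.split():
              w = w.strip(PUNCT).lower()
              if w:        # skip tokens that were nothing but punctuation
                  fo.write(w + '\n')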
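
Step 2, the counting, could then look roughly like the sketch below. It assumes the word list from step 1 is in Out.txt, one word per line; the function names read_words and words_with_counts are only illustrative.

```python
from collections import Counter

def read_words(path):
    """Read one word per line from a file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def words_with_counts(words, wanted=(1,)):
    """Return the sorted words whose number of occurrences is in `wanted`."""
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n in wanted)

# Small in-memory demo; for the novel, use words = read_words("Out.txt").
sample = ["a", "b", "a", "c", "c", "c", "d", "e", "e", "e", "e"]
print(words_with_counts(sample))              # ['b', 'd']
print(words_with_counts(sample, (1, 2, 3)))   # ['a', 'b', 'c', 'd']
```

Counter makes a single pass over the words, so this should be fast enough for a whole novel; passing (1, 2, 3) as the second argument gives the list of words occurring once, twice or 3 times.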


