(Newbie) Counting Instances ("Hits") with Regular Expressions

Dave Reed dreed at capital.edu
Sun Jun 23 11:49:15 EDT 2002


> From: baf at texas.net (Ben Fairbank)
> Newsgroups: comp.lang.python
> Date: Sun, 23 Jun 2002 15:08:17 GMT
> 
> I am new both to Python and to regular expressions, which may account
> for my difficulty.  I must count the frequencies of certain words in
> files of moderate length (about 150 KB).  I have been reading
> files and then using count(s,sub), which is fast and easy.  I now have
> to allow for punctuation and eliminate words within words, etc., and so
> am trying to use regular expressions instead of simple words as
> targets.  I do not, however, find a similarly easy to use count
> function in the re module.  Yet this is such a common operation it must
> be there, or easy to implement.  What is the usual way of simply
> counting "hits" in the re module?  (And what have I missed in the
> documentation; where is this to be found?  I have looked through Lutz
> and Ascher)
> 
> Thanks for any help.
> 
> BAF
> 
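To see why plain substring counting is not enough: count(s, sub) from the string module (the same as s.count(sub)) counts every occurrence of the substring, including ones buried inside longer words. The sample sentence below is made up for illustration.

```python
# str.count counts raw substrings, not words, so the "word" inside
# "words" and inside "swordfish" is counted as a hit too.
text = "A word, two words, and a swordfish."
hits = text.count("word")
print(hits)  # 3
```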

There's probably an easier way to do this, but I think this will do
what you want. It searches the file infile.txt for all occurrences
of "word" (without the quotes). The \s requires some sort of whitespace
(space, newline, tab, etc.) before and after "word", so it won't match
"words" or anything else that merely contains "word". Note that it also
won't match "word" at the very start or end of the file, or next to
punctuation such as "word,". Each search after the first starts at
match.end(), the position where the previous match ended, so no
occurrence is counted twice.

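import re

# Read the whole file into one string.
with open('infile.txt', 'r') as infile:
    text = infile.read()

# \s requires whitespace before "word"; the lookahead (?=\s) requires
# whitespace after it without consuming it, so that whitespace can also
# serve as the leading \s of the next match ("word word" counts twice).
word = re.compile(r'\sword(?=\s)')
match = word.search(text)
count = 0
while match is not None:
    count += 1
    # Resume the search where the previous match ended.
    match = word.search(text, match.end())
print('count', count)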
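Since the question also asks about punctuation, a regex word boundary \b may fit better than \s: it also matches next to punctuation and at the start or end of the text. The re module has no count() function, but len(pattern.findall(text)) gives the number of non-overlapping hits. This is a sketch assuming Python 3; count_word and word_frequencies are hypothetical helper names, not part of the re module, and count_word assumes the target word is made only of word characters (letters, digits, underscore).

```python
import re
from collections import Counter


def count_word(text, word, ignore_case=False):
    # \b is a zero-width word boundary: it matches between a word
    # character and a non-word character, or at the start/end of the
    # text. So "word," and a "word" at the very start or end count,
    # but "words" and "swordfish" do not.
    # Hypothetical helper; assumes `word` consists of word characters.
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    # findall returns every non-overlapping match; its length is the count.
    return len(pattern.findall(text))


def word_frequencies(text, words):
    # Hypothetical helper: one pass over the text, splitting it into runs
    # of word characters, lowercasing them and tallying them with Counter.
    # Case-insensitive by design; note \w+ splits "don't" into "don", "t".
    tally = Counter(re.findall(r"\w+", text.lower()))
    # Counter returns 0 for words that never occur.
    return {w: tally[w.lower()] for w in words}


sample = "Word! A word, two words, and a swordfish. word"
print(count_word(sample, "word"))                    # 2
print(count_word(sample, "word", ignore_case=True))  # 3
print(word_frequencies(sample, ["word", "swordfish", "fish"]))
# {'word': 3, 'swordfish': 1, 'fish': 0}
```

With the file contents read into text as in the loop version, count_word(text, "word") counts whole-word hits directly; word_frequencies is handy when several target words are needed, since it scans the text only once.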

HTH,
Dave





More information about the Python-list mailing list