[Tutor] Individual Character Count

Scot W. Stevenson scot@possum.in-berlin.de
Sun, 8 Sep 2002 02:10:31 +0200


Hello Kyle, 

> I'm trying (with little luck) to create a function to count how many
> times an individual character appears in a file.  

One thing I have learned in the past few months about Python is to always 
consult the Library Reference before even considering writing new code. 
This is the "batteries included" principle that people keep talking about 
with Python, and it works. 

In this case, strings actually come with a "count" method of their own. 
I'm assuming you have a new version of Python such as 2.2, where strings 
have methods and you don't have to import the string module anymore (if 
not, we'll try again with one of the older forms), so you get:

===============================
>>> mystring = 'Spam! Spam! Spam!'
>>> mystring.count('S')
3
===============================

Or, even shorter, though it looks strange the first time you see it: 

===============================
>>> 'Spam! Spam! Spam!'.count('S')
3
===============================

So the short version of your function could be: 

===============================
def InCharCount(location, character):
    subj = open(location, "r")
    body = subj.read()
    subj.close()
    return body.count(character)
===============================

[I just love that last line: It sounds like something out of a Python 
version of "Apocalypse Now". And I bet you didn't even see it coming.]

You don't really need the close(), because the Python Elves will do it for 
you after the program is over, but it is considered good form because it 
shows attention to detail and moral fiber. Note that count() will also 
accept strings (such as 'word') and not only single characters ('w'), so 
you get more fun for the same price. It counts non-overlapping 
occurrences, so 'aaaa'.count('aa') gives 2, not 3. 
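
As an aside: Pythons newer than the 2.2 discussed here (2.6 and later, so 
this is an assumption about what you have installed) offer a "with" 
statement that does the close() for you, Elves or no Elves. A minimal 
sketch, written for a current Python 3; the function name and the 
throwaway demo file are made up for illustration:

```python
import os
import tempfile

def count_in_file(location, text):
    # "with" closes the file when the block ends, even if an error occurs
    with open(location, "r") as subj:
        return subj.read().count(text)

# Throwaway demo file (hypothetical content, only here so the sketch runs)
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as demo:
    demo.write("Spam! Spam! Spam!\n")

single = count_in_file(demo_path, "S")     # a single character
word = count_in_file(demo_path, "Spam")    # a whole word works as well
os.remove(demo_path)
print(single, word)
```

Both calls should come back with 3 for this demo file. 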

There is one problem with this version, though: read() gives you the whole 
file as one big string in memory. Usually, this should be fine, but if you 
read in a very, very large file (say, some DNA sequencing data from your 
secret T-Rex project) on a very, very small machine, this might cause 
trouble. 

So you might be better off reading the file line by line after all. You 
could try this (it needs Python 2.2 or later, where files can be used 
directly in a for loop):

================================
def InCharCount(location, character):
    subj = open(location, "r")

    nbr_of_char = 0
    # Only one line is held in memory at a time
    for line in subj:
        nbr_of_char = nbr_of_char + line.count(character)

    subj.close()
    return nbr_of_char
================================

The "for line in subj" goes through the file one line at a time very 
quickly, and you simply add up all the times the char occurs in each line. 
This takes care of any memory problems you might have with large files, 
but it may take a bit longer than one big read().
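
One caveat: going line by line only saves memory if the file actually has 
line breaks. A sequence dump written as one giant line would still arrive 
in one piece. A possible workaround is to read fixed-size chunks instead. 
The sketch below assumes a current Python 3 rather than 2.2; the function 
name and the chunk size are made up, and it is only safe for single 
characters, because a longer string such as 'word' could straddle two 
chunks and be missed:

```python
import os
import tempfile

def count_char_in_chunks(location, character, chunk_size=64 * 1024):
    # Only meant for single characters; longer strings could be split
    # across a chunk boundary and go uncounted.
    if len(character) != 1:
        raise ValueError("expected a single character")
    total = 0
    with open(location, "r") as subj:
        while True:
            chunk = subj.read(chunk_size)   # at most chunk_size characters
            if not chunk:                   # empty string means end of file
                break
            total += chunk.count(character)
    return total

# Throwaway demo file with no line breaks at all (hypothetical data)
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as demo:
    demo.write("GATTACA" * 1000)

# A tiny chunk size forces many reads, just to exercise the loop
a_count = count_char_in_chunks(demo_path, "A", chunk_size=10)
os.remove(demo_path)
print(a_count)
```

"GATTACA" contains three A's, so the demo should print 3000. 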

Hope this helps, 
Y, Scot

-- 
   Scot W. Stevenson wrote me on Sunday, 8. Sep 2002 in Zepernick, Germany    
       on his happy little Linux system that has been up for 1966 hours       
        and has a CPU that is falling asleep at a system load of 0.00.