[Tutor] Individual Character Count
Scot W. Stevenson
scot@possum.in-berlin.de
Sun, 8 Sep 2002 02:10:31 +0200
Hello Kyle,
> I'm trying (with little luck) to create a function to count how many
> time an individual character appears in a file.
One thing I have learned in the past few months about Python is to always
consult the Module Library before even considering writing new code. This
is the "batteries included" principle that people keep talking about with
Python, and it works.
In this case, strings actually come with a "count" method of their own. I'm
assuming you have a new version of Python such as 2.2, where you don't
have to import the string module anymore to get at it (if not, we'll try
again with one of the older forms), so you get:
===============================
>>> mystring = 'Spam! Spam! Spam!'
>>> mystring.count('S')
3
===============================
Or, even shorter, though it looks strange the first time you see it:
===============================
>>> 'Spam! Spam! Spam!'.count('S')
3
===============================
So the short version of your function could be:
===============================
def InCharCount(location, character):
    subj = file(location, "r")
    body = subj.read()
    subj.close()
    return body.count(character)
===============================
[I just love that last line: It sounds like something out of a Python
version of "Apocalypse Now". And I bet you didn't even see it coming.]
You don't really need the close(), because the Python Elves will do it for
you after the program is over, but it is considered good form because it
shows attention to detail and moral fiber. Note that count() will also
accept strings (such as 'word') and not only single characters ('w'), so
you get more fun for the same price.
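To make the point about longer strings concrete, here is a small sketch of how count() behaves. The sample strings and variable names are made up for illustration; the two things worth noticing are that count() is case-sensitive and that it only counts matches that do not overlap.

```python
# Illustrative only: the sample strings and names are invented for this sketch.
text = 'Spam! Spam! Spam!'

single = text.count('S')        # a single character
word = text.count('Spam')       # a longer substring works the same way
lower = text.count('s')         # count() is case-sensitive, so no match here
overlap = 'aaaa'.count('aa')    # matches are counted without overlapping

print(single, word, lower, overlap)   # prints: 3 3 0 2
```

If you want upper and lower case counted together, one option is to call lower() on the text first and then count the lower-case character.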
There is one problem with this version, though: read() gives you the whole
file as one big string. Usually, this should be fine, but if you read a
very, very large file (say, some DNA sequencing data from your secret
T-Rex project) on a very, very small machine, this might cause trouble.
So you might be better off reading the file line by line after all. You
could try this (in Python 2.2):
================================
def InCharCount(location, character):
    subj = file(location, "r")
    nbr_of_char = 0
    for line in subj:
        nbr_of_char = nbr_of_char + line.count(character)
    subj.close()
    return nbr_of_char
================================
The "for line in subj" goes through the file one line at a time very quickly,
so only one line sits in memory at once, and you simply add up all the times
the char occurs in each line. This takes care of any memory problems you
might have with large files, but may take a bit longer than reading
everything in one go.
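For anybody reading this on a later Python: the file() built-in is gone in Python 3, and the "with" statement (available by default from Python 2.6) will close the file for you even if something goes wrong halfway through. A minimal sketch of the same line-by-line idea, assuming a current Python 3 and a plain text file; the function name count_char_in_file is just something I made up for this example:

```python
def count_char_in_file(location, character):
    """Count how often `character` occurs in the text file at `location`.

    Hypothetical helper for illustration; assumes Python 3 and a text file
    that can be decoded with the platform's default encoding.
    """
    total = 0
    # The with-block closes the file when it ends, even after an error.
    with open(location, "r") as subj:
        # Iterating over the file object yields one line at a time,
        # so the whole file is never held in memory at once.
        for line in subj:
            total += line.count(character)
    return total
```

You would call it just like the versions above, for example count_char_in_file("spam.txt", "S"), where "spam.txt" stands in for whatever file you want to search.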
Hope this helps,
Y, Scot
--
Scot W. Stevenson wrote me on Sunday, 8. Sep 2002 in Zepernick, Germany
on his happy little Linux system that has been up for 1966 hours
and has a CPU that is falling asleep at a system load of 0.00.