[Tutor] Individual Character Count
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Sat, 7 Sep 2002 16:17:20 -0700 (PDT)
On Sat, 7 Sep 2002, Kyle Babich wrote:
> I'm trying (with little luck) to create a function to count how many
> times an individual character appears in a file. What I have so far I
> have written on patterns I've noticed but it is still extremely buggy
> depending on whether the character being searched for appears as the
> first character in the file, the last, both, or neither. Here it is:
Hi Kyle,
Let's take a look at the program.
> ####################
> def InCharCount(location, character):
>     subj = file(location, "r")
>     body = subj.read()
>
>     body = body.split("\n")
>     body = string.join(body, "")
>     body = body.split(character)
Hmmm... Ok, so if our file contains something like "aaabbabba", then if
we wanted to count all the "a"'s, we could split against 'a' and see how
many pieces come up:
###
>>> s = "aaabbabba"
>>> l = s.split('a')
>>> l
['', '', '', 'bb', 'bb', '']
###
As a result, our list won't contain any more "a"'s once we split by 'a'.
But it will have a bunch of empty strings, which might look silly.
... But those empty strings are there for a very good reason: we should be
able to rehydrate our string, and reverse the process by using join():
###
>>> 'a'.join(l)
'aaabbabba'
###
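If you'd like to check that round trip as a script rather than at the
prompt, here is a minimal sketch. The helper name round_trips is made up
for illustration and isn't part of your program:

```python
def round_trips(text, sep):
    """Return True if joining the split-up pieces with sep rebuilds text."""
    pieces = text.split(sep)
    return sep.join(pieces) == text

# The example string from above, split on two different characters,
# plus an empty string for good measure.
print(round_trips("aaabbabba", "a"))
print(round_trips("aaabbabba", "b"))
print(round_trips("", "a"))
```

Each call should print True: with an explicit separator, split() keeps the
empty strings it needs so that join() can put everything back.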
Your approach seems reasonable. Once you've split the 'body' up, you
already have enough information to count how many times 'character'
appears in there: the number of 'character's should just be the number of
in-betweens we have in our split-up body:
['', '', '', 'bb', 'bb', '']
That is, if we have a list of six elements, the number of "in-between"
places is just the number of commas we see in our list: 5.
join ==>
'' + 'a' + '' + 'a' + '' + 'a' + 'bb' + 'a' + 'bb' + 'a' + ''
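To make the "in-between" idea concrete, here is a rough sketch that does
the join by hand. This is only an illustration of what join() is doing
conceptually, not how Python implements it, and the variable names are my
own:

```python
pieces = ['', '', '', 'bb', 'bb', '']

# Start with the first piece, then glue on each remaining piece,
# putting one 'a' into the in-between place before it.
rebuilt = pieces[0]
for piece in pieces[1:]:
    rebuilt = rebuilt + 'a' + piece

print(rebuilt)
```

This should print aaabbabba, the same string that 'a'.join(pieces) gives
us.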
So all you need now is to count "in-between" places. I'll let you figure
out how to do that. *grin*
Your approach works even if our string is initially empty:
###
>>> mystring = ""
>>> l = mystring.split("a")
>>> l
['']
>>> "a".join(l)
''
###
because when our split-up list is only one element long, there are no
"in-between" spots in our list, which is exactly right.
Let's look at the rest of your code:
>     last = len(body)
>     last = last - 1
>
>     char = 0
>     for each in body:
>         char = char + 1
>
>     if body[0] in [""]:
>         char = char - 1
>
>     elif body[last] in [""]:
>         char = char - 1
>
>     else:
>         pass
>
>     return char
I'd remove this part of the code because it's making the problem too
complicated. *grin* I don't think you need to do more cases based on
empty strings. Empty strings in your split-up list are perfectly good:
you don't need to do additional case analysis on them.
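If you want to see where the case analysis goes astray, here is a small
sketch that runs the same logic on plain strings instead of a file. The
function name original_tail_count and the sample strings are my own
inventions, and I've skipped the file reading and newline stripping, so
treat it as an approximation of your function rather than the real thing:

```python
def original_tail_count(text, character):
    """Apply the quoted counting logic to a string (hypothetical helper)."""
    body = text.split(character)
    last = len(body) - 1

    char = 0
    for each in body:
        char = char + 1

    # Same branches as the quoted code: at most one of them ever runs.
    if body[0] in [""]:
        char = char - 1
    elif body[last] in [""]:
        char = char - 1

    return char

# Each pair is (text, number of "a" characters counted by hand).
cases = [("abb", 1), ("bba", 1), ("abba", 2), ("bab", 1), ("bbb", 0)]
for text, expected in cases:
    got = original_tail_count(text, "a")
    print("%r: counted by hand %d, quoted logic %d" % (text, expected, got))
```

On these samples the two numbers agree whenever the string starts or ends
with "a", and disagree for "bab" and "bbb", where it does neither.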
Good luck to you!