[Tutor] Individual Character Count

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Sat, 7 Sep 2002 16:17:20 -0700 (PDT)


On Sat, 7 Sep 2002, Kyle Babich wrote:

> I'm trying (with little luck) to create a function to count how many
> time an individual character appears in a file.  What I have so far I
> have written on patterns I've noticed but it is still extremely buggy
> depending on whether the character being searched for appears as the
> first character in the file, the last, both, or neither.  Here it is:

Hi Kyle,


Let's take a look at the program.

> ####################
> def InCharCount(location, character):
>     subj = file(location, "r")
>     body = subj.read()
>
>     body = body.split("\n")
>     body = string.join(body, "")
>     body = body.split(character)

Hmmm...  Ok, so if our file's contents are something like "aaabbabba",
and we want to count all the "a"'s, we can split against 'a' and
see how many pieces come up:

###
>>> s = "aaabbabba"
>>> l = s.split('a')
>>> l
['', '', '', 'bb', 'bb', '']
###

As a result, our list won't contain any more "a"'s once we split by 'a'.
But it will have a bunch of empty strings, which might look silly.

... But those empty strings are there for a very good reason: we should be
able to rehydrate our string, and reverse the process by using join():

###
>>> 'a'.join(l)
'aaabbabba'
###


Your approach seems reasonable.  Once you've split the 'body' up, you
already have enough information to count how many times 'character'
appears in there: the number of 'character's should just be the number of
in-betweens we have in our split-up body:

   ['',       '',        '',           'bb',        'bb',        '']

That is, if we have a list of six elements, the number of "in-between"
places is just the number of commas we see in our list: 5.

join ==>

   '' + 'a' + '' + 'a' + '' + 'a' + 'bb' + 'a' + 'bb' + 'a' + ''

So all you need now is to count "in-between" places.  I'll let you figure
out how to do that.  *grin*


Your approach works even if our string is initially empty:

###
>>> mystring = ""
>>> l = mystring.split("a")
>>> l
['']
>>> "a".join(l)
''
###

because when our split-up list is only one element long, there are no
"in-between" spots in our list, which is exactly right.

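Here is one possible way to check the "in-between" idea against the
cases you mentioned (character first, last, both, neither) plus the
empty string.  This is only a sketch: count_in_betweens is a name made
up for illustration, and it assumes 'character' is non-empty, since
split() raises ValueError on an empty separator.  The built-in
str.count() is used purely as a cross-check.

```python
def count_in_betweens(text, character):
    # Hypothetical helper: splitting on `character` always yields one more
    # piece than there are occurrences, so the number of "in-between"
    # places is the number of pieces minus one.
    pieces = text.split(character)
    return len(pieces) - 1


# The character appears first and last, in the middle only, first only,
# last only, and not at all (empty string).
for sample in ["aaabbabba", "bbabb", "abb", "bba", ""]:
    assert count_in_betweens(sample, "a") == sample.count("a")
```

No case needs special treatment: the empty strings at the ends of the
split-up list are what keep the count right.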


Let's look at the rest of your code:

>     last = len(body)
>     last = last - 1
>
>     char = 0
>     for each in body:
>         char = char + 1
>
>     if body[0] in [""]:
>         char = char - 1
>
>     elif body[last] in [""]:
>         char = char - 1
>
>     else:
>         pass
>
>     return char

I'd remove this part of the code because it's making the problem too
complicated.  *grin* I don't think you need any special cases for
empty strings.  Empty strings in your split-up list are perfectly good,
whether they show up at the front, at the back, or in the middle.

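For what it's worth, with the special cases gone the whole function
could shrink to something like the sketch below.  Treat it as one
possible rewrite, not the definitive one: in_char_count is a made-up
name, it assumes a Python recent enough to have open() and the 'with'
statement, and it assumes 'character' is non-empty.  It also skips the
newline-stripping step, which only matters if you search for "\n"
itself.

```python
import os
import tempfile


def in_char_count(location, character):
    """Count occurrences of a non-empty `character` in the file at `location`."""
    with open(location, "r") as subj:
        body = subj.read()
    # There is always one more piece than there are separators.
    return len(body.split(character)) - 1


# Usage on a throwaway file; the contents are made up for illustration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as handle:
    handle.write("aaabb\nabba\n")
try:
    count = in_char_count(path, "a")
finally:
    os.remove(path)
print(count)  # prints 5
```
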

Good luck to you!