HELP Newbie solve this Problem

Sun Mar 26 13:23:47 EST 2000

Subject: Re: HELP Newbie solve this Problem

>> Mark Hathaway wrote:
>>
>> I'm probably giving away somebody's homework,
>> but I can't allow that stupid Fartran to stand
>> without comparison to Python.

> Andrew Dalke <dalke at acm.org> wrote:
>...
> But your code has a bug.  Suppose a letter is not
> in the document.  Let's assume 'x' doesn't exist,
> and you have lookfor = lowercase + uppercase.  I
> would like to see, "x 0", but your code does:

Oh, is that a bug? I thought the requirements called
for reporting the occurrence of letters IN the text.
Maybe if I hadn't added the "lookfor" feature you wouldn't
have anything to complain about.  :-)

> And there is another bug with how you do case folding.
> Again from that code snippet, you are folding the
> character in the "lookfor" string, which was passed in
> from the parameter.  Don't you need to add the values
> for the lowercase and uppercase elements of the charDict?

You have found a bug. I guess I wasn't being paid enough
to test it properly. But, I don't want to add the upper
& lower in the charDict. I might want to report out according
to case, so they must remain distinct there. I've changed
it to allow reporting with ignoreCase='true'. The code is
even clearer now, though a couple of lines longer. Though,
I somehow suspect it won't suit you.  :-(
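
For illustration, here is a minimal sketch of that approach: the dictionary keeps upper and lower case apart, and the two counts are only merged at report time. The helper name count_for and the sample data are made up for this example and are not part of the class below.

```python
def count_for(char_counts, char, ignore_case=False):
    """Return the count for char, optionally merging upper and lower case."""
    if not ignore_case:
        return char_counts.get(char, 0)
    # A set collapses the two forms when the character has no case
    # (for example '1'), so such characters are not counted twice.
    forms = {char.lower(), char.upper()}
    return sum(char_counts.get(form, 0) for form in forms)

counts = {"a": 2, "A": 1, "1": 4}   # made-up sample data
print(count_for(counts, "a"))                    # case-sensitive lookup
print(count_for(counts, "a", ignore_case=True))  # merges 'a' and 'A'
```

Because the stored counts stay distinct, the same dictionary can serve both a case-sensitive and a case-insensitive report.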

> Plus, you have a class design which basically fakes
> global memory.

"fakes global memory", what the heck are you griping
about now?

> IMO, it should look like:
>
> stats = find_char_counts(open("data"))
> print_char_counts(stats)

Not very OOish, is it?

> where "find_char_counts" is like your "getText" method,
> the value of stats is identical to charDict, and
> "print_char_counts" is your "reportCounts" method.

Sounds like you've just got a problem with the names
I chose. You've just got problems.

> This is easier to understand (again, IMO) because you
> have a "thingy" for parsing, a "thingy" with data, and
> a "thingy" for output.  Then the I/O parts can be changed
> without having to modify the classes (eg, if you want
> to transpose the two columns, just write a new output,
> or if you want to generate random character values for
> input, just write to the data structure.

I don't recall any of this stuff in the requirements.
Are you just making this up as you go along?

Well, as I've written it there's a thingie, and
it can get text and it can report the frequencies
of characters (in the text). If you'd want it to
report a list, which might be manipulated by some
other function (to change print format or whatever)
you could do that. I've got no problem with that.
Do whatever you want. My greater concern was with
the getText method always reading from a file. That's
a little rigid. But, that's the way I understood
the requirements (or lack thereof), so that's what
I wrote.
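
For comparison, the split suggested in the quoted message (one piece that parses, a plain dictionary that holds the data, one piece that prints) might look roughly like the sketch below. Only the names find_char_counts and print_char_counts come from the quoted message; the function bodies, the out parameter, and print_counts_first are assumptions made for this example. Taking an open file object rather than a filename also loosens the tie to files on disk.

```python
import io

def find_char_counts(infile):
    """Parse: read any file-like object and return a {char: count} dict."""
    stats = {}
    for char in infile.read():
        if char in stats:
            stats[char] = stats[char] + 1
        else:
            stats[char] = 1
    return stats

def print_char_counts(stats, out=None):
    """Output: one 'char count' line per character, sorted by character."""
    for char in sorted(stats):
        print(char, stats[char], file=out)   # out=None means sys.stdout

def print_counts_first(stats, out=None):
    """A second output routine with the columns transposed (hypothetical)."""
    for char in sorted(stats):
        print(stats[char], char, file=out)

# io.StringIO stands in for open("data") so the sketch needs no file on disk.
stats = find_char_counts(io.StringIO("abca"))
print_char_counts(stats)
print_counts_first(stats)
```

Changing the report format then means writing another small output function, while the parsing code and the data stay untouched.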

> BTW, there's a nice idiom in Python to implement the way
> you do getText but without the O(n) lookup you have
> to see if the character has been initialized:
>
>   for char in self.text:
>      self.charText[char] = self.charText.get(char, 0) + 1

It's not really self-explanatory, is it? I try to stay
away from cutesy stuff like that. Perhaps if speed were
very critical then I'd want to use it; not until.
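
For readers who have not met the idiom, the sketch below puts the two styles side by side as standalone functions. The function names are invented for this comparison. The first mirrors the key-list bookkeeping used in getText; the second uses dict.get, which returns the supplied default (0 here) when the key is missing.

```python
def count_with_key_list(text):
    """Count characters while tracking seen characters in a separate list."""
    counts = {}
    keys = []
    for char in text:
        if char not in keys:          # scans the list on every character
            counts[char] = 1
            keys.append(char)
        else:
            counts[char] = counts[char] + 1
    return counts

def count_with_get(text):
    """Count characters with dict.get supplying 0 for unseen characters."""
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    return counts

sample = "hello world"
assert count_with_key_list(sample) == count_with_get(sample)
print(count_with_get(sample)["l"])    # 3
```

Both produce the same dictionary. The difference is that the list membership test grows with the number of distinct characters seen, while the dictionary lookup does not.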

Mark Hathaway
e-mail: hathawa2 at marshall.edu

------------------------ cut here --------------------------
class charFreq:

    # Written by: Mark Hathaway
    # Version: 1.000001
    # Date: 2000.03.26

    '''Usage:
       from modulename import charFreq
       obj = charFreq()
       obj.getText("filename.ext")
       obj.reportCounts("stringOfCharsToLookFor",ignoreCase="true")'''

    def getText (self, filename):
        self.text = open(filename).read()
        self.charDict = {}
        self.keys = []
        for char in self.text:
            if char not in self.keys:
                self.charDict[char] = 1
                self.keys.append(char)
            else:
                self.charDict[char] = self.charDict[char] + 1

    def reportCounts (self,lookfor="abcdefghijklmnopqrstuvwxyz"+
                                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                      ignoreCase=None):
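        self.keys.sort()
        for char in lookfor:
            count = 0
            if ignoreCase:
                # str methods are used here because the string.lower and
                # string.upper functions are not available in current Python
                charLower = char.lower()
                charUpper = char.upper()
                if charLower in self.keys:
                    count = count + self.charDict[charLower]
                # characters without case (digits, punctuation) have identical
                # forms; skip the second lookup so they are not counted twice
                if charUpper != charLower and charUpper in self.keys:
                    count = count + self.charDict[charUpper]
            else:
                if char in self.keys:
                    count = self.charDict[char]
            print(char, count)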

#end charFreq