[Tutor] Letter Frequency Count.

Tesla Coil tescoil@rtpro.net
Tue, 11 Jan 2000 21:38:10 -0600


This is what I ended up doing altogether.  An index of coincidence
of about 0.0667 is the statistical ideal for English text; uniformly
random text is about 0.0385 (1/26).  Would have been nice to print letter
occurrences sorted from highest to lowest frequency, but
I wasn't sure how to go about that from here.  Percentages
might be interesting; I just haven't given that any thought yet.
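
For reference, the random-text figure can be checked from the definition: the chance that two independently picked letters match is the sum of the squared letter probabilities. The sketch below assumes 26 equally likely letters; `expected_ic` is a hypothetical helper name and is not part of idxcon.py.

```python
def expected_ic(probabilities):
    """Chance that two independently drawn letters match: sum of p*p."""
    total = 0.0
    for p in probabilities:
        total = total + p * p
    return total

# Assumption: every one of the 26 letters is equally likely.
uniform = [1.0 / 26] * 26
random_ic = expected_ic(uniform)   # 26 * (1/26)**2 == 1/26, about 0.0385
```

The same helper would give the English figure if it were fed a table of English letter probabilities, which is not included here.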

# idxcon.py: calculate the index of coincidence
# and print how often each letter occurs.

import sys, string
simpsons = string.uppercase
emperor = sys.stdin.read()
emperor = string.upper(emperor)
stringette = {}

for individual in emperor:
    if individual in simpsons:
        if stringette.has_key(individual):
            stringette[individual] = stringette[individual] + 1
        else:
            stringette[individual] = 1

wapcaplet = stringette.values()

def rectangle(x): return x*(x-1)
mousebat = map(rectangle, wapcaplet)

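def sum(x, y): return x+y
# Start each reduce at 0 so input with no letters doesn't raise TypeError.
goosecreature = reduce(sum, mousebat, 0)
follicle = reduce(sum, wapcaplet, 0)

spong = rectangle(follicle)

# Fewer than two letters makes N*(N-1) zero; report 0 rather than divide by it.
if spong == 0:
    ic = 0.0
else:
    ic = float(goosecreature)/float(spong)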

print "The index of coincidence is", round(ic, 4), "\n"

for ampersand in simpsons:
    if stringette.has_key(ampersand):
        print ampersand, ":", stringette[ampersand]
    else:
        print ampersand, ": 0"

print "total letters:", follicle
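
The sorted listing and the percentages mentioned at the top could be done roughly as follows. This is only a sketch, not part of idxcon.py: `frequency_table` is a hypothetical helper name, and it is written for a current Python, where `string.ascii_uppercase`, the `upper()` string method, and the built-in `sum` stand in for `string.uppercase`, `string.upper`, and the hand-written `sum` used above.

```python
import string

def frequency_table(text):
    """Return (letter, count, percent) tuples, most frequent first."""
    counts = {}
    for ch in text.upper():
        if ch in string.ascii_uppercase:
            counts[ch] = counts.get(ch, 0) + 1
    total = sum(counts.values())
    rows = []
    for letter in string.ascii_uppercase:
        n = counts.get(letter, 0)
        if total:
            percent = 100.0 * n / total
        else:
            percent = 0.0   # no letters at all: avoid dividing by zero
        rows.append((letter, n, percent))
    # Sort by count, highest first; ties fall back to alphabetical order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Show the three most frequent letters of a small sample.
for letter, n, percent in frequency_table("Hello, World")[:3]:
    print("%s : %d (%.1f%%)" % (letter, n, percent))
```

Negating the count in the sort key gives descending counts while keeping letters with equal counts in alphabetical order, so the output is the same from run to run.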