[Tutor] Letter Frequency Count.
Tesla Coil
tescoil@rtpro.net
Tue, 11 Jan 2000 21:38:10 -0600
This is what I ended up doing altogether. An index of coincidence
of 0.0667 is statistically ideal for English text; uniformly random
text is about 0.0385 (1/26). Would have been nice to print letter
occurrences sorted from highest to lowest frequency, but
I wasn't sure how to go about that from here. Percentages
might be interesting--just haven't given that thought yet.
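
For reference, the quantity the script below computes can be written as follows. The symbols are introduced here only for illustration and do not appear in the script: n_i is the number of times the i-th letter occurs, and N is the total number of letters counted.

```latex
\[
\mathrm{IC} = \frac{\sum_{i=\mathrm{A}}^{\mathrm{Z}} n_i \,(n_i - 1)}{N\,(N - 1)}
\]
```

In the script, the numerator corresponds to goosecreature, N to follicle, and the denominator to spong.
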
# idxcon.py calculate index of coincidence
# and return frequency of letter occurrence.
import sys, string
simpsons = string.uppercase
emperor = sys.stdin.read()
emperor = string.upper(emperor)
stringette = {}
for individual in emperor:
    if individual in simpsons:
        if stringette.has_key(individual):
            stringette[individual] = stringette[individual] + 1
        else:
            stringette[individual] = 1
wapcaplet = stringette.values()
def rectangle(x): return x*(x-1)
mousebat = map(rectangle, wapcaplet)
def sum(x, y): return x+y
goosecreature = reduce(sum, mousebat)
follicle = reduce(sum, wapcaplet)
spong = rectangle(follicle)
ic = float(goosecreature)/float(spong)
print "The index of coincidence is", round(ic, 4), "\n"
for ampersand in simpsons:
    if stringette.has_key(ampersand):
        print ampersand, ":", stringette[ampersand]
    else:
        print ampersand, ": 0"
print "total letters:", follicle
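
One possible way to get the two things mentioned at the top (letters sorted from highest to lowest frequency, and percentages) is sketched below. This is only a sketch under some assumptions: it is written for Python 3 rather than the Python version the script above targets, and letter_report is a hypothetical helper name, not part of idxcon.py. Letters that never occur are left out of the result.

```python
# Python 3 sketch; letter_report is a hypothetical helper, not part of idxcon.py.
import string
from collections import Counter


def letter_report(text):
    """Return (letter, count, percent) tuples, most frequent letter first."""
    # Count only the letters A-Z, ignoring case, digits, and punctuation.
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    total = sum(counts.values())
    rows = []
    # Sort by descending count; ties are broken alphabetically.
    for letter, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        rows.append((letter, n, 100.0 * n / total))
    return rows


if __name__ == "__main__":
    for letter, n, pct in letter_report("Hello, world"):
        print("%s : %d (%.1f%%)" % (letter, n, pct))
```

Sorting on the pair (-count, letter) keeps the order stable when two letters occur equally often, so the output does not depend on dictionary ordering.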