Peer review: Python encyphering script

Wed Feb 27 23:46:31 EST 2002

In article <3c7c1a91.150875561 at menuhin.netfront.net>, A. Jones wrote:
> It's my second day Pythoning and I've written this code to encypher
> text (or files, ostensibly) into two pieces, neither of which can be
> used to determine the message.
> 
> Any problems pointed out, clunky lines spotted, or security problems
> would be most appreciated.  Also, if you've seen something like this,
> if you'd tell me, please.
 [...]

Okay; A code review follows.  Not bad for two days of Pythoning!

BTW, I would strongly disrecommend using this cipher for real purposes;
it is not as secure as you think.  See my closing comments below.

===================================================================

> #encyphering tool 
> plaintext = "Isn't it awfully nice to have a penis; isn't it
> frightfully good to have it on!" #The variable makes it obvious.

#Actually, this won't work: you need to use triple quotes for a string
#to span multiple lines.  Try it like this:

plaintext = '''I would have used the "Bruces' Philosophers Song" myself,
but examples are a matter of taste.  BTW, the final words are
'a dong'.'''

> blarg = [] #To hold translation of ASCII to num
> for b in plaintext:
> #    print b #Was here for development purposes.
>     blarg.append(ord(b)) #Turn it into a list of ASCII numbers
>                          #nums make sorting easier&append/ord makes a
> nice list

#Naming your variables 'blarg', 'blurb', and so forth can make your
#code harder to read.

plaintext = map(ord, plaintext)

> plaintext = "Text purged."

#It won't do any good to 'purge' the plaintext from memory at this point;
#if you're worried about an attacker getting it from RAM, they've already
#had an opportunity to have done so.  If you're worried about them reading
#from swap, your program may have already swapped out.

> blurb = {} #To hold 1:1 value/offset references
> b = 0 #reuse variable so's to destroy evidence/make my life easier

Rather than keeping an index, you might want to use the 'range' or
'xrange' method.

> for z in blarg:
>     if z in blurb: #duplicate key fixer
>             if type(blurb[z]) == "<type 'list'>": #Stops nesting
> strings

#It's bad style to rely in the string value of a type.  Intstead, you
#might want to import types, then say 
#          'if type(blurb[z]) == types.ListType:'.  
#If you're in Python 2.2, you can say, 
#          'if type(blurb[z]) == list:'

>                 blurb[z].append(b)
>             else:
>                 temp = blurb[z] #Used to create first list entry
>                 temp.append(b)
>                 blurb[z] = temp

#The two cases above do exactly the same thing; there's no need for the
# type check.

>     else: blurb[z] = [b] #sets dictionary with values as keys, offsets
> as objects
>     b = b+1 #Next step in the offset counter... clunky.

# Map from each character to an in-order list of its positions in
# the original map.
posMap = {}
for idx in xrange(len(plaintext)):
   character = plaintext[idx]
   if character in posMap:
      posMap[character].append(idx)
   else:
      posMap[character] = [idx]

> valuelist = blurb.keys() #extract keys of dictionary
> valuelist.sort() #to be randomized later(date)

# These lines are reasonable, but see below for thoughts on security.
valuelist = posMap.keys()
valuelist.sort()

> finval = [] #to hold final value list
> finoff = [] #to hold fial offset list
> for r in valuelist:
>     z = 0
>     while z < len(blurb[r]): #Writes number of values equal to number
> of offsets
>         finval.append(r)
>         z = z+1
>     for q in blurb[r]: #Decomposes my clunky nesting lists/dicts
>         finoff.append(q)
> 
> blurb = {}  #Destroy the evidence!
> valuelist = []

# If we use a nested loop, we can eliminate the index variable.

finval = []
finoff = []

for val in valuelist:
    for pos in posMap[val]:
        finval.append(val)
        finoff.append(pos)

# It's too late to destroy the evidence, but if you'd like, del works
# just fine.
del posMap, valuelist, plaintext

============================================================
To summarize, and introduce a function:

def encipher(plaintext):
    plaintext = map(ord, plaintext)

    posMap = {}
    for idx in xrange(len(plaintext)):
        character = plaintext[idx]
        if posMap.has_hey(character):
            posMap[character].append(idx)
        else:
            posMap[character] = [idx]

    valuelist = posMap.keys()
    valuelist.sort()

    finval = []
    finoff = []

    for val in valuelist:
        for pos in posMap[val]:
            finval.append(val)
            finoff.append(pos)

    return (finval, finoff)

> 
> --
> #decyphering tool
> text = [] #finval
> key = [] #finoff
> numdict = {}
> z = 0
> for r in key: #This here is the heart of the decryption process.
> Simply coalates the lists.
>     numdict[r] = text[z]
>     z = z + 1
> readlist = numdict.keys()
> readlist.sort() #puts the keys/offsets in order
> plaintext = ""
> for r in readlist:
>     plaintext = plaintext + chr(numdict[r])
> print plaintext #prints plaintext

#This is my first attempt, an the one that changes your approach least:

def decipher(text, key):
    numdict = {}

    # It's easiest to use range when you want a list of indices
    for idx in xrange(len(key)):
       numdict[key[idx]] = text[idx]

    readlist = numdict.keys()
    readlist.sort()

    # Using "+" to build strings can be very slow; it needs to build
    # a new string every time you use it.

    plaintext = [ chr(numdict[r]) for r in readlist ]

    return "".join(plaintext)

# This is my second appempt...
def decipher(text, key):
    result = [None]*len(text)
    for val,pos in zip(text,key):
        result[pos] = val
    result = map(chr, result)
    return "".join(result)

# This is my third:
def decipher(text, key):
    temp = zip(key, text)
    temp.sort()
    return "".join(map(lambda x: chr(x[1]), 
                       temp))

============================================================

And now, some notes on security.  (I'm not trying to be harsh here,
and I know this was probably not written for serious use, but it's
important to understand that cryptography is a lot harder than it
looks, and I wouldn't want you to accidentally leak anything private
through trusting inadequate security.)

First of all, this cipher is a bit suspicious.  It has the undesirable
property that the frequency counts of the characters in the ciphertext
are the same as in the plaintext.  That is to say, if I know you're
sending either "attack at dawn" or "run for your lives", it's easy for
me to tell which one you sent.

Secondly, this cipher doesn't buy you a lot: The key is as long as the
plaintext.  If you have a secure place to store the key, you would
presumably use that place to store the plaintext as well.

Third, the scrambling step is more crucial than you think.  Without
scrambling, if I see the key-portion of a message, I'll have a pretty
good idea which parts of the message go where: indices for low letters
earlier than indices for high ones.  But with scrambling, the quality
of your cipher is limited by the power of your random number
generator. 

For a deeper introduction to cryptography, you may want to get a copy
of Schneier's _Applied Cryptography_, or read the FAQs from sci.crypt.

============================================================

Well, I hope this that didn't dampen your entusiasm for Python!
You're off to a good start (esp. for two days!), and I hope we'll hear
more of you in the future!

Welcome-to-comp-lang-python-ly Yrs,

-- 
 Nick Mathewson    <Q nick Q m at alum dot mit dot edu>
                      Remove Q's to respond.  No spam.