[Edu-sig] Pythonic chatter (more cryptology stuff)

Kirby Urner pdx4d@teleport.com
Mon, 16 Jul 2001 11:15:02 -0700


Finally discovered after all this time the __invert__
overrides the tilde (~).  I've decided to use that with
permutations, which in one form are for letter substitutions
(hence the encrypt method):

   >>> from ciphers import P  # P is a permutation class
   >>> P.domain               # default class variable 1-10
   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
   >>> import string
   >>> P.domain = list(string.letters + string.whitespace)
   >>> p = P()  # P() w/o arguments gives random permutation
   >>> c=p.encrypt("Simple 'clubhouse code' substitution")
   >>> c
   "B\tFzutj'AunmNUnltjAUft'jlnmlk\tknk\tUv"
   >>> print c  # what's a bit odd is to have TAB in the mix
   B	Fzutj'AunmNUnltjAUft'jlnmlk	knk	Uv
   >>> invp = ~p  # using the __invert__ operator
   >>> invp.encrypt("B\tFzutj'AunmNUnltjAUft'jlnmlk\tknk\tUv")
   "Simple 'clubhouse code' substitution"

Encrypting ciphertext with the inverse permutation is the
same as decrypting.  That explains the class method:

   def decrypt(self,ciphertext):
       other = ~self
       return other.encrypt(ciphertext)

   >>> newp.decrypt("B\tFzutj'AunmNUnltjAUft'jlnmlk\tknk\tUv")
   "Simple 'clubhouse code' substitution"

How is a permutation expressed, i.e. what's the __repr__
mechanism?  I've chosen the disjoint cycles method, e.g.
the above permutation would show up as:

   >>> p
   Permutation: [('z', 'g', 'V', 'x', 'L', 'D', ' ', 'j',
   'M', 'r', 'J', 'd', 'f', '\n', '\x0c', '\r', 'I', 'i',
   '\t', 'e', 't', 'k', 'T', 'b', 'm', 'F', 'O', 'R', 'C',
   's', 'l', 'u', 'n', 'v', '\x0b', 'Y', 'c', 'A', 'K', 'p'),
   ('y', 'o', 'U', 'S', 'B', 'W', 'H', 'P', 'a', 'Q', 'h',
   'N', 'w', 'E', 'X', 'G')]

This means 'z' becomes 'g', 'g' becomes 'V' and so on, up
to where 'p' wraps to the start of the tuple, and becomes
'z'.  You can see the 'z' in the 4th position, in
B\tFzutj, realizing the \t is one whitespace character:
TAB.  Notice that the single quote character doesn't
appear in the above and therefore maps to itself -- in this
case because I left punctuation characters out of the
domain altogther (in other cases, it might have been a
random self-association).  A more complete domain might
have been list(string.printable) -- or maybe:

   >>> from string import *
   >>> P.domain = list(letters + punctuation + " ")

Changing the class variable alters the domain for new
permutations, but the __init__ method internally promotes
the class-level domain to an instance level, so already-
defined objects don't forget the domain upon which they
were generated:

   >>> newp = P()  # using new class default for domain
   >>> len(newp.domain)
   85
   >>> len(p.domain)
   58

You can see the promotion in __init__:

    if domain != None:
         self.domain = domain
     else:
         self.domain = P.domain # initialize instance variable

Another override in the P class template might be less
advised than __invert__:  I decided to make ^ a synonym for
**, since so many languages express exponentiation with the
caret (in Python, it's default meaning is XOR).  Given
__pow__ was already defined, changing the meaning of ^ is
accomplished in a single line:

     __xor__ = __pow__ # use p^3 instead of p**3 if you like

Raising a permutation to an nth power is like applying it
n times, i.e. if 'v' becomes 'z' and 'z' becomes 'r', then
in a 2nd powering, 'v' goes straight to 'r'.  This is easier
to track with a smaller domain:

   >>> P.domain = range(1,11)
   >>> p1 = P()
   >>> p1
   Permutation: [(10, 7, 3, 6, 1, 4, 9, 5, 8)]
   >>> p1**2
   Permutation: [(10, 3, 1, 9, 8, 7, 6, 4, 5)]
   >>> p1**3
   Permutation: [(10, 6, 9), (8, 3, 4), (7, 1, 5)]

There's this not-so-hard-to-prove theorem that any
permutation can be raised to a power that's the lowest
common multiple of the lengths of all the cycles, which
will give the identity permutation as the result.
This power is called the "order of P" and it's easy
enough to implement.

     def ord(self):
         """
         returns the exponent n of p such that p**n
         = [] (identity element)
         """
         return reduce(lcm, [1]+map(len, self.mkcycles()))

self.mkcycles is the utility for converting an internal
substitution dictionary to cyclic format, i.e. is what's
used in __repr__ as well:

     def __repr__(self):
         return "Permutation: " + str(self.mkcycles())

lcm, on the other hand, lives outside the class (but is
still accessible in the module, even when we just import
P):

def gcd(a,b):
     """Return greatest common divisor using Euclid's Algorithm."""
     while b:
	a, b = b, a % b
     return a

def lcm(a,b):
     """
     Return lowest common multiple."""
     return (a*b)/gcd(a,b)

Euclid's algorithm, and this simple implementation (I think
originally by Guido, right?) is something I've been trying
to reinject into USAer K-12 math classrooms.  It's one of
the oldest algorithms on the books, and involves geometric
reasoning -- the kind of thing Arthur grooves on.  And yet
you'll find it's hardly ever mentioned in K-12 math (more
evidence of its quasi-total corruption).  Why make 'em wait
to college for this -- by then we've lost over 50% to non-
math-related majors, i.e. by then it's too late.

So what's the order of our permutation?

    >>> p1.ord()
    9
    >>> p1^9   # same as p1**9
    Permutation: []

The identity permutation, used for encryption, is pretty
useless, as it just gives you your message back:

   >>> P.domain = list(letters + punctuation + " ")
   >>> p2 = P()
   >>> p2.ord()
   13860
   >>> p3 = p2**13860  # takes a few secs on a 1.2 ghz machine
   >>> p3.encrypt("Able was I ere I saw Elba")
   'Able was I ere I saw Elba'

Doh!

But there's a good segue here:  suppose we encrypt by
using an nth power of the permutation, and then decrypt
by using an (order-n) permutation?  I.E.:

   >>> pe = p2**200
   >>> pe.encrypt("Able was I ere I saw Elba")
   'C>JDuMb\\u#uDrDu#u\\bMu)J>b'
   >>> pd = p2**(p2.ord()-200)  # power "the rest of the way"
   >>> pd.encrypt('C>JDuMb\\u#uDrDu#u\\bMu)J>b')
   'Able was I ere I saw Elba'

That's getting at how RSA works.  You encrypt m by raising
it to the 3rd power (say), modulo N, and then you raise the
"rest of the way" to get back m.  Figuring out what "the
rest of the way" is requires knowing the prime factors of
N, of which their are only two -- needles in a haystack the
size of Manhatten.

Kirby

More:
http://www.inetarena.com/~pdx4d/ocn/crypto0.html