Re: Edu-sig digest, Vol 1 #172 - 5 msgs
def permute(): """ Randomly permute the uppercase alphabet by choosing its letters pseudo-randomly """ alphalist = list(string.uppercase) newlist = [] for i in range(len(alphalist)): randchar = random.choice(alphalist) alphalist.remove(randchar) newlist.append(randchar) return newlist
def mkdict(): """ Pair uppercase alphabet with randomly permuted version of same """ tuples = zip(string.uppercase,permute()) codedict = {} for pair in tuples: codedict[pair[0]]=pair[1] return codedict
The above algorithms, if I'm reading them correctly, allow a character to be substituted by itself. I assume this is desirable behavior (since it increases the number of permutations), but perhaps someone could comment on how often self-substitution could be expected to happen, and how likely it might be that self-substitution would cause the "encrypted" result to be easier to human-decode. (Self-substitution of "e" seems more likely to ease decryption than self-substitution of "z," but I'm just guessing.) Of course, most Cryptoquoters assume that self-substitution is explicitly disallowed, so it might actually be considered a "feature" in this context... (I can think of a couple of ways to disallow self-substitution, but I think the Python is less interesting than the problem.) Dorothea -- Dorothea Salo Impressions Book and Journal Services, Inc. phone: (608) 244-6218 fax: (608) 244-7050 http://www.impressions.com
Dorothea Salo writes:
def permute(): """ Randomly permute the uppercase alphabet by choosing its letters pseudo-randomly """ alphalist = list(string.uppercase) newlist = [] for i in range(len(alphalist)): randchar = random.choice(alphalist) alphalist.remove(randchar) newlist.append(randchar) return newlist
def mkdict(): """ Pair uppercase alphabet with randomly permuted version of same """ tuples = zip(string.uppercase,permute()) codedict = {} for pair in tuples: codedict[pair[0]]=pair[1] return codedict
The above algorithms, if I'm reading them correctly, allow a character to be substituted by itself. I assume this is desirable behavior (since it increases the number of permutations), but perhaps someone could comment on how often self-substitution could be expected to happen, and how likely it might be that self-substitution would cause the "encrypted" result to be easier to human-decode. (Self-substitution of "e" seems more likely to ease decryption than self-substitution of "z," but I'm just guessing.)
Self-substitution happens at all with probability 1-(1/e). It might help human attackers in some circumstances, if they get extraordinarily lucky. But in general, there's no reason to assume that self-substitution leaks more information about the message. If you see that the most common letter in a cyphertext is "M", do you assume that "M" is "E"? If you see that the most common letter in a cyphertext is "E", do you assume that "E" is "E"? If your answer to the two questions is different, then you're implying that you think the cyphertext alphabet is not completely arbitrary. This might be a good assumption in dealing with substitution cyphers created by humans by hand (in the sense that people are terrible at generating random numbers), but it's probably not a good assumption in dealing with an automatically generated cypher. Remember that by forbidding self-substitution, you eliminate almost 2/3 of the possible permutations, so you more than _double_ the amount of information available about the decryption process. (OK, that's not quite fair, because you don't usually have to find the entire alphabet in order to recover the plaintext.) Why should you give any extra clues? I can rephrase this another way. Suppose I have a cyphertext "QTQRQ ZCMYO". So originally this is just saying: there is a phrase of two words of five letters each written with the Latin alphabet. The first, third, and fifth letters of the first word are identical to each other, and different from all other letters in the phrase. The second letter of the first word is unique (not repeated anywhere else in the phrase). So is the fourth letter of that word. So is each letter of the second word. Do you see that this is _all_ the information given by the cyphertext? Now specifying a non-self-substitution rule adds the following additional information, which was not conveyed in any way by the cyphertext: The first letter of the first word is not "Q". The second letter of that word is not "T". The fourth letter of that word is not "R". The first letter of the second word is not "Z". The second letter of that word is not "C". The third letter of that word is not "M". The fourth letter of that word is not "Y". The fifth letter of that word is not "O". There are potentially very significant hints! But there was absolutely no way to infer them directly from the original cyphertext, without reference to knowledge of any languages that can be written in the Latin alphabet. Another way to say this is that if you somehow have the vast file /usr/share/plaintexts/parsimonious-decryptions, which contains all possible parsimonious decryptions of every cyphertext :-), and you are trying to write a program to search this file and print out all possible parsimonious decryptions for a given cyphertext, the hints you get from the non-self-substitution rule will potentially make your program faster and make it print out fewer possible decryptions. -- Seth David Schoen <schoen@loyalty.org> | And do not say, I will study when I Temp. http://www.loyalty.org/~schoen/ | have leisure; for perhaps you will down: http://www.loyalty.org/ (CAF) | not have leisure. -- Pirke Avot 2:5
participants (2)
-
Dorothea Salo
-
Seth David Schoen