Python for Reverse Engineering

Mike Meyer mwm at
Sat Nov 6 04:39:10 CET 2004

Brad Tilley <rtilley at> writes:

> A friend of mine wrote an algorithm that generates strings. He says
> that it's impossible to figure out exactly how the algorithm
> works. That may be true, but I think if I had enough sample strings
> that I could write a program to identify patterns in the strings and
> then produce strings with similar patterns. He disagrees with me. He
> gave me 100 strings to analyze.
> I wrote a script to read each string and build a list of characters
> present. I've discovered that he only uses 20 alpha/numeric chars to
> generate the strings and that each string sums up to a value between
> 1000 and 1200 when I assign each char in the string its ASCII value.
> What else might I do to analyze these things? As it stands now, I can
> generate an acceptable string on ever 100th attempt or so, but I'd
> like to do better than this.

Build a markov chain generator. Basically, build tuples of characters
of length N, and use those to index a dictionary containing a list (or
string) of characters that follow that list of characters in your
input strings. So, with N equal 2, and d[('a', 'b')] = ['c', 'd', 'e']
you'd know that the strings abc, abd and abe appeared in the text. To
gennerate a string, pull a random key from the dictionary, then choose
random characters from the list that the last N characters treated as
a tuple index in the dictionary. Larger values of N generate strings
that more closely resemble the input text.

Optionally, make the objects stored in the dictionary keep a count of
how many times each character appears, and choose one randomly
weighted by those counts.

I've got a markov chain generator that works on words if you really
want to see one.

Mike Meyer <mwm at>
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

More information about the Python-list mailing list