Carl Scharenberg carl.scharenberg at
Fri Sep 10 15:05:27 CEST 2004

Tris Orendorff <triso at> wrote in message news:<Xns955FCC8E99C10RepublicPicturesLtd at>...
> carl.scharenberg at (Carl Scharenberg) wrote in
> news:e930c085.0409020529.2db830fc at 
> >> This seems to be of somewhat better quality than the output of the
> >> typical random-text generator.  Can anyone suggest something on CPAN
> >> useful for such?
> > 
> > You can do this by analyzing a sample text at a higher level. Instead
> > of generating text from the frequency of single letters, you generate
> > using the frequencies of 2, 3, or 4-letter sequences. You analyze a
> > large text so you have a database of frequencies. When generating each
> > new character you look at the frequencies of the letters given that the
> > 3 previous letters are 'the'. The possibilities are a space, 'r'
> > (there), 'y' (they), and some others. Overall it will generate words
> > and even phrases that seem to almost make sense. It is neat stuff.
> This is known as a Markov chain, and it works even better if you
> generate using words rather than letters. Using letters creates both
> words and non-words. The output is written in the same style as the
> input text.
> -- 
> Sincerely,
> Tris Orendorff

Oh yes, now that you mentioned them, I remember studying Markov chains
in linear algebra, I think. I've also heard this called a histogram or
ngram, but I've never looked into the terminology at all. I have never
done this at the word level, because I was simulating randomly-typing
monkeys when I played with this. And for the monkeys I wanted letter
generation. I will have to play with word generation too, because it
just never occurred to me!  :-)
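
For anyone following along, here is a minimal sketch of the idea discussed
above, assuming plain Python and nothing beyond the standard library. The
function names build_model and generate and the sample sentence are made up
for illustration; they do not come from this thread or from any module. The
same code handles both levels: pass a string to work with letters, or a list
of words to work with words.

```python
import random
from collections import defaultdict


def build_model(tokens, order=3):
    """Map each run of `order` tokens to the list of tokens seen right after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        model[state].append(tokens[i + order])
    return model


def generate(model, seed, length, rng=random):
    """Extend `seed` by up to `length` tokens, sampling each one from the model.

    The order is taken from len(seed), so the seed must have as many tokens
    as the `order` used to build the model (and at least one).
    """
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-order:]))
        if not followers:  # this state was never followed by anything, so stop
            break
        # Repeated entries in the list make frequent followers more likely.
        out.append(rng.choice(followers))
    return out


# Hypothetical sample text, chosen only to show what can follow 'the'.
sample = "the theory is that they gave their thesis there"

# Letter level: a string is treated as a sequence of characters.
letter_model = build_model(sample, order=3)
print(sorted(set(letter_model[tuple("the")])))
print("".join(generate(letter_model, "the", 40, rng=random.Random(0))))

# Word level: split the text first, then use the same two functions.
words = sample.split()
word_model = build_model(words, order=1)
print(" ".join(generate(word_model, words[:1], 10, rng=random.Random(0))))
```

Storing every follower in a list (duplicates included) is a simple stand-in
for a frequency table: picking uniformly from the list is the same as picking
in proportion to the observed counts. A real experiment would need a much
larger input text than one sentence.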

More information about the Python-list mailing list