Random from Dictionary

Tobias Klausmann klausman-un0204 at schwarzvogel.de
Sun Feb 22 20:35:09 CET 2004

```Hi!

This thread has reminded me (and touched upon) a problem I had a
few months back and never got to solve in a way I really liked.

Here's the outline:
I'd like to generate words from atoms (like 'e', 'ch', 'th' and
so on). Now while the basic stupid algorithm is to just put them
into two different lists (roughly "vowels" and "consonants") and
the just pick alternatingly form both (or maybe a bit mor often
form one, that does not quite satisfy me.

What I'd like is having a dictionary or a list of tuples, which
contains the probability for each element and then an efficient
way of creating strings that contain a set number of random chars
according to the distribution outlined by the probablities.

For small numbers of elements, just creating a list that contains
more of the same element for higher probabilities and
then pick()ing from it works. Unfortunately, I need a quite high
precision (about 0.05% of the sum of probabilities) and have a
list of about 100 atoms. The list would become very large, I
guess (although I haven't tried).

Now I've come up with an idea that might solve my problem, but I
don't know if it's correct or at least effective.

I build the list the following way: For the first atom, I create
a dictionary entry with the key being its relative probability.
The second key gets an entry with its probability added to the
first one. This goes on until all atoms are in the list.

Now I generate a number between 0 and the highest number in the
dictionary, i.e. the key of thw most probable element. The
picked element will be the one with the smallest key that's over
the random number.

The main disadvantage I see is that I have to walk large parts of
the list for every pick. Of course, I would start at the top,
trying to minimize the number of elements needing to be checked.

Still, it doesn't quite satisfy me. While I have a strong belief
in its correctness, I'd be glad if others think so, too :)

Better yet, a faster and proven way to do this would be even
nicer.

Greets,
Tobias

```