
On 2010-11-22, at 2:51 AM, Hagen Fürstenau wrote:
but this is bound to be inefficient as soon as the vector of probabilities gets large, especially if you want to draw multiple samples.
Have I overlooked something or should this be added?
I think you misunderstand the point of multinomial distributions. A sample from a multinomial is simply a sample from N i.i.d. categoricals, reported as the counts for each category in the N observations. It's very easy to recover the 'categorical' samples from a 'multinomial' sample.
    import numpy as np

    a = np.random.multinomial(50, [.3, .3, .4])
    b = np.zeros(50, dtype=int)
    upper = np.cumsum(a)
    lower = upper - a
    for value in range(len(a)):
        b[lower[value]:upper[value]] = value
    # mix up the order, in-place, if you care about them not being sorted
    np.random.shuffle(b)
Then b holds 50 i.i.d. draws from the corresponding 'categorical' distribution.
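
If the per-draw labels are wanted directly, one possible alternative is inverse-CDF lookup: build the cumulative probabilities once, then map uniform random numbers onto them with np.searchsorted. This is only a sketch, not something from the recipe above; the helper name sample_categorical and its rng argument are made up for illustration, and it assumes a NumPy recent enough to provide np.random.default_rng.

```python
import numpy as np

def sample_categorical(probs, size, rng=None):
    """Hypothetical helper: draw `size` category indices by inverse-CDF lookup."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    cdf = np.cumsum(probs)
    cdf /= cdf[-1]  # normalise so rounding error cannot leave the last edge below 1
    u = rng.random(size)  # uniform draws in [0, 1)
    # side="right" picks the first category whose cumulative edge exceeds u,
    # so categories with zero probability are never selected
    return np.searchsorted(cdf, u, side="right")

samples = sample_categorical([.3, .3, .4], 50, rng=np.random.default_rng(0))
counts = np.bincount(samples, minlength=3)  # same shape as a multinomial sample
```

Here counts plays the role of a above, and samples the role of b, already in random order. The cumulative sum is computed once per call, so drawing many samples from a long probability vector costs one binary search per draw.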
David