On Mon, Nov 22, 2010 at 6:05 AM, Hagen Fürstenau hagen@zhuliguan.net wrote:

ISTM that this elementary functionality deserves an implementation that's as fast as it can be.

To substantiate this, I just wrote a simple implementation of "categorical" in "numpy/random/mtrand.pyx" and it's more than 8x faster than your version for multiple samples of the same distribution and more than 3x faster than using "multinomial(1, ...)" for multiple samples of different distributions (each time tested with 1000 samples drawn from distributions over 1000 categories).

I can provide it as a patch if there's any interest.

Can you compare the speed of your cython solution with the version of Chuck

-- For instance, weight 0..3 by 1..4, then

In [14]: w = arange(1,5)

In [15]: p = cumsum(w)/float(w.sum())

In [16]: bincount(p.searchsorted(random(1000000)))/1e6 Out[16]: array([ 0.100336, 0.200382, 0.299132, 0.40015 ])

------------- from numpy mailing list thread "Weighted random integers", sep. 10 Using searchsorted hat roughly a 10 times speedup compared to my multinomial version

Josef

- Hagen

NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion