[Numpy-discussion] numpy.power -> numpy.random.choice Probabilities don't sum to 1

Andy Ray Terrel andy.terrel at gmail.com
Sat Dec 19 07:55:25 EST 2015


A simple fix would certainly by pass the check in random.choice, but I
don't know how to get that. So let's focus on the summation.

I believe you are hitting an instability in summing small numbers as a
power to 10th order would produce. Here is an example:

mymatrix = np.random.rand(1024,1024).astype('float16')*1e-7
row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)
sums = row_normalized.sum(axis=1)
len(sums[sums != 1]) # ->  108

One can use things like Kahan summation and you will need to collect the
size of the error and truncate all numbers in mymatrix under that error.
I'm not quite sure how to quickly implement such a thing in numpy without a
loop.

On Fri, Dec 18, 2015 at 7:00 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Dec 18, 2015 at 1:25 PM, Ryan R. Rosario <ryan at bytemining.com>
> wrote:
> > Hi,
> >
> > I have a matrix whose entries I must raise to a certain power and then
> normalize by row. After I do that, when I pass some rows to
> numpy.random.choice, I get a ValueError: probabilities do not sum to 1.
> >
> > I understand that floating point is not perfect, and my matrix is so
> large that I cannot use np.longdouble because I will run out of RAM.
> >
> > As an example on a smaller matrix:
> >
> > np.power(mymatrix, 10, out=mymatrix)
> > row_normalized = np.apply_along_axis(lambda x: x / np.sum(x), 1,
> mymatrix)
>
> I'm sorry I don't have a solution to your actual problem off the top
> of my head, but it's probably helpful in general to know that a better
> way to write this would be just
>
>   row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)
>
> apply_along_axis is slow and can almost always be replaced by a
> broadcasting expression like this.
>
> > sums = row_normalized.sum(axis=1)
> > sums[np.where(sums != 1)]
>
> And here you can just write
>
>   sums[sums != 1]
>
> i.e. the call to where() isn't doing anything useful.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20151219/e1c9310d/attachment.html>


More information about the NumPy-Discussion mailing list