[Python-ideas] [Python Ideas] Random weighted choice idea to add to random lib

Wed Mar 23 12:13:17 EDT 2016

On Mar 23, 2016, at 07:51, Steven Basart <xksteven at uchicago.edu> wrote:
> 
> Hello,
> 
> I had an idea that has in some form been proposed before.  I think a random weighted choice function would be a very good addition to the random module in the python standard library.
> 
> Here's a basic implementation of what's required albeit slightly inefficient as the search is linear as opposed to creating a binary tree and taking log n time to find the correct choice. pmf is short for probability mass function. It doesn't handle all error cases but it is more of an example to show the idea.
> 
> 
>     def randweighted(self,pmf=list):
>         """Return an int in the range [0,len(list)] weighted by list of values.  
>         Raises error if list is empty.
>         """
>         if(pmf == []):
>             raise ValueError("pmf is equal to empty list")
>         cummassf = [x[:i] for i in pmf]
>         randVal = self._randbelow(len(pmf))
>         for x in cummassf:
>             if randVal < x:
>                 return x

Why a default value of list, especially since that's just going to raise a TypeError later on?

Why the (unpythonic) test for "pmf == []" instead of just testing "if not pmf:"? Then it'll work on, say, tuples or third-party sorted lists or any other sequence. (With a tiny change, you can make it work on any iterable, but I don't know whether that's desirable.)

The x[:i] is accessing an unbound local named x. What's it supposed to be?

Whatever x is, each element of cummassf is going to be some kind of sequence, so randVal < x on each element is going to raise a TypeError. Maybe you wanted a sum somewhere?

If you fix that, doesn't this accumulate floating point errors, so there's a significant chance the result won't add up to 1.0 even if all of the weights are correct to 1 ULP?

If the weights _aren't_ correct, surely you'd want a ValueError or something. This code will just return None if the sum is too low, and silently choose from the first 1.0 if it's too high. (I think np.random.choice raises ValueError if not isclose(sum(p), 1.0), but I'd have to check.)

Finally, this is clearly a weighted randrange (with a fixed min of 0), but it claims to be a weighted choice. Obviously you can build the latter out of the former (e.g., take a val:weight mapping, call randweighted on the values, then do next(islice(keys, result, None)), but you do need to actually build it (and show what the API for it will be) if that's the motivating reason. (Also, I think an _unnormalized_ val:weight mapping might be a better API--after all, that means you can immediately call it on a Counter--but you probably want to look at what numpy, R, Mathematica, etc. do...)

> There was a ticket that got started before.  It has since stalled because I believe the original author intended the library to get called over an over so they came up with generators which would be more efficient if passing in the same list.  Otherwise there isn't really a point to using generators in my opinion.
> 
> http://bugs.python.org/issue18844

Why not just pick up from their original implementation, fix any bugs, and propose that, instead of starting over from scratch?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20160323/4824d0c1/attachment.html>