[New-bugs-announce] [issue41131] Augment random.choices() with the alias method
Raymond Hettinger
report at bugs.python.org
Fri Jun 26 16:38:41 EDT 2020
New submission from Raymond Hettinger <raymond.hettinger at gmail.com>:
For n unequal weights and k selections, sample selection with the inverse-cdf method is O(k log₂ n). Using the alias method, it improves to O(k). The proportionally constants also favor the alias method so that if the set up times were the same, the alias method would always win (even when n=2).
However, the set up times are not the same. For the inverse-cdf method, set up is O(1) if cum_weights are given; otherwise, it is O(n) with a fast loop. The setup time for the alias method is also O(n) but is proportionally much slower.
So, there would need to be a method selection heuristic based on the best trade-off between setup time and sample selection time.
Both methods make k calls to random().
See: https://en.wikipedia.org/wiki/Alias_method
Notes on the attached draft implementation:
* Needs to add back the error checking code.
* Need a better method selection heuristic.
* The alias table K defaults to the original index
so that there is always a valid selection even
if there are small rounding errors.
* The condition for the aliasing loop is designed
to have an early-out when the remaining blocks
all have equal weights. Also, the loop condition
makes sure that the pops never fail even if there
are small rounding errors when partitioning
oversized bins or if the sum of weights isn't
exactly 1.0.
----------
assignee: rhettinger
components: Library (Lib)
files: choices_proposal.py
messages: 372441
nosy: mark.dickinson, rhettinger, tim.peters
priority: normal
severity: normal
status: open
title: Augment random.choices() with the alias method
type: performance
versions: Python 3.10
Added file: https://bugs.python.org/file49267/choices_proposal.py
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41131>
_______________________________________
More information about the New-bugs-announce
mailing list