
How can we use numpy's random `integers` function to get uniformly selected integers from an arbitrarily large `high` limit? This is important when dealing with exact probabilities in combinatorially large solution spaces.

I propose that we add the capability for `integers` to construct arrays of type `object_` by having it construct Python `int`s as the objects in the returned array. This would allow arbitrarily large integers.

The Python random library's `randrange` constructs values for arbitrary upper limits, and they are exact when using subclasses of `random.Random` with a `getrandbits` method (which includes the default rng for most operating systems).

Numpy's random `integers` function rightfully raises on `integers(20**20, dtype=int64)` because the upper limit is above what can be held in an `int64`. But Python `int` objects store arbitrarily large integers, so I would expect `integers(20**20, dtype=object)` to create random integers on the desired range. Instead a TypeError is raised: `Unsupported dtype dtype('O') for integers`. It seems we could provide support for dtype('O') by constructing Python `int` values, and this would allow arbitrarily large ranges of integers.

The core of this functionality would be close to the seven lines used in [the code of random.Random._randbelow](https://github.com/python/cpython/blob/eb953d6e4484339067837020f77eecac61f8d...), which 1) finds the number of bits needed to describe the `high` argument, 2) generates that number of random bits, and 3) converts them to a Python `int` and checks whether it is greater than or equal to the input `high`. If so, it repeats from step 2.

I realize that people can just use `random.randrange` to obtain this functionality, but that doesn't return an array, and it uses a different RNG, possibly requiring tracking two RNG states.

This text was also used to create [Issue #24458](https://github.com/numpy/numpy/issues/24458)
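For readers who want to see the three steps spelled out, here is a minimal sketch of the rejection loop. It is a paraphrase of the idea, not a copy of the CPython source, and the function name `randbelow` and its `getrandbits` parameter are illustrative choices rather than an existing API.

```python
import random


def randbelow(getrandbits, high):
    """Return a uniformly distributed Python int in the range [0, high).

    `getrandbits` is any callable that returns an int built from k random
    bits, for example the bound method `random.Random().getrandbits`.
    """
    if high <= 0:
        raise ValueError("high must be a positive integer")
    k = high.bit_length()      # step 1: bits needed to describe `high`
    r = getrandbits(k)         # step 2: draw that many random bits
    while r >= high:           # step 3: reject values outside the range
        r = getrandbits(k)     # ...and repeat from step 2
    return r


rng = random.Random(2023)
values = [randbelow(rng.getrandbits, 20**20) for _ in range(5)]
```

Because `high` needs exactly `k` bits, more than half of the `k`-bit candidates fall below it, so the loop is expected to run fewer than two times per value.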

The easiest way to do this would be to write a pure-Python implementation of a masked integer sampler using Python ints. You could draw unsigned integers and treat them as a bit pool. You would then take the number of bits needed for your integer, convert them to a Python int, and finally apply the mask. This is how integers are generated in the legacy `RandomState` code.

Kevin
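A rough sketch of that bit-pool idea, under stated assumptions: the helper names `masked_randbelow` and `integers_as_objects` are hypothetical, the pool is built from 32-bit words, and none of this is the legacy numpy code itself. It only relies on `Generator.integers` with `dtype=np.uint32`.

```python
import numpy as np


def masked_randbelow(rng, high):
    """Draw one uniform Python int in [0, high) from a pool of 32-bit words."""
    if high <= 0:
        raise ValueError("high must be a positive integer")
    k = high.bit_length()
    mask = (1 << k) - 1            # keeps only the k low bits of the pool
    nwords = (k + 31) // 32        # 32-bit words needed to cover k bits
    while True:
        words = rng.integers(0, 1 << 32, size=nwords, dtype=np.uint32)
        pool = 0
        for word in words:         # concatenate the words into one Python int
            pool = (pool << 32) | int(word)
        candidate = pool & mask    # apply the mask
        if candidate < high:       # reject candidates that overshoot
            return candidate


def integers_as_objects(rng, high, size):
    """Fill an object array of the given size with Python ints in [0, high)."""
    out = np.empty(size, dtype=object)
    for i in range(out.size):
        out.flat[i] = masked_randbelow(rng, high)
    return out


sample = integers_as_objects(np.random.default_rng(0), 20**20, size=(2, 3))
```

The per-element Python loop is slow compared with numpy's native integer dtypes, which is the price of getting arbitrary-precision values from a single generator state.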

On Sat, Aug 19, 2023 at 10:49 AM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:
> The easiest way to do this would be to write a pure-Python implementation of a masked integer sampler using Python ints. You could draw unsigned integers and treat them as a bit pool. You would then take the number of bits needed for your integer, convert them to a Python int, and finally apply the mask.

Indeed, that's how `random.Random` does it. I've commented on the issue with an implementation that subclasses `random.Random` to use numpy PRNGs as the source of bits, for maximum compatibility with `Random`. The use case motivating this feature request is networkx, which manually wraps numpy PRNGs in a class that incompletely mimics the `Random` interface. A true subclass eliminates all of the remaining inconsistencies between the two. I'm inclined to leave it at that and not extend the `Generator` interface.

https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

-- Robert Kern
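To give a flavor of the subclassing approach, here is a hypothetical sketch. It is not the implementation posted in the linked issue comment; the class name `GeneratorBackedRandom` and its details are assumptions made for illustration. It overrides `random` and `getrandbits` so that the inherited `randrange`, `choice`, `shuffle`, and similar methods draw their bits from a numpy `Generator`.

```python
import random

import numpy as np


class GeneratorBackedRandom(random.Random):
    """Illustrative random.Random subclass backed by a numpy Generator."""

    def __init__(self, generator=None):
        # The superclass initializer is deliberately not called: all state
        # that matters lives in the numpy Generator held here.
        if generator is None:
            generator = np.random.default_rng()
        self._generator = generator
        self.gauss_next = None  # attribute the superclass normally sets

    def random(self):
        """Return a float in [0.0, 1.0) from the numpy Generator."""
        return float(self._generator.random())

    def getrandbits(self, k):
        """Return a non-negative Python int built from k random bits."""
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        if k == 0:
            return 0
        nbytes = (k + 7) // 8
        raw = self._generator.bytes(nbytes)
        # Discard the surplus high bits so exactly k bits remain.
        return int.from_bytes(raw, "little") >> (nbytes * 8 - k)

    def seed(self, *args, **kwargs):
        raise NotImplementedError("seed the numpy Generator instead")

    def getstate(self):
        raise NotImplementedError("state is held by the numpy Generator")

    def setstate(self, state):
        raise NotImplementedError("state is held by the numpy Generator")


py_rng = GeneratorBackedRandom(np.random.default_rng(2023))
big = py_rng.randrange(20**20)  # arbitrary-precision draw, one RNG state
```

With a class like this, only the numpy `Generator` state needs to be tracked, which addresses the concern about juggling two RNG states.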
participants (3)
- Dan Schult
- Kevin Sheppard
- Robert Kern