Consider the following:
from numpy.random import default_rng
rs = default_rng()
Now how do I re-seed the generator? I thought perhaps rs.bit_generator.seed(), but there is no such attribute.
Thanks, Neal
Just call
rs = default_rng()
again.
On Wed, Jun 24, 2020, 20:31 Neal Becker ndbecker2@gmail.com wrote:
On Wed, Jun 24, 2020 at 3:31 PM Neal Becker ndbecker2@gmail.com wrote:
In general, reseeding an existing generator instance is not a good practice. What effect are you trying to accomplish? I assume that you are asking this because you are currently using `RandomState.seed()`. In what circumstances?
The raw `bit_generator.state` property *can* be assigned to, in order to support some advanced use cases (mostly involving de/serialization and similar kinds of meta-programming tasks). It's also been helpful for me to construct worst-case scenarios for testing parallel streams. But it quite deliberately bypasses the notion of deriving the state from a human-friendly seed number.
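For illustration, a minimal sketch of the de/serialization use case (the seed and draw count here are arbitrary, not from the thread): the `.state` property of PCG64 is a plain dict, so it can be copied into a fresh generator to clone the stream exactly.

```python
import numpy as np

rng = np.random.default_rng(12345)
rng.standard_normal(3)                # advance the stream a bit
snapshot = rng.bit_generator.state    # a plain, copyable dict for PCG64

clone = np.random.default_rng()       # its own seed is irrelevant here...
clone.bit_generator.state = snapshot  # ...because we overwrite the state

# Both generators now produce identical draws.
assert clone.standard_normal() == rng.standard_normal()
```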
I was using this to reset the generator, in order to repeat the same sequence again for testing purposes.
On Wed, Jun 24, 2020 at 6:40 PM Robert Kern robert.kern@gmail.com wrote:
(apologies for jumping into a conversation) So what is the recommendation for instantiating a number of generators with manually controlled seeds?
The use case is running a series of MC simulations with reproducible streams. The runs are independent and are run in parallel in separate OS processes, where I do not control the time each process starts (jobs are submitted to the batch queue), so default seeding seems dubious?
Previously, I would just do roughly
seeds = [1234, 1235, 1236, ...]
rngs = [np.random.RandomState(seed) for seed in seeds]
...
and each process operates with its own `rng`. What would be the recommended way with the new `Generator` framework? A human-friendly way would be preferable if possible.
Thanks,
Evgeni
On Mon, Jun 29, 2020 at 3:20 PM Kevin Sheppard kevin.k.sheppard@gmail.com wrote:
If you want to use the same entropy-initialized generator for temporarily reproducible experiments, then you can use
gen = np.random.default_rng()
state = gen.bit_generator.state
gen.standard_normal()
# 0.5644742559549797, will vary across runs
gen.bit_generator.state = state
gen.standard_normal()
# Always the same as before 0.5644742559549797
The equivalent to the old way of calling seed to reseed is:
SEED = 918273645
gen = np.random.default_rng(SEED)
gen.standard_normal()
# 0.12345677
gen = np.random.default_rng(SEED)
gen.standard_normal()
# Identical value
Rather than reseeding the same object, you just create a new object. At some point in the development of Generator both methods were timed, and there was no performance benefit to reusing the same object by reseeding.
Kevin
From: Neal Becker Sent: Monday, June 29, 2020 1:01 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] reseed random generator (1.19)
Thanks Kevin!
A possibly dumb follow-up question: in your example,
entropy = 382193877439745928479635728
is it relevant that `entropy` is a long integer? I.e., what are the constraints on its value? Can one use `entropy = 1234`, `entropy = 0`, or `entropy = 1` instead?
On Mon, Jun 29, 2020 at 5:37 PM Kevin Sheppard kevin.k.sheppard@gmail.com wrote:
The best practice is to use a SeedSequence to spawn child SeedSequences, and then to use these children to initialize your generators or bit generators.
from numpy.random import SeedSequence, Generator, PCG64, default_rng
entropy = 382193877439745928479635728
seed_seq = SeedSequence(entropy)
NUM_STREAMS = 2**15
children = seed_seq.spawn(NUM_STREAMS)
# if you want the current best bit generator, which may change
rngs = [default_rng(child) for child in children]
# If you want the most control across versions, set the bit generator explicitly.
# This uses PCG64, which is the current default; each bit generator needs
# to be wrapped in a Generator.
rngs = [Generator(PCG64(child)) for child in children]
Kevin
On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard kevin.k.sheppard@gmail.com wrote:
It can be anything, but “good practice” is to use a number that would have 2 properties:
- When expressed as a binary number, it would have a large number of both 0s and 1s
The properties of the SeedSequence algorithm render this irrelevant, fortunately. While there are seed numbers that might create "bad" outputs from SeedSequence with overly low or high Hamming weight (number of 1s), they are scattered around the input space so you have to adversarially reverse the SeedSequence algorithm to find them. IMO, the only reason to avoid seed numbers like this has more to do with the fact that there are a relatively small number of these seeds. If you are deliberately picking from that small set somehow, it's more likely that other researchers are too, and you are more likely to reuse that same seed.
- The total number of digits in the binary representation is somewhere between 32 and 128.
I like using the standard library `secrets` module.
>>> import secrets
>>> secrets.randbelow(1 << 128)
8080125189471896523368405732926911908
If you want an easy-to-follow rule, just use the above snippet to get a 128-bit number. More than 128 bits won't do you any good (at least by default, the internal bottleneck inside of SeedSequence is a 128-bit pool), and 128-bit numbers are just about small enough to copy-paste comfortably.
We have thought about wrapping that up in a numpy.random function (e.g. `np.random.simple_seed()` or something like that) for convenience, but we wanted to wait a bit before committing to an API.
On Mon, Jun 29, 2020 at 11:30 AM Robert Kern robert.kern@gmail.com wrote:
Sorry, `secrets.randbits(128)` is the cleaner form of this.
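Putting that advice together, a short sketch of generating and recording a seed this way (the variable names are illustrative, not from the thread):

```python
import secrets

import numpy as np

SEED = secrets.randbits(128)  # run once, then record the printed value
print(f"SEED = {SEED}")       # paste this number back into the script
rng = np.random.default_rng(SEED)  # reproducible once SEED is pinned
```

Any later run that reuses the recorded `SEED` reproduces the same stream.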
Thanks Kevin, thanks Robert, this is very helpful!
I'd strongly agree with Matti that your explanations could/should make it to the docs. Maybe it's something for the GSoD.
While we're on the subject, one comment and two (hopefully last) questions:
1. My two cents w.r.t. `np.random.simple_seed()` function Robert mentioned: I personally would find it way more confusing than a clear explanation + example in the docs. I'd ask myself what's "simple" here, click through to the source of this `simple_seed`, find out that it's a docstring and a two-liner, and just copy-paste the latter into my user code. Again, just FWIW.
2. What would be a preferred way of spelling out "give me the N-th spawned child SeedSequence"? The use case is that I prepare (human-readable) input files once and run a number of computational jobs in separate OS processes. From what Kevin said, I can of course give each worker a pair of (entropy, worker_id), and then each of them does at startup
parent_seq = SeedSequence(entropy)
this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]
Is this a recommended way, or is there a better API? Or does the number of spawned children need to be known beforehand? I'd much rather avoid serialization/deserialization if possible.
3. Is there a way of telling the number of draws a generator did?
The use case is to checkpoint the number of draws and `.advance` the bit generator when resuming from the checkpoint. (The runs are longer than the batch queue limits.)
Thanks!
Evgeni
On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski evgeny.burovskiy@gmail.com wrote:
- My two cents w.r.t. `np.random.simple_seed()` function Robert mentioned: I personally would find it way more confusing than a clear explanation + example in the docs. I'd ask myself what's "simple" here, click through to the source of this `simple_seed`, find out that it's a docstring and a two-liner, and just copy-paste the latter into my user code. Again, just FWIW.
Noted.
- What would be a preferred way of spelling out "give me the N-th spawned child SeedSequence"? The use case is that I prepare (human-readable) input files once and run a number of computational jobs in separate OS processes. From what Kevin said, I can of course give each worker a pair of (entropy, worker_id), and then each of them does at startup
parent_seq = SeedSequence(entropy)
this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]
Is this a recommended way, or is there a better API? Or does the number of spawned children need to be known beforehand? I'd much rather avoid serialization/deserialization if possible.
Assuming that `worker_id` starts at 0:
this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))
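A sketch of why this works, reusing the entropy value from Kevin's earlier example: `spawn()` hands child i the spawn key `(i,)`, so a worker can reconstruct its own child directly without building the whole list.

```python
from numpy.random import SeedSequence, default_rng

entropy = 382193877439745928479635728  # from Kevin's example above

children = SeedSequence(entropy).spawn(4)       # what the parent would do
direct = SeedSequence(entropy, spawn_key=(2,))  # what worker 2 can do alone

# Both routes yield the same derived seed material.
assert (children[2].generate_state(4) == direct.generate_state(4)).all()

rng = default_rng(direct)  # this worker's private stream
```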
- Is there a way of telling the number of draws a generator did?
The use case is to checkpoint the number of draws and `.advance` the bit generator when resuming from the checkpoint. (The runs are longer than the batch queue limits.)
There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than MT19937. It's less work and more reliable than computing that distance and storing the original seed and the distance.
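As a minimal sketch of that checkpointing approach (the seed and the amount of work here are arbitrary), the PCG64 `.state` dict round-trips through JSON:

```python
import json

import numpy as np

rng = np.random.default_rng(918273645)
rng.standard_normal(1000)                       # ... some long-running work ...

blob = json.dumps(rng.bit_generator.state)      # checkpoint: a small JSON string

resumed = np.random.default_rng()
resumed.bit_generator.state = json.loads(blob)  # restore at the next job start
assert resumed.standard_normal() == rng.standard_normal()
```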
On Sat, Jul 4, 2020 at 1:56 PM Robert Kern robert.kern@gmail.com wrote: ....
Sorry, you lost me here. If I want to save, restore the state of a generator, can I use pickle/unpickle?
On Sat, Jul 4, 2020, 2:39 PM Neal Becker ndbecker2@gmail.com wrote:
Absolutely.
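For example, a minimal sketch (the seed is arbitrary): pickling the Generator captures the full BitGenerator state.

```python
import pickle

import numpy as np

rng = np.random.default_rng(12345)
rng.standard_normal(10)            # advance the stream

blob = pickle.dumps(rng)           # save the whole Generator

restored = pickle.loads(blob)      # later, possibly in another process
assert restored.standard_normal() == rng.standard_normal()
```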
On Mon, Jun 29, 2020 at 8:02 AM Neal Becker ndbecker2@gmail.com wrote:
I was using this to reset the generator, in order to repeat the same sequence again for testing purposes.
In general, you should just pass in a new Generator that was created with the same seed.
def function_to_test(rg):
    x = rg.standard_normal()
    ...

SEED = 12345  # any fixed seed value

rg = np.random.default_rng(SEED)
function_to_test(rg)
rg = np.random.default_rng(SEED)
function_to_test(rg)
Resetting the state of the underlying BitGenerator in-place is possible, as Kevin showed, but if you can refactor your code so that there isn't a persistent Generator object between these runs, that's probably better. It's a code smell if you can't just pass in a fresh Generator; in general, it means that your code is harder to use, not just because we don't expose an in-place seed() method.