Consider the following: `from numpy.random import default_rng`, then `rs = default_rng()`. Now how do I re-seed the generator? I thought perhaps `rs.bit_generator.seed()`, but there is no such attribute. Thanks, Neal -- *Those who don't understand recursion are doomed to repeat it*
On Wed, Jun 24, 2020 at 3:31 PM Neal Becker <ndbecker2@gmail.com> wrote:
In general, reseeding an existing generator instance is not a good practice. What effect are you trying to accomplish? I assume that you are asking this because you are currently using `RandomState.seed()`. In what circumstances? The raw `bit_generator.state` property *can* be assigned to, in order to support some advanced use cases (mostly involving de/serialization and similar kinds of meta-programming tasks). It's also been helpful for me to construct worst-case scenarios for testing parallel streams. But it quite deliberately bypasses the notion of deriving the state from a human-friendly seed number. -- Robert Kern
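A minimal sketch of the state-assignment use case mentioned above, assuming the default `PCG64` bit generator behind `default_rng`: the `state` dict is snapshotted, some values are drawn, and assigning the snapshot back replays the same draws. Note that this restores raw state rather than re-deriving it from a seed, which is exactly the distinction being drawn here. The seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2020)  # arbitrary illustrative seed

# Snapshot the raw bit generator state (a plain dict for PCG64).
saved_state = rng.bit_generator.state

first = rng.standard_normal(3)

# Assigning the snapshot back rewinds this same instance in place.
rng.bit_generator.state = saved_state
replayed = rng.standard_normal(3)

print(np.array_equal(first, replayed))
```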
(apologies for jumping into a conversation) So what is the recommendation for instantiating a number of generators with manually controlled seeds? The use case is running a series of MC simulations with reproducible streams. The runs are independent and are run in parallel in separate OS processes, where I do not control the time each process starts (jobs are submitted to the batch queue), so default seeding seems dubious? Previously, I would do roughly `seeds = [1234, 1235, 1236, ...]` and `rngs = [np.random.RandomState(seed) for seed in seeds]` ... and each process operates with its own `rng`. What would be the recommended way with the new `Generator` framework? A human-friendly way would be preferable if possible. Thanks, Evgeni
On Mon, Jun 29, 2020 at 3:20 PM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:
Thanks Kevin! A possibly dumb follow-up question: in your example,
`entropy = 382193877439745928479635728`
is it relevant that `entropy` is a long integer? I.e., what are the constraints on its value? Can one use `entropy = 1234`, `entropy = 0` or `entropy = 1` instead?
On Mon, Jun 29, 2020 at 5:37 PM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:
On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:
The properties of the SeedSequence algorithm render this irrelevant, fortunately. While there are seed numbers that might create "bad" outputs from SeedSequence with overly low or high Hamming weight (number of 1s), they are scattered around the input space, so you would have to adversarially reverse the SeedSequence algorithm to find them. IMO, the only reason to avoid seed numbers like these is that there are relatively few of them. If you are deliberately picking from that small set somehow, it's more likely that other researchers are too, and so you are more likely to reuse the same seed.
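A quick sanity check of that claim, as a sketch: feed small entropy values such as 0, 1 and 1234 through `SeedSequence` and count the set bits in a 128-bit chunk of the derived state. If the mixing behaves as described, each count should land near half of the 128 bits and the derived states should all differ. Drawing 4 words is only an illustrative choice.

```python
import numpy as np
from numpy.random import SeedSequence

weights = {}
states = {}
for entropy in (0, 1, 1234):
    # Four uint32 words = 128 bits of derived state.
    state = SeedSequence(entropy).generate_state(4)
    states[entropy] = state
    # Hamming weight: total number of 1 bits across the four words.
    weights[entropy] = sum(bin(int(word)).count("1") for word in state)
    print(entropy, weights[entropy])
```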
1. The total number of digits in the binary representation is somewhere between 32 and 128.
I like using the standard library `secrets` module.
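A minimal sketch of this approach, assuming `secrets.randbits(128)` is the call in question: draw a 128-bit integer once, record it (for example in an input file), and pass it to `SeedSequence` to derive independent child generators. The number of children (4) is illustrative.

```python
import secrets

import numpy as np
from numpy.random import SeedSequence

# Draw a fresh 128-bit integer once, then record it (e.g. paste it into a config file).
entropy = secrets.randbits(128)
print(entropy)

# Re-running with the recorded value reproduces every child stream.
parent = SeedSequence(entropy)
children = parent.spawn(4)  # one child SeedSequence per independent stream
rngs = [np.random.default_rng(child) for child in children]
samples = [rng.random() for rng in rngs]
```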
If you want an easy-to-follow rule, just use the above snippet to get a 128-bit number. More than 128 bits won't do you any good (at least by default, the internal bottleneck inside of SeedSequence is a 128-bit pool), and 128-bit numbers are just about small enough to copy-paste comfortably. We have thought about wrapping that up in a numpy.random function (e.g. `np.random.simple_seed()` or something like that) for convenience, but we wanted to wait a bit before committing to an API. -- Robert Kern
Thanks Kevin, thanks Robert, this is very helpful! I'd strongly agree with Matti that your explanations could/should make it into the docs. Maybe it's something for the GSoD. While we're on the subject, one comment and two (hopefully last) questions:
1. My two cents w.r.t. the `np.random.simple_seed()` function Robert mentioned: I personally would find it way more confusing than a clear explanation + example in the docs. I'd ask myself what's "simple" here, click through to the source of this `simple_seed`, find out that it's a docstring and a two-liner, and just copy-paste the latter into my user code. Again, just FWIW.
2. What would be a preferred way of spelling out "give me the N-th spawned child SeedSequence"? The use case is that I prepare (human-readable) input files once and run a number of computational jobs in separate OS processes. From what Kevin said, I can of course give each worker a pair of (entropy, worker_id), and then each of them does at startup
`parent_seq = SeedSequence(entropy)`; `this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]`
Is this a recommended way, or is there a better API? Or does the number of spawned children need to be known beforehand? I'd much rather avoid serialization/deserialization if possible.
3. Is there a way of telling the number of draws a generator has made? The use case is to checkpoint the number of draws and `.advance` the bit generator when resuming from the checkpoint. (The runs are longer than the batch queue limits.) Thanks! Evgeni
On Mon, Jun 29, 2020 at 11:06 PM Robert Kern <robert.kern@gmail.com> wrote:
On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
Noted.
Assuming that `worker_id` starts at 0: `this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))`
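A sketch checking that equivalence, assuming (as the reply does) that `worker_id` is 0-based: constructing `SeedSequence(entropy, spawn_key=(worker_id,))` directly should yield the same state as taking that child from `SeedSequence(entropy).spawn(n)`, so each job needs only `(entropy, worker_id)` and the total number of children need not be known up front. The helper `worker_seed_sequence` and the value of `n_workers` are hypothetical.

```python
import numpy as np
from numpy.random import SeedSequence

entropy = 382193877439745928479635728  # the example value quoted earlier in the thread
n_workers = 8  # hypothetical number of jobs, used only for the cross-check


def worker_seed_sequence(entropy, worker_id):
    """Hypothetical helper: build the worker_id-th child without spawning its siblings."""
    return SeedSequence(entropy, spawn_key=(worker_id,))


# Cross-check against spawning all children from the parent.
spawned = SeedSequence(entropy).spawn(n_workers)
for worker_id in range(n_workers):
    direct = worker_seed_sequence(entropy, worker_id)
    assert np.array_equal(direct.generate_state(4), spawned[worker_id].generate_state(4))

# Each job would then build its own Generator from its child sequence.
rng = np.random.default_rng(worker_seed_sequence(entropy, 3))
```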
There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's `.state` property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than that of MT19937. Serializing it is less work and more reliable than storing the original seed and computing the distance travelled. -- Robert Kern
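A minimal sketch of that checkpointing approach, assuming the default `PCG64` bit generator: dump `bit_generator.state` to JSON at checkpoint time and assign it back when the job resumes. Because the dict includes the cached-`uint32` fields (`has_uint32`, `uinteger`), an odd number of 32-bit draws before the checkpoint needs no extra bookkeeping. The seed is illustrative, and a real job would write `checkpoint` to a file rather than keep it in memory.

```python
import json

import numpy as np

rng = np.random.default_rng(12)  # illustrative seed
rng.integers(0, 100, size=5, dtype=np.uint32)  # odd count of 32-bit draws: may leave a cached uint32

# Checkpoint: the PCG64 state is a dict of plain Python ints, so it round-trips through JSON.
checkpoint = json.dumps(rng.bit_generator.state)

expected = rng.random(3)  # what the run would have drawn next

# Resume: build a fresh PCG64-backed Generator and load the saved state into it.
restored = np.random.Generator(np.random.PCG64())
restored.bit_generator.state = json.loads(checkpoint)
resumed = restored.random(3)
```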
On Mon, Jun 29, 2020 at 8:02 AM Neal Becker <ndbecker2@gmail.com> wrote:
I was using this to reset the generator, in order to repeat the same sequence again for testing purposes.
In general, you should just pass in a new Generator that was created with the same seed: `def function_to_test(rg): x = rg.standard_normal() ...`, then `SEED = 12345...`, `rg = np.random.default_rng(SEED)`, `function_to_test(rg)`, `rg = np.random.default_rng(SEED)`, `function_to_test(rg)`. Resetting the state of the underlying BitGenerator in-place is possible, as Kevin showed, but if you can refactor your code so that there isn't a persistent Generator object between these runs, that's probably better. It's a code smell if you can't just pass in a fresh Generator; in general, it means that your code is harder to use, not just because we don't expose an in-place `seed()` method. -- Robert Kern
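A runnable sketch of that pattern: the function under test only uses the Generator it receives, and each run gets a fresh Generator built from the same seed, so no in-place reseeding is needed. The seed value is a hypothetical placeholder, and the function body is a stand-in for real code under test.

```python
import numpy as np

SEED = 20200629  # hypothetical placeholder seed


def function_to_test(rg):
    """Stand-in for code under test: it only ever uses the Generator it is given."""
    x = rg.standard_normal()
    return x


# Each run gets its own fresh Generator built from the same seed,
# so the same sequence is repeated without touching any shared state.
first = function_to_test(np.random.default_rng(SEED))
second = function_to_test(np.random.default_rng(SEED))
print(first == second)
```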
participants (5)
- Evgeni Burovski
- Kevin Sheppard
- Matti Picus
- Neal Becker
- Robert Kern