On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
Thanks Kevin, thanks Robert, this is very helpful!
I'd strongly agree with Matti that your explanations could/should make it to the docs. Maybe it's something for the GSoD.
While we're on the subject, one comment and two (hopefully last) questions:
1. My two cents w.r.t. `np.random.simple_seed()` function Robert mentioned: I personally would find it way more confusing than a clear explanation + example in the docs. I'd ask myself what's "simple" here, click through to the source of this `simple_seed`, find out that it's a docsting and a two-liner, and just copy-paste the latter into my user code. Again, just FWIW.
Noted.
2. What would be a preferred way of spelling out "give me the N-th spawned child SeedSequence"? The use case is that I prepare (human-readable) input files once and run a number of computational jobs in separate OS processes. From what Kevin said, I can of course five each worker a pair of (entropy, worker_id) and then each of them does at startup
parent_seq = SeedSequence(entropy) this_sequence = seed_seq.spawn(worker_id)[worker_id]
Is this a recommended way, or is there a better API? Or does the number of spawned children need to be known beforehand? I'd much rather avoid serialization/deserialization if possible.
Assuming that `worker_id` starts at 0: this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))
3. Is there a way of telling the number of draws a generator did?
The use case is to checkpoint the number of draws and `.advance` the bit generator when resuming from the checkpoint. (The runs are longer then the batch queue limits).
There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than MT19937. It's less work and more reliable than computing that distance and storing the original seed and the distance. -- Robert Kern