Thanks Kevin, thanks Robert, this is very helpful!
I'd strongly agree with Matti that your explanations could/should make
it to the docs. Maybe it's something for the GSoD.
While we're on the subject, one comment and two (hopefully last) questions:
1. My two cents w.r.t. `np.random.simple_seed()` function Robert
mentioned: I personally would find it way more confusing than a clear
explanation + example in the docs. I'd ask myself what's "simple"
here, click through to the source of this `simple_seed`, find out that
it's a docstring and a two-liner, and just copy-paste the latter into
my user code. Again, just FWIW.
Noted.
2. What would be a preferred way of spelling out "give me the N-th
spawned child SeedSequence"?
The use case is that I prepare (human-readable) input files once and
run a number of computational jobs in separate OS processes. From what
Kevin said, I can of course give each worker a pair of (entropy,
worker_id) and then each of them does at startup
> parent_seq = SeedSequence(entropy)
> this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]
Is this a recommended way, or is there a better API? Or does the
number of spawned children need to be known beforehand?
I'd much rather avoid serialization/deserialization if possible.
Assuming that `worker_id` starts at 0:
this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))
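As a quick sanity check, here is a minimal sketch showing that constructing the child directly from `spawn_key` should give the same sequence as spawning from a fresh parent. The `entropy` value and `n_workers` below are made up for illustration; in practice each worker would read its own (entropy, worker_id) pair from its input file.

```python
import numpy as np
from numpy.random import SeedSequence

entropy = 123456789  # hypothetical value, stands in for the shared entropy
n_workers = 4        # hypothetical number of workers

# Reference: spawn all children at once from a fresh parent.
parent = SeedSequence(entropy)
children = parent.spawn(n_workers)

for worker_id in range(n_workers):
    # What a single worker would do on its own, without spawning.
    direct = SeedSequence(entropy, spawn_key=(worker_id,))
    assert direct.spawn_key == children[worker_id].spawn_key
    assert np.array_equal(
        direct.generate_state(4), children[worker_id].generate_state(4)
    )
```

Note that this equivalence relies on the parent being freshly constructed with an empty `spawn_key`; a parent that has already spawned children continues numbering from where it left off.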
3. Is there a way of telling the number of draws a generator did?
The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits).
There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than that of MT19937. Serializing the state is less work and more reliable than computing that distance and storing the original seed together with the distance.
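A minimal sketch of that kind of checkpointing, assuming PCG64 and a JSON string as the checkpoint format (the seed `12345` and the draw counts are arbitrary, and a real job would write the string to a file):

```python
import json
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(12345))  # arbitrary seed, for illustration only
rng.random(10)                 # some work done before the checkpoint

# Checkpoint: the PCG64 state dict contains only ints and strings.
checkpoint = json.dumps(rng.bit_generator.state)

# What the original run would have drawn next.
expected = rng.random(5)

# Resume: build a fresh bit generator and overwrite its state.
resumed_bg = PCG64()
resumed_bg.state = json.loads(checkpoint)
resumed = Generator(resumed_bg)
actual = resumed.random(5)

assert np.array_equal(actual, expected)
```

The state dict also carries the cached uint32 fields, so restoring it this way avoids having to track them separately.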