[Numpy-discussion] reseed random generator (1.19)
robert.kern at gmail.com
Sat Jul 4 13:55:45 EDT 2020
On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski <evgeny.burovskiy at gmail.com> wrote:
> Thanks Kevin, thanks Robert, this is very helpful!
> I'd strongly agree with Matti that your explanations could/should make
> it to the docs. Maybe it's something for the GSoD.
> While we're on the subject, one comment and two (hopefully last) questions:
> 1. My two cents w.r.t. `np.random.simple_seed()` function Robert
> mentioned: I personally would find it way more confusing than a clear
> explanation + example in the docs. I'd ask myself what's "simple"
> here, click through to the source of this `simple_seed`, find out that
> it's a docstring and a two-liner, and just copy-paste the latter into
> my user code. Again, just FWIW.
> 2. What would be a preferred way of spelling out "give me the N-th
> spawned child SeedSequence"?
> The use case is that I prepare (human-readable) input files once and
> run a number of computational jobs in separate OS processes. From what
> Kevin said, I can of course give each worker a pair of (entropy,
> worker_id) and then each of them does at startup
>     parent_seq = SeedSequence(entropy)
>     this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]
> Is this a recommended way, or is there a better API? Or does the
> number of spawned children need to be known beforehand?
> I'd much rather avoid serialization/deserialization if possible.
Assuming that `worker_id` starts at 0:
this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))
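A quick way to check that the `spawn_key` construction gives each worker the same stream it would get from calling `spawn()` on a parent in a single process. This is a sketch: the entropy value and worker count are made-up examples, and `worker_rng` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

entropy = 12345  # example value; in practice read it from the input file
n_workers = 4    # only needed for the single-process comparison below


def worker_rng(entropy: int, worker_id: int) -> Generator:
    """Hypothetical helper: what each worker does, knowing only (entropy, worker_id)."""
    seq = SeedSequence(entropy, spawn_key=(worker_id,))
    return Generator(PCG64(seq))


# Reference: spawn all children from one parent in a single process.
children = SeedSequence(entropy).spawn(n_workers)

for worker_id, child in enumerate(children):
    direct = worker_rng(entropy, worker_id).random(3)
    spawned = Generator(PCG64(child)).random(3)
    # Same entropy + same spawn_key -> identical streams.
    assert np.array_equal(direct, spawned)
```

Note that a worker never needs to know how many siblings exist: its `entropy` and `worker_id` are enough.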
> 3. Is there a way to tell how many draws a generator has made?
> The use case is to checkpoint the number of draws and `.advance` the
> bit generator when resuming from the checkpoint. (The runs are longer
> than the batch queue limits).
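For context on the `.advance` idea: `PCG64.advance(n)` moves the state as if `n` raw 64-bit outputs had been drawn, so it only lines up with a draw count kept in those units. A minimal sketch, assuming the checkpoint tracks raw `random_raw` outputs (the seed and count are example values). Generator methods such as `standard_normal` or bounded `integers` can consume a variable number of raw outputs, which is why counting Generator calls does not translate directly into an `advance` distance.

```python
import numpy as np
from numpy.random import PCG64

seed = 2020      # example seed
n_draws = 1000   # number of raw 64-bit outputs consumed before the checkpoint

# Original run: consume n_draws raw outputs, then note what comes next.
bg = PCG64(seed)
bg.random_raw(n_draws)
expected_next = bg.random_raw(5)

# Resumed run: re-seed and jump ahead by the same number of raw outputs.
resumed = PCG64(seed).advance(n_draws)
assert np.array_equal(resumed.random_raw(5), expected_next)
```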
There are computations you can do on the internal state of PCG64 and Philox
to get this information, but not in general, no. I do recommend serializing
the Generator or BitGenerator (or at least the BitGenerator's .state
property, which is a nice JSONable dict for PCG64) for checkpointing
purposes. Among other things, there is a cached uint32 for when odd numbers
of uint32s are drawn that you might need to handle. The state of the
default PCG64 is much smaller than that of MT19937. It's less work and more
reliable than computing that distance and storing the original seed and the
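A minimal sketch of the checkpoint/restore cycle recommended above, assuming the default PCG64 bit generator and JSON as the on-disk format (the seed and draw sizes are arbitrary examples). The saved dict includes `has_uint32` and `uinteger`, so the cached uint32 is carried along without any special handling.

```python
import json

import numpy as np

rng = np.random.default_rng(2020)
# An odd number of uint32 draws can leave a cached half-word in the bit generator.
rng.integers(0, 10, size=7, dtype=np.uint32)

# Checkpoint: the BitGenerator's .state is a plain dict of ints and strings.
checkpoint = json.dumps(rng.bit_generator.state)

expected = rng.random(5)  # what the original run would draw next

# Resume: build a fresh PCG64-backed Generator and restore the saved state.
restored = np.random.default_rng()
restored.bit_generator.state = json.loads(checkpoint)
assert np.array_equal(restored.random(5), expected)
```

Pickling the whole `Generator` (`pickle.dumps(rng)`) also works; JSON just keeps the checkpoint human-readable.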