[Numpy-discussion] locking np.random.Generator in a cython nogil context?
Evgeni Burovski
evgeny.burovskiy at gmail.com
Thu Dec 17 04:47:11 EST 2020
On Tue, Dec 15, 2020 at 1:00 AM Robert Kern <robert.kern at gmail.com> wrote:
>
> On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski <evgeny.burovskiy at gmail.com> wrote:
>>
>> <snip>
>>
>> > I also think that the lock only matters for multithreaded code, not multiprocess code. I believe the latter pickles and unpickles any Generator object (and the underlying BitGenerator), so each process has its own version. Note that when multiprocessing the recommended procedure is to use spawn() to generate a sequence of BitGenerators and to use a distinct BitGenerator in each process. If you do this then you are free from the lock.
>>
>> Thanks. Just to confirm: does using SeedSequence spawn_key arg
>> generate distinct BitGenerators? As in
>>
>> cdef class Wrapper():
>>     def __init__(self, seed):
>>         entropy, num = seed
>>         py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>>         self.rng = <bitgen_t *>py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")  # <--- this
>>
>> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
>> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>>
>> And then, do these two objects have distinct BitGenerators?
>
>
> The code you wrote doesn't work (`spawn_key` is never assigned). I can guess what you meant to write, though, and yes, you would get distinct `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly. The `SeedSequence.spawn()` method internally keeps track of how many children it has spawned and uses that to construct the `spawn_key`s for its subsequent children. If you play around with making your own `spawn_key`s, then the parent `SeedSequence(entropy)` might spawn identical `SeedSequence`s to the ones you constructed.
>
> If you don't want to use the `spawn()` API to construct the separate `SeedSequence`s but still want to incorporate some per-process information into the seeds (e.g. the 0 and 1 in your example), then note that a tuple of integers is a valid value for the `entropy` argument. You can have the first item be the same (i.e. per-run information) and the second item be a per-process ID or counter.
>
> cdef class Wrapper():
>     def __init__(self, seed):
>         py_gen = PCG64(SeedSequence(seed))
>         self.rng = <bitgen_t *>py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
Thanks Robert!
I did indeed typo the spawn_key, and the intention is exactly to
include a worker_id in the seed, so that each worker gets a
separate stream.
The use of the spawn_key was --- as I now finally realize --- a
misunderstanding of your and Kevin's previous replies in
https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html
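A minimal sketch of the collision Robert describes, in plain Python rather than Cython and assuming NumPy 1.17 or later. The variable names are illustrative only. It builds a `SeedSequence` with an explicit `spawn_key=(0,)` and compares it with the first child that the parent hands out via `spawn()`:

```python
import numpy as np
from numpy.random import SeedSequence

entropy = 123  # illustrative per-run entropy

# A hand-built child: same entropy, explicit spawn_key.
manual = SeedSequence(entropy, spawn_key=(0,))

# The first child that the parent itself produces via spawn().
parent = SeedSequence(entropy)
spawned = parent.spawn(1)[0]

# Both carry the same entropy and spawn_key, so the state they
# generate for a BitGenerator is identical.
same_state = np.array_equal(manual.generate_state(4),
                            spawned.generate_state(4))
print(spawned.spawn_key, same_state)
```

If some other part of a program calls `spawn()` on `SeedSequence(123)`, its first child therefore duplicates the hand-built one.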
So I'm moving my project to use the `SeedSequence((base_seed,
worker_id))` API --- thanks!
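For reference, a small Python sketch of that pattern, assuming NumPy 1.17 or later. `base_seed` and `make_rng` are hypothetical names used only for illustration, and the Cython wrapper would pass the same `SeedSequence` to `PCG64`. The tuple goes in as the `entropy` argument, not as a `spawn_key`:

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

base_seed = 123  # hypothetical per-run seed


def make_rng(worker_id):
    """Return a Generator seeded from (base_seed, worker_id)."""
    # The tuple of integers is the entropy; no spawn_key is involved.
    seed_seq = SeedSequence((base_seed, worker_id))
    return Generator(PCG64(seed_seq))


draws_0 = make_rng(0).random(8)
draws_1 = make_rng(1).random(8)
draws_0_again = make_rng(0).random(8)

# Different worker ids give different streams; the same id is reproducible.
print(np.array_equal(draws_0, draws_1))
print(np.array_equal(draws_0, draws_0_again))
```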
Just as a side note, this is not very prominent in the docs, and I'm
happy to volunteer a doc PR. I'm just not sure which part of the docs
it belongs in, and would appreciate a pointer.