On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:

It can be anything, but “good practice” is to use a number that would have 2 properties:


  1. When expressed as binary number, it would have a large number of both 0s and 1s

The properties of the SeedSequence algorithm render this irrelevant, fortunately. While there are seed numbers that might create "bad" outputs from SeedSequence with overly low or high Hamming weight (number of 1s), they are scattered around the input space, so you would have to adversarially reverse the SeedSequence algorithm to find them. IMO, the only reason to avoid seed numbers with very few 0s or very few 1s is that there are relatively few of them. If you are deliberately picking from that small set somehow, it's more likely that other researchers are too, and you are more likely to reuse the same seed.
  2. The total number of digits in the binary representation is somewhere between 32 and 128.

I like using the standard library `secrets` module.

>>> import secrets
>>> secrets.randbelow(1<<128)

If you want an easy-to-follow rule, just use the above snippet to get a 128-bit number. More than 128 bits won't do you any good (at least by default, the internal bottleneck inside of SeedSequence is a 128-bit pool), and 128-bit numbers are just about small enough to copy-paste comfortably.
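
As a rough sketch of that rule using only the standard library, the snippet below draws a seed and reports the two properties discussed above: its bit length and its Hamming weight. The helper names `fresh_seed` and `hamming_weight` are made up for illustration and are not part of numpy or `secrets`.

```python
import secrets


def fresh_seed(bits=128):
    """Return a random non-negative integer below 2**bits (hypothetical helper)."""
    return secrets.randbelow(1 << bits)


def hamming_weight(n):
    """Count the 1 bits in the binary representation of n."""
    return bin(n).count("1")


seed = fresh_seed()
print(seed)                  # run once, then paste the printed value into your script
print(seed.bit_length())     # never more than 128
print(hamming_weight(seed))  # usually somewhere near 64 for a random 128-bit draw
```

The idea is to run this once and hard-code the printed integer, rather than drawing a new seed on every run. The pasted integer can then be passed as the seed to `np.random.default_rng()` or `np.random.SeedSequence()`.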

We have thought about wrapping that up in a numpy.random function (e.g. `np.random.simple_seed()` or something like that) for convenience, but we wanted to wait a bit before committing to an API.

Robert Kern