Mailman 3 os.urandom API - Python-Dev

os.urandom API

Raymond Hettinger

29 Aug 2004

3:37 p.m.

I would like to change the API for the new os.urandom(n) function to return a long integer instead of a string. The former better serves more use cases and fits better with existing modules.

In favor of a long integer:

1) The call random.seed(os.urandom(100)) is a likely use case. If the intermediate value is a string, then random.seed() will hash it and only use 32 bits. If the intermediate value is a long integer, all bits are used. In the given example, the latter is clearly what the user expects (otherwise, they would only request 4 bytes).

2) Another likely use case is accessing all the tools in the random module with a subclass that overrides random() and getrandbits(). Both can be done easier and faster if os.urandom() returns long integers. If the starting point is a string, the code gets ugly and slow.

3) Most use cases for random values involve numeric manipulation. Simple tasks like finding a random integer in the range [0,100000) become unnecessarily more complicated when starting from a string.

4) The decimal module supports instantiation directly from long integers but not from binary strings.

In favor of a string of bytes:

1) This form is handy for cryptoweenies to xor with other byte strings (perhaps for a one-time pad).

Raymond

"Martin v. Löwis"

29 Aug

4:26 p.m.

Raymond Hettinger wrote:

...

I would like to change the API for the new os.urandom(n) function to return a long integer instead of a string. The former better serves more use cases and fits better with existing modules.

-1. Bytes is what the underlying system returns, and it is also conceptually the right thing. We are really talking about a stream of random bytes here (where u signals unlimitedness).

...

1) The call random.seed(os.urandom(100)) is a likely use case. If the intermediate value is a string, then random.seed() will hash it and only use 32 bits. If the intermediate value is a long integer, all bits are used. In the given example, the latter is clearly what the user expects (otherwise, they would only request 4 bytes).

Then add an os.randint if you think this is important. Given how easy the struct module is to use, I don't think it is important to provide this out of the box.
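The struct approach mentioned above might look like the following in present-day Python 3 (not the 2004-era Python under discussion). This is only a sketch: the helper name `urandom_uint` is hypothetical, and it assumes the caller wants a single fixed-width unsigned integer.

```python
import os
import struct

def urandom_uint(fmt=">Q"):
    """Return one unsigned integer read from os.urandom via a struct format.

    Hypothetical helper for illustration. fmt must describe exactly one
    integer, e.g. ">I" (4 bytes) or ">Q" (8 bytes).
    """
    size = struct.calcsize(fmt)                 # how many bytes the format needs
    (value,) = struct.unpack(fmt, os.urandom(size))
    return value

n32 = urandom_uint(">I")   # some integer in 0 .. 2**32 - 1
n64 = urandom_uint(">Q")   # some integer in 0 .. 2**64 - 1
```

The struct codes top out at 8 bytes per integer, so wider values need a different conversion.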

...

2) Another likely use case is accessing all the tools in the random module with a subclass that overrides random() and getrandbits(). Both can be done easier and faster if os.urandom() returns long integers. If the starting point is a string, the code gets ugly and slow.

Don't try guessing use cases too much. I don't think either the original submitter or the original reviewer had sequences of pseudo-random numbers as their use case. Instead, the typical application will be a one-time token for some crypto algorithm, in which case sequences of pseudo-randomness are evil. What kind of data structure these things will need is hard to guess, but "sequence of bytes" is a good bet.

...

3) Most use cases for random values involve numeric manipulation. Simple tasks like finding a random integer in the range [0,100000) become unnecessarily more complicated when starting from a string.

That is not true. Most use cases of random numbers involve bit manipulation.

...

1) This form is handy for cryptoweenies to xor with other byte strings (perhaps for a one-time pad).

And indeed, cryptoweenies have contributed that code. He who writes the code chooses the interface.

Regards, Martin

Tim Peters

4:55 p.m.

I agree with Martin that the patch submitter and reviewers really want strings to come back -- there are many use cases for that in cryptoweenie land, as simple as generating a random session key. Long ints might be cool too, but if so that deserves a distinct API. Something like "urandom(n) -- return a uniform random int from 0 through 256**n-1" would be a pretty bizarre spec.

Note that it's easy (albeit obscure) to generate a long from a string s:

    long(binascii.hexlify(s), 16)

It's harder to go in the other direction. First you do hex(n). Then you chop off the leading "0x". Then there may or may not be a trailing "L", and iff there is you have to lose that too. Then binascii.unhexlify() gets the string. Then you may have to go on to pad with one or more leading NUL bytes.

Regardless of whether there's another API added, it might be good to change random.seed() under the covers to use (say) 4 urandom bytes for default initialization (when os.urandom is available). Initializing based on time.time() is weak, and especially on Windows (where there are typically only 18.2 distinct time.time() values per second).
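For readers on Python 3, the two directions described above can be sketched roughly as follows. This is a translation, not the code from the post: `int` stands in for `long`, there is no trailing "L" to strip, and the function names `bytes_to_int` and `int_to_bytes` are hypothetical.

```python
import binascii

def bytes_to_int(s):
    """String -> integer, as in the one-liner above (s must be non-empty)."""
    return int(binascii.hexlify(s), 16)

def int_to_bytes(n, length):
    """Integer -> string, following the steps listed above (n >= 0)."""
    digits = hex(n)[2:]                  # chop off the leading "0x"
    if len(digits) % 2:                  # unhexlify needs an even digit count
        digits = "0" + digits
    raw = binascii.unhexlify(digits)
    return raw.rjust(length, b"\x00")    # pad with leading NUL bytes

s = b"\x00\x01\xff"
n = bytes_to_int(s)
round_trip = int_to_bytes(n, len(s))
```

Note that the integer form loses the leading NUL byte, which is why the reverse direction has to be told the intended length. Python 3.2 and later also provide int.from_bytes() and int.to_bytes(), which cover both directions directly.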

Josiah Carlson

10:48 p.m.

(Quick apology to Tim for accidentally sending this to him only, I recently switched email clients and am still getting used to it)

...

Note that it's easy (albeit obscure) to generate a long from a string s:

long(binascii.hexlify(s), 16)

or even: long(s.encode('hex'), 16)

...

It's harder to go in the other direction. First you do hex(n). Then

Perhaps what is needed is a method to easily convert between large integers and strings. It seems as though a new struct conversion code is in order, something that works similarly to the way the 's' code works:

    #pack the integer bigint as a signed big-endian packed binary string,
    #null-filling as necessary, for 64 bytes of precision
    a = struct.pack('>64g', bigint)

    #unpack an unsigned little-endian packed binary string of 24 bytes to a
    #python long
    b = struct.unpack('<24G', a)

With such a change, I think many of the string/integer translation complaints would disappear.

- Josiah
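The 'g' and 'G' codes above are a proposal and are not part of the struct module. As a rough illustration of the behaviour being asked for, the same conversions can be sketched in Python 3 with int.to_bytes() and int.from_bytes(); the function names `pack_bigint` and `unpack_bigint` are hypothetical stand-ins, not an existing API.

```python
def pack_bigint(n, size, byteorder="big", signed=True):
    """Pack integer n into exactly `size` bytes.

    Hypothetical stand-in for the proposed '>64g' style code. Small
    non-negative values are zero-filled; negative values are sign-extended
    (two's complement). Raises OverflowError if n does not fit.
    """
    return n.to_bytes(size, byteorder, signed=signed)

def unpack_bigint(data, byteorder="little", signed=False):
    """Hypothetical stand-in for the proposed '<24G' style code."""
    return int.from_bytes(data, byteorder, signed=signed)

a = pack_bigint(-2, 64)                      # 64 bytes, signed, big-endian
b = unpack_bigint(a, "big", signed=True)     # back to the original integer
c = unpack_bigint(b"\x01\x00")               # unsigned little-endian
```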

Nick Mathewson

30 Aug

11:50 a.m.

On Sun Aug 29 22:37:57 2004, Raymond Hettinger wrote:

...

I would like to change the API for the new os.urandom(n) function to return a long integer instead of a string. The former better serves more use cases and fits better with existing modules.

With all respect, I disagree. As a potential user, and as lead developer of a particularly crypto-heavy Python app (http://mixminion.net/), I'd have to say an interface that returned a long integer would not serve my purposes very well at all. For most crypto apps, you never use the output of your strong entropy source directly. Instead, you use your strong entropy source to generate seeds for (cryptographically strong) PRNGs, to generate keys for your block ciphers, and so on. Nearly all of these modules expect their keys as a sequence of bits, which in Python corresponds more closely to a character-string than to an arbitrary long.

...

In favor of a long integer:

1) The call random.seed(os.urandom(100)) is a likely use case. If the intermediate value is a string, then random.seed() will hash it and only use 32 bits. If the intermediate value is a long integer, all bits are used. In the given example, the latter is clearly what the user expects (otherwise, they would only request 4 bytes).

Plausible, but as others indicate, it isn't hard to write:

    random.seed(long(hexlify(os.urandom(100)), 16))

And if you think it's really important, you could change random.seed(None) to get the default seed in this way, instead of looking at time.time.

But this isn't the primary use case of cryptographically strong entropy: The Mersenne Twister algorithm isn't cryptographically secure. If the developer wants a cryptographically strong PRNG, she shouldn't be using random.random(). If she doesn't want a cryptographically strong PRNG, it's overkill for her to use os.urandom(), and overkill for her to want more than 32 bits of entropy anyway.
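A Python 3 rendering of that seeding idiom might look like the sketch below. It is a translation rather than the code from the post: int.from_bytes() replaces the long(hexlify(...)) step, and a private random.Random instance is used here (an assumption made for the example) so the module-level generator is left untouched.

```python
import os
import random

# 100 bytes (800 bits) of OS entropy turned into a single integer seed.
seed = int.from_bytes(os.urandom(100), "big")

# Mersenne Twister seeded from the OS; still not a cryptographic PRNG.
rng = random.Random(seed)
value = rng.randrange(100000)   # random integer in [0, 100000)
```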

...

2) Another likely use case is accessing all the tools in the random module with a subclass that overrides random() and getrandbits(). Both can be done easier and faster if os.urandom() returns long integers. If the starting point is a string, the code gets ugly and slow.

As above, this isn't the way people use strong entropy in well-designed crypto applications. You use your strong entropy to seed a strong PRNG, and you plug your strong PRNG into a subclass overriding random() and getrandbits(). I agree that somebody might decide to just use os.urandom directly as a shortcut, but such a person isn't likely to care about whether her code is slow -- a good PRNG should outperform calls to os.urandom by an order of magnitude or two. [....]
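The shortcut mentioned above, plugging os.urandom directly into a subclass that overrides random() and getrandbits(), can be sketched in Python 3 as follows. The class name `URandom` is hypothetical and the code is illustrative only; the standard library's random.SystemRandom takes the same general approach.

```python
import os
import random

class URandom(random.Random):
    """Hypothetical Random subclass that draws every value from os.urandom."""

    def random(self):
        # Take 56 random bits, keep 53 (float precision), scale into [0.0, 1.0).
        return (int.from_bytes(os.urandom(7), "big") >> 3) * 2.0 ** -53

    def getrandbits(self, k):
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        numbytes = (k + 7) // 8                      # round up to whole bytes
        x = int.from_bytes(os.urandom(numbytes), "big")
        return x >> (numbytes * 8 - k)               # drop the surplus bits

rng = URandom()
x = rng.random()            # float in [0.0, 1.0)
i = rng.randrange(100000)   # built on getrandbits() by the base class
```

Every call goes to the operating system, which is the performance cost described above; a strong PRNG seeded once from os.urandom avoids it.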

...

In favor of a string of bytes:

1) This form is handy for cryptoweenies to xor with other byte strings (perhaps for a one-time pad).

2) By returning the result of the OS's random function directly, we make it easier for cryptoweenies to assure themselves that their entropy is good. If it gets massaged into a long, then skeptical cryptoweenies will have that much more code to audit.

3) Most crypto libraries don't currently support keying from longs, and (as noted in other posts) the long->string conversion isn't as easy or clean as the string->long conversion.

yrs,
--
Nick Mathewson (PGP key changed on 15Aug2004; see http://wangafu.net/key.txt)
