[Tutor] os.urandom()

Steven D'Aprano steve at pearwood.info
Mon Aug 9 12:39:23 CEST 2010


On Mon, 9 Aug 2010 07:23:56 pm Dave Angel wrote:

> Big difference between 2.x and 3.x.  In 3.x, strings are Unicode, and
> may be stored either in 16bit or 32bit form (Windows usually compiled
> using the former, and Linux the latter).

That's an internal storage detail that you (generic you), the Python 
programmer, don't see, except perhaps indirectly via memory consumption.

Do you know how many bits are used to store floats? If you try:

>>> import sys
>>> sys.getsizeof(1.1)
16

in Python 2.6 or better, it tells you that a float *object* takes 16 
bytes, but it doesn't tell you anything about the underlying C-level 
floating point data. And a float that prints like:

1.0

takes up exactly the same storage as one that prints like:

1.234567890123456789
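
As a rough check of that claim, here is a minimal sketch for Python 3. The 
variable names are made up for illustration, and the exact byte counts 
depend on the build, so the sketch only compares the two sizes:

```python
import sys

# Two floats whose printed forms differ greatly in length.
short_float = 1.0
long_float = 1.234567890123456789

size_short = sys.getsizeof(short_float)
size_long = sys.getsizeof(long_float)

# Both are ordinary float objects, so they should report the same size.
same_size = size_short == size_long
print(size_short, size_long, same_size)
```

The printed representation has no bearing on the storage: both values are 
held in the same fixed-size float object.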


We can do a bit better with unicode strings:

>>> sys.getsizeof(u'a') - sys.getsizeof(u'')
2

but frankly, who cares? It doesn't *mean* anything. Whether a character 
takes up two bytes, or twenty-two bytes, is irrelevant to how it 
prints. 


> Presumably in 3.x, urandom returns a byte string   (see the b'xxxx'
> form), which is 8 bits each, same as 2.x strings.  So you'd expect
> only two hex digits for each character.

It looks like you've missed the point that bytes don't always display as 
two hex digits. Using Python 3.1, we can see some bytes display in 
hex-escape form, e.g.:

>>> bytes([0, 1, 20, 30, 200])
b'\x00\x01\x14\x1e\xc8'

some will display in character-escape form:

>>> bytes([9, 10, 13])
b'\t\n\r'

and some will display as unescaped ASCII characters:

>>> bytes([40, 41, 80, 90, 110])
b'()PZn'
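
To see how the 256 possible byte values split between those display 
forms, here is a small sketch. It assumes CPython 3's repr for bytes, 
and display_form is a hypothetical helper name, not part of any library. 
Note that it counts the backslash itself (shown as \\) as a 
character escape:

```python
from collections import Counter


def display_form(value):
    """Classify how one byte value appears in a Python 3 bytes repr."""
    # Strip the leading b' (or b") and the trailing quote from the repr.
    text = repr(bytes([value]))[2:-1]
    if text.startswith("\\x"):
        return "hex"      # e.g. \x00, \xc8
    if text.startswith("\\"):
        return "escape"   # \t, \n, \r and the backslash itself
    return "plain"        # unescaped ASCII such as ( or P


counts = Counter(display_form(value) for value in range(256))
print(counts)
```

On the interpreters I'd expect this to run on, the majority of values land 
in the hex-escape group, but a sizeable minority do not.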

So you can't make any definitive statement that the output of urandom 
will be displayed in hex form. Because the output is random, you might, 
by some incredible fluke, get:

>>> os.urandom(6)
b'hello '
>>> os.urandom(6)
b'world!'


I wouldn't like to bet on it though. By my calculation, the odds of that 
exact output are 1 in 79228162514264337593543950336 (that is, 256**12).

The odds of getting nothing but hex-escaped characters are a bit better. 
By my estimate, the odds of getting 12 hex-escaped characters in a row 
are about 1 in 330. For six in a row, it's about 1 in 18 or so.
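
Those figures can be reproduced with a few lines of arithmetic. This is 
only a sketch: it assumes that 158 of the 256 byte values display in 
\xNN form under CPython 3 (the control characters other than tab, 
newline and carriage return, plus everything from 127 upward), and the 
variable names are invented for the example:

```python
# Odds of one specific 12-byte sequence: each byte has 256 possible values.
exact_odds = 256 ** 12

# Assumption: 158 of the 256 byte values are shown as \xNN by bytes.__repr__.
hex_escaped_values = 158
p_hex = hex_escaped_values / 256

# "1 in N" odds of getting only hex-escaped bytes, for 12 and for 6 bytes.
odds_12 = 1 / p_hex ** 12
odds_6 = 1 / p_hex ** 6

print(exact_odds)
print(round(odds_12), round(odds_6))
```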


By the way, an interesting aside... bytes aren't always 8 bits. Of 
course, on just about all machines that have Python on them, they will 
be, but there are still machines and devices such as signal processors 
where bytes are something other than 8 bits. Historically, common 
values included 5, 6, 7, 9, or 16 bits, and the C and C++ standards 
still define a constant CHAR_BIT to specify the number of bits in a 
byte.



-- 
Steven D'Aprano

