steve at pearwood.info
Mon Aug 9 12:39:23 CEST 2010
On Mon, 9 Aug 2010 07:23:56 pm Dave Angel wrote:
> Big difference between 2.x and 3.x. In 3.x, strings are Unicode, and
> may be stored either in 16bit or 32bit form (Windows usually compiled
> using the former, and Linux the latter).
That's an internal storage detail that you (generic you), the Python
programmer, don't see, except perhaps indirectly via memory consumption.
Do you know how many bits are used to store floats? If you try:
in Python 2.6 or better, it tells you that a float *object* takes 16
bytes, but it doesn't tell you anything about the underlying C-level
floating point data. And a float that prints like:
takes up exactly the same storage as one that prints like:
We can do a bit better with unicode strings:
>>> sys.getsizeof(u'a') - sys.getsizeof(u'')
but frankly, who cares? It doesn't *mean* anything. Whether a character
takes up two bytes, or twenty-two bytes, is irrelevant to how it
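To make the object-size point concrete, here is a minimal sketch for a recent Python 3 (where every str is Unicode). The names small, large, c_double_size and per_char are mine, chosen for illustration, and the exact numbers depend on your Python version and build, so treat the printed values as something to inspect rather than rely on.

```python
import struct
import sys

# Two floats that print very differently.
small = 0.0
large = 1.7976931348623157e308

# sys.getsizeof reports the size of the whole Python object, header included.
size_small = sys.getsizeof(small)
size_large = sys.getsizeof(large)

# Size of the platform's C double, which is what a float object wraps.
c_double_size = struct.calcsize("d")

# Extra bytes needed for one more character in a short ASCII string.
# This is build- and version-dependent, so no expected value is given.
per_char = sys.getsizeof("aa") - sys.getsizeof("a")

print(size_small, size_large)  # the two numbers are equal
print(c_double_size)
print(per_char)
```

Both floats report the same size however they print, and that size is larger than the C double inside, because it includes the object overhead.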
> Presumably in 3.x, urandom returns a byte string (see the b'xxxx'
> form), which is 8 bits each, same as 2.x strings. So you'd expect
> only two hex digits for each character.
It looks like you've missed the point that bytes don't always display as
two hex digits. Using Python 3.1, we can see some bytes display in
hex-escape form, e.g.:
>>> bytes([0, 1, 20, 30, 200])
some will display in character-escape form:
>>> bytes([9, 10, 13])
and some will display as unescaped ASCII characters:
>>> bytes([40, 41, 80, 90, 110])
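As a quick check of the three display forms above, this sketch (Python 3, with the hypothetical name samples) prints the repr of each example alongside its length. Every value is still one byte per element, whatever the display looks like.

```python
# The same three byte strings as above, labelled by how they display.
samples = {
    "hex-escape": bytes([0, 1, 20, 30, 200]),
    "character-escape": bytes([9, 10, 13]),
    "plain ASCII": bytes([40, 41, 80, 90, 110]),
}

for label, value in samples.items():
    print(label, repr(value), len(value))

# hex-escape b'\x00\x01\x14\x1e\xc8' 5
# character-escape b'\t\n\r' 3
# plain ASCII b'()PZn' 5
```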
So you can't make any definitive statement that the output of urandom
will be displayed in hex form. Because the output is random, you might,
by some incredible fluke, get:
I wouldn't like to bet on it though. By my calculation, the odds of that
exact output are 1 in 79228162514264337593543950336 (that is, 256**12).
The odds of getting nothing but hex-escaped characters are a bit better.
By my estimate, the odds of getting 12 hex-escaped characters in a row
are about 1 in 330. For six in a row, it's about 1 in 18 or so.
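Those estimates can be reproduced with a short sketch, assuming each byte from urandom is uniformly distributed and independent, and using the repr of a single byte to decide whether it displays as a hex escape. The helper is_hex_escaped and the variable names are my own, not anything from the standard library.

```python
def is_hex_escaped(value):
    """Return True if the single byte `value` displays as a \\xNN escape."""
    # repr looks like b'\x00', so characters 2 and 3 are the backslash and x.
    return repr(bytes([value]))[2:4] == "\\x"

# Count how many of the 256 possible byte values display as hex escapes.
hex_escaped = [v for v in range(256) if is_hex_escaped(v)]
p = len(hex_escaped) / 256

exact_odds = 256 ** 12   # one specific 12-byte output
odds_12 = 1 / p ** 12    # 12 hex-escaped bytes in a row
odds_6 = 1 / p ** 6      # 6 hex-escaped bytes in a row

print(len(hex_escaped))  # 158
print(exact_odds)        # 79228162514264337593543950336
print(odds_12, odds_6)   # close to the 330 and 18 estimated above
```

The count is 158 because 95 printable ASCII bytes display as characters and three more (tab, newline, carriage return) display as character escapes, leaving 256 - 95 - 3.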
By the way, an interesting aside... bytes aren't always 8 bits. Of
course, on just about all machines that have Python on them, they will
be, but there are still machines and devices such as signal processors
where bytes are something other than 8 bits. Historically, common
values included 5, 6, 7, 9, or 16 bits, and the C and C++ standards
still define a constant CHAR_BIT to specify the number of bits in a