[Python-ideas] Direct byte<->int conversions (was Re: bitwise operations on bytes)

Sat Aug 15 14:14:40 CEST 2009

(I'm repeating some of the comments already made in the bug-tracker;
as Antoine pointed out, discussion should probably remain here
until the API is settled.)

On Mon, Aug 10, 2009 at 2:13 PM, Nick Coghlan<ncoghlan at gmail.com> wrote:
> [...]
>
> That suggests to me the following signatures for the conversion
> functions (regardless of the names they might be given):
>
> int.from_bytes(data, *, little_endian=None, signed=True)
>  little_endian would become a three-valued parameter for the Python
> version:  None = native; False = big-endian; True = little-endian. The
> three valued parameter is necessary since Python code can't get at the
> "IS_LITTLE_ENDIAN" macro that the PyLong code uses to determine the
> endianness of the system when calling the C API functions.
>  signed would just be an ordinary boolean flag

Sounds good to me.  I'm not sure about the 'signed=True' default;
to me, a default of unsigned seems more natural.  But this is
bikeshedding, and I'd happily accept either default.

I agree with other posters that there seems little reason not
to accept the empty string.  It's a natural end-case for unsigned
input; whether it's natural for signed input (where there should
really be at least one 'sign bit', and hence at least one byte)
is arguable, but I can't see the harm in accepting it.
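
To make the end-case concrete, here is a minimal pure-Python sketch of
the bytes-to-int direction.  The name from_bytes_sketch and its
byteorder argument are placeholders for illustration only, not a
proposed spelling.  The empty input falls out as 0 for both the
unsigned and the signed case, without any special-casing.

```python
def from_bytes_sketch(data, byteorder='big', signed=False):
    """Hypothetical pure-Python model of the bytes -> int conversion."""
    if byteorder not in ('big', 'little'):
        raise ValueError("byteorder must be 'big' or 'little'")
    # Normalise to big-endian order so the accumulation loop is uniform.
    octets = bytes(data) if byteorder == 'big' else bytes(reversed(data))
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    # Two's complement: a set top bit means the value is negative.
    # With no bytes at all there is no sign bit, so the result stays 0.
    if signed and octets and octets[0] & 0x80:
        value -= 1 << (8 * len(octets))
    return value
```

With this model, from_bytes_sketch(b'') and
from_bytes_sketch(b'', signed=True) both give 0.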

> int.to_bytes(data, *, size=0, little_endian=None, signed=True)
>  A size <= 0 would mean to produce as many bytes as are needed to
> represent the integer and no more. Otherwise it would represent the
> maximum number of bytes allowed in the response (raising OverflowError
> if the value won't fit).
>  little_endian and signed would be interpreted as for the conversion
> from bytes to an integer

I'm not convinced that it's valuable to have a variable-size
version of this;  I'd make size a required argument.

The problem with the variable size version is that the choice of
byte-length for the output for a given integer input is a little bit
arbitrary.  For a particular requirement (producing code to conform
with some existing serialization protocol, for example) it seems
likely that the choice Python makes will disagree with what's
required by that protocol, so that size still has to be given explicitly.
On the other hand, if a user just wants a quick and easy way
to serialize ints, without caring about the exact form of the
serialization, then there are a number of solutions already
available within Python.
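
As an illustration of how arbitrary the "as many bytes as needed"
choice is, consider the rough sketch below.  The helper name min_size
is made up for this example.  The answer depends on whether a sign bit
is reserved, and on whether 0 occupies zero bytes or one.

```python
def min_size(n, signed):
    """Hypothetical helper: fewest bytes that can hold n."""
    if not signed:
        if n < 0:
            raise OverflowError("can't represent a negative int as unsigned")
        # Round the bit count up to whole bytes; 0 needs no bytes at all.
        return (n.bit_length() + 7) // 8
    # Two's complement needs one extra bit for the sign, so even 0
    # occupies one byte here.
    magnitude_bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return (magnitude_bits + 1 + 7) // 8
```

Under these rules 128 needs 1 byte unsigned but 2 bytes signed, and 0
needs 0 bytes unsigned but 1 byte signed.  A protocol with fixed
4-byte fields wants neither answer.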

+1 on raising OverflowError for out-of-range inputs, instead of
wrapping modulo 2**whatever.  This also fits with the way that
the struct module currently behaves.
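
A minimal sketch of the fixed-size, raise-on-overflow behaviour,
assuming size is a required argument.  The name to_bytes_sketch and
the byteorder spelling are placeholders, not part of any agreed API.

```python
def to_bytes_sketch(n, size, byteorder='big', signed=False):
    """Hypothetical pure-Python model of the int -> bytes conversion."""
    if byteorder not in ('big', 'little'):
        raise ValueError("byteorder must be 'big' or 'little'")
    if size < 0:
        raise ValueError("size must be non-negative")
    limit = 1 << (8 * size)
    # Representable range: [0, limit) unsigned, [-limit/2, limit/2) signed.
    if signed:
        lo, hi = -(limit >> 1), limit - (limit >> 1)
    else:
        lo, hi = 0, limit
    if not lo <= n < hi:
        # Refuse to wrap modulo 2**(8*size).
        raise OverflowError("int does not fit in %d byte(s)" % size)
    n &= limit - 1  # two's complement image of a negative value
    octets = [(n >> (8 * i)) & 0xFF for i in range(size)]  # little-endian
    if byteorder == 'big':
        octets.reverse()
    return bytes(octets)
```

Here to_bytes_sketch(255, 1) succeeds, while to_bytes_sketch(256, 1)
raises OverflowError rather than silently returning a zero byte.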

Does anyone see other use-cases for variable-size conversion?

[Greg Ewing]
> I don't like the idea of a three-valued boolean. I also
> don't like boolean parameters whose sense is arbitrary
> (why is it called "little_endian" and not "big_endian",
> and how do I remember which convention was chosen?)

> My suggestion would be to use the same characters that
> the struct module uses to represent endianness (">"
> for big-endian, "<" for little-endian, etc.)

How about a parameter byteorder=None, accepting values
'big' and 'little'?  Then one could use byteorder=sys.byteorder
to explicitly specify native byteorder.
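
A small sketch of how such a parameter might be resolved, assuming
that None means "use the native order".  The helper name
resolve_byteorder is hypothetical; sys.byteorder is the existing
attribute, and is always either 'big' or 'little'.

```python
import sys

def resolve_byteorder(byteorder=None):
    """Hypothetical helper: map the byteorder argument to 'big'/'little'."""
    if byteorder is None:
        # Assumption: None selects the native order of the interpreter.
        return sys.byteorder
    if byteorder not in ('big', 'little'):
        raise ValueError("byteorder must be 'big', 'little' or None")
    return byteorder
```

The two explicit values are self-describing at the call site, so there
is no convention to remember, and passing sys.byteorder spells out the
native case for readers who prefer not to rely on the default.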

Mark