[Python-ideas] Direct byte<->int conversions (was Re: bitwise operations on bytes)

Mon Aug 10 15:13:48 CEST 2009

Eric Eisner wrote:
> For completeness, any function converting from int to bytes needs to
> accept the arguments size, endianness, and two's complement handling.
> By default, size and two's complement could be inferred from the int's
> value.

Keep in mind that Mark's suggestion isn't to completely reinvent the
wheel - it is more about exposing an existing internal C API to Python
code. Since that C API is provided by the int object, we should also
consider the option of providing it in Python purely as methods of that
object.

Bytes to int:

  bytes pointer: pointer to first byte to be converted
  size: number of bytes to be converted
  little_endian: flag indicating LSB is at offset 0
  signed: flag indicating whether or not the value is signed

For a Python version of this conversion, the size argument is
unnecessary, reducing the required parameters to a
bytes/bytearray/memoryview reference, an endianness marker and a
'signed' flag (to indicate whether the buffer contains an unsigned value
or a signed two's complement value).

One slight quirk of the C API that probably shouldn't be replicated is a
size of 0 translating to an integer result of zero. For the Python API,
passing in an empty byte sequence should trigger a ValueError.
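
A minimal pure-Python sketch of the bytes to int semantics described
above. The helper name bytes_to_int is hypothetical, and little_endian
is taken here as a plain boolean where True means the least significant
byte is at offset 0. It is an illustration under those assumptions, not
the C implementation.

```python
def bytes_to_int(data, little_endian, signed):
    """Hypothetical helper: interpret a byte buffer as a single integer."""
    data = bytes(data)  # accepts bytes, bytearray or memoryview
    if not data:
        # Unlike the C API, an empty buffer is treated as an error here.
        raise ValueError("cannot convert an empty byte sequence")
    if little_endian:
        data = data[::-1]  # put the most significant byte first
    value = 0
    for byte in data:
        value = (value << 8) | byte
    if signed and data[0] & 0x80:
        # Top bit set: reinterpret as a two's complement negative value.
        value -= 1 << (8 * len(data))
    return value
```

With this sketch, the two bytes FF FE read big-endian give -2 when
signed is true and 65534 when it is false.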

In the C API, int to bytes takes the same parameters as the bytes to int
conversion, but the meaning is slightly different.

Int to bytes:

  bytes pointer: pointer to first byte of the output buffer
  size: number of bytes in result
  little_endian: flag indicating to write LSB to offset 0
  signed: flag indicating negative value can be converted

In this case, the size matters even for a Python version: an
OverflowError will be raised if the integer won't fit in the specified
size; otherwise, sufficient sign bits are added to pad the result out to
the requested size.
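
The overflow check and sign padding can be sketched in pure Python as
follows. The helper name int_to_bytes_fixed is hypothetical, the size is
assumed to be an explicit positive number, and little_endian is a plain
boolean where True means the least significant byte goes to offset 0.

```python
def int_to_bytes_fixed(value, size, little_endian, signed):
    """Hypothetical helper: encode value into exactly `size` bytes."""
    if size <= 0:
        raise ValueError("this sketch needs an explicit positive size")
    bits = 8 * size
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    if not low <= value <= high:
        raise OverflowError("int does not fit in the requested size")
    # Masking maps a negative value onto its two's complement bit pattern,
    # so the high-order bytes come out as 0xFF sign padding automatically.
    value &= (1 << bits) - 1
    out = bytearray((value >> (8 * i)) & 0xFF for i in range(size))  # LSB first
    if not little_endian:
        out.reverse()
    return bytes(out)
```

Here -2 written as four signed big-endian bytes comes out as
FF FF FF FE, while 256 in a single unsigned byte raises OverflowError.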

That suggests to me the following signatures for the conversion
functions (regardless of the names they might be given):

int.from_bytes(data, *, little_endian=None, signed=True)
  little_endian would become a three-valued parameter for the Python
version:  None = native; False = big-endian; True = little-endian. The
three-valued parameter is necessary since Python code can't get at the
"IS_LITTLE_ENDIAN" macro that the PyLong code uses to determine the
endianness of the system when calling the C API functions.
  signed would just be an ordinary boolean flag
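
One way the three-valued flag might be reduced to a plain boolean,
sketched in Python. The helper name resolve_little_endian is
hypothetical, and the sketch assumes that sys.byteorder reports the same
byte order the C macro would.

```python
import sys


def resolve_little_endian(little_endian=None):
    """Hypothetical helper: map None/False/True to a plain boolean."""
    if little_endian is None:
        # None means "use the byte order of the machine running the code".
        return sys.byteorder == "little"
    return bool(little_endian)
```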

int.to_bytes(data, *, size=0, little_endian=None, signed=True)
  A size <= 0 would mean to produce as many bytes as are needed to
represent the integer and no more. Otherwise it would give the
maximum number of bytes allowed in the result (raising OverflowError
if the value won't fit).
  little_endian and signed would be interpreted as for the conversion
from bytes to an integer
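
For the size <= 0 case, the number of bytes "needed and no more" could
be computed along these lines. The helper name minimal_size is
hypothetical, and the sketch assumes that zero still takes one byte and
that a signed encoding always reserves one bit for the sign.

```python
def minimal_size(value, signed):
    """Hypothetical helper: fewest bytes that can represent value."""
    if signed:
        # For a negative value, ~value has the same number of magnitude
        # bits as its two's complement pattern; add one bit for the sign.
        magnitude = value if value >= 0 else ~value
        bits = magnitude.bit_length() + 1
    else:
        if value < 0:
            raise OverflowError("cannot encode a negative int as unsigned")
        bits = value.bit_length()
    return max(1, (bits + 7) // 8)
```

Under these assumptions 127 and -128 each fit in one signed byte, while
128 and -129 need two.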

Sure, these could be moved into a module of their own, but I'm not sure
what would be gained by doing so.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------