
(I'm repeating some of the comments already made in the bug-tracker; as Antoine pointed out, discussion should probably remain here until the API is settled.)

On Mon, Aug 10, 2009 at 2:13 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
[...]
> That suggests to me the following signatures for the conversion functions (regardless of the names they might be given):
>
>   int.from_bytes(data, *, little_endian=None, signed=True)
>
> little_endian would become a three-valued parameter for the Python version: None = native; True = little-endian; False = big-endian. The three-valued parameter is necessary since Python code can't get at the "IS_LITTLE_ENDIAN" macro that the PyLong code uses to determine the endianness of the system when calling the C API functions.
>
> signed would just be an ordinary boolean flag.
Sounds good to me. I'm not sure about the 'signed=True' default; to me, a default of unsigned seems more natural. But this is bikeshedding, and I'd happily accept either default.

I agree with other posters that there seems little reason not to accept the empty string. It's a natural end-case for unsigned input; whether it's natural for signed input (where there should really be at least one 'sign bit', and hence at least one byte) is arguable, but I can't see the harm in accepting it.
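
To make the semantics under discussion concrete, here is a rough pure-Python sketch of the proposed conversion. The function name from_bytes and its body are illustrative assumptions, not an existing or agreed API; it follows the signature quoted above and treats the empty string as zero for both signed and unsigned input.

```python
import sys


def from_bytes(data, *, little_endian=None, signed=True):
    """Hypothetical model of the proposed bytes-to-int conversion."""
    if little_endian is None:
        # Native order: sys.byteorder is either 'little' or 'big'.
        little_endian = sys.byteorder == "little"
    # Walk the bytes from most significant to least significant.
    ordered = reversed(data) if little_endian else data
    value = 0
    for byte in ordered:
        value = (value << 8) | byte
    # Two's complement: a set top bit in the most significant byte
    # means the value is negative. Empty input has no sign bit.
    if signed and data and data[-1 if little_endian else 0] & 0x80:
        value -= 1 << (8 * len(data))
    return value


print(from_bytes(b"", signed=False))                 # 0
print(from_bytes(b"", signed=True))                  # 0
print(from_bytes(b"\xff\x00", little_endian=False))  # -256
print(from_bytes(b"\xff\x00", little_endian=True))   # 255
```

Under this reading the empty string needs no special casing: the loop simply never runs, and the sign check is skipped because there is no byte to inspect.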
>   int.to_bytes(data, *, size=0, little_endian=None, signed=True)
>
> A size <= 0 would mean to produce as many bytes as are needed to represent the integer and no more. Otherwise it would represent the maximum number of bytes allowed in the response (raising OverflowError if the value won't fit).
>
> little_endian and signed would be interpreted as for the conversion from bytes to an integer.
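
For reference, the proposed behaviour can be modelled in pure Python roughly as follows. The names to_bytes and _fits are hypothetical helpers written for this illustration. Two details are assumptions that the proposal does not spell out: a positive size produces exactly that many bytes, and the variable-size form emits one byte for zero.

```python
import sys


def _fits(value, size, signed):
    """True if value is representable in size bytes."""
    bits = 8 * size
    if signed:
        return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return 0 <= value < (1 << bits)


def to_bytes(value, *, size=0, little_endian=None, signed=True):
    """Hypothetical model of the proposed int-to-bytes conversion."""
    if value < 0 and not signed:
        raise OverflowError("can't convert negative int to unsigned")
    if little_endian is None:
        little_endian = sys.byteorder == "little"
    if size <= 0:
        # Variable size: smallest byte count that holds the value.
        # Assumption: zero still takes one byte.
        size = 1
        while not _fits(value, size, signed):
            size += 1
    elif not _fits(value, size, signed):
        raise OverflowError("int too big to convert")
    # Reduce modulo 2**(8*size) to get the two's complement bit pattern.
    pattern = value % (1 << (8 * size))
    out = bytes((pattern >> (8 * i)) & 0xFF for i in range(size))
    return out if little_endian else out[::-1]


print(to_bytes(255, little_endian=False))                # b'\x00\xff'
print(to_bytes(255, little_endian=False, signed=False))  # b'\xff'
print(to_bytes(-1, size=2, little_endian=False))         # b'\xff\xff'
```

Note that the variable-size output for 255 depends on the signed flag: one byte when unsigned, two when signed, because the signed form needs room for a sign bit.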
I'm not convinced that it's valuable to have a variable-size version of this; I'd make size a required argument.

The problem with the variable-size version is that the choice of byte-length for the output for a given integer input is a little bit arbitrary. For a particular requirement (producing code to conform with some existing serialization protocol, for example) it seems likely that the choice Python makes will disagree with what's required by that protocol, so that size still has to be given explicitly. On the other hand, if a user just wants a quick and easy way to serialize ints, without caring about the exact form of the serialization, then there are a number of solutions already available within Python.

+1 on raising OverflowError for out-of-range inputs, instead of wrapping modulo 2**whatever. This also fits with the way that the struct module currently behaves.

Does anyone see other use-cases for variable-size conversion?

[Greg Ewing]
> I don't like the idea of a three-valued boolean. I also don't like boolean parameters whose sense is arbitrary (why is it called "little_endian" and not "big_endian", and how do I remember which convention was chosen?)
>
> My suggestion would be to use the same characters that the struct module uses to represent endianness (">" for big-endian, "<" for little-endian, etc.)
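
For comparison, this is how the struct module's byte-order characters behave today. The example uses only existing struct and sys calls; the value 0x01020304 is an arbitrary choice made so that the byte order is visible in the output.

```python
import struct
import sys

value = 0x01020304

big = struct.pack(">I", value)     # big-endian
little = struct.pack("<I", value)  # little-endian
native = struct.pack("=I", value)  # native order, standard size

print(big)     # b'\x01\x02\x03\x04'
print(little)  # b'\x04\x03\x02\x01'
# The native result matches whichever order sys.byteorder reports.
print(native == (little if sys.byteorder == "little" else big))  # True
```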
How about a parameter byteorder=None, accepting values 'big' and 'little'? Then one could use byteorder=sys.byteorder to explicitly specify native byteorder.

Mark