On 7 Aug 2021, at 19:22, Serhiy Storchaka <storchaka@gmail.com> wrote:
Python integers have arbitrary precision. For serialization and interoperation with other programs and libraries we need to represent them as fixed-width integers (little- and big-endian, signed and unsigned). In Python, we can use struct, array, memoryview and ctypes for some standard sizes, and the int methods int.to_bytes and int.from_bytes for non-standard sizes. In C, there is the C API for converting to/from the C types long, unsigned long, long long and unsigned long long. For other C types (signed and unsigned char, short, int) we need to use the C API for converting to long and then truncate to the destination type, checking for overflow. For integer type aliases like pid_t we need to determine their size and signedness and use the corresponding C API or wrapper. For non-standard integers (e.g. 24-bit), integers wider than long long, and arbitrary-precision integers, everything is much more complicated. There are private C API functions _PyLong_AsByteArray and _PyLong_FromByteArray, but they are for internal use only.
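For comparison, this is roughly what the existing pure-Python tools look like for one standard size and one non-standard size. It is a small sketch using only the standard library (`struct.pack` for a 16-bit short, `int.to_bytes`/`int.from_bytes` for a 24-bit value); the variable names are illustrative only.

```python
import struct

# Standard size: struct covers fixed C-like formats, here a signed 16-bit
# short in little-endian order ("<h").
packed = struct.pack("<h", -2)

# Non-standard size (24 bits): int.to_bytes/int.from_bytes take an explicit
# length, byte order and signedness.
value = -1000
raw = value.to_bytes(3, "big", signed=True)
restored = int.from_bytes(raw, "big", signed=True)

# A value outside the 24-bit signed range is rejected, not truncated.
try:
    (1 << 23).to_bytes(3, "big", signed=True)
except OverflowError:
    fits = False
else:
    fits = True

print(packed, raw, restored, fits)
```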
I am planning to add public analogs of these private functions that are more powerful and convenient.
PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size, int byteorder, int signed)
Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n, int byteorder, int signed, int *overflow)
PyLong_FromBytes() returns the int object. It only fails in case of memory error or incorrect arguments (e.g. buf is NULL).
PyLong_AsBytes() writes bytes to the specified buffer; it does not allocate memory. If buf is NULL, it returns the minimum buffer size needed to represent the integer. -1 is returned on error. If overflow is NULL, OverflowError is raised on overflow; otherwise *overflow is set to +1 for overflowing the upper limit, -1 for overflowing the lower limit, and 0 for no overflow.
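To make the proposed PyLong_AsBytes() contract concrete, here is a hypothetical pure-Python model of it. Several details are assumptions not stated in the proposal: `as_bytes` and `_min_size` are invented names, a one-element list stands in for the C `int *overflow` out-parameter, byteorder is the string that int.to_bytes() accepts, the minimum size is at least one byte, a successful call returns len(buf), and a reported overflow returns -1 and leaves buf untouched (mirroring how PyLong_AsLongAndOverflow() behaves).

```python
def _min_size(value, signed):
    """Smallest buffer size (at least 1 byte) that can hold value."""
    if signed:
        # One extra bit for the sign; ~value maps -1 -> 0 and -128 -> 127.
        bits = (value if value >= 0 else ~value).bit_length() + 1
    else:
        bits = value.bit_length()
    return max(1, (bits + 7) // 8)


def as_bytes(value, buf, byteorder, signed, overflow=None):
    """Hypothetical model of the proposed PyLong_AsBytes().

    buf      -- a bytearray filled in place, or None to query the size.
    overflow -- None to raise OverflowError, or a one-element list that
                stands in for the C `int *overflow` out-parameter.
    """
    negative_unsigned = value < 0 and not signed
    if buf is None and not negative_unsigned:
        return _min_size(value, signed)
    if negative_unsigned or _min_size(value, signed) > len(buf):
        # Non-negative values overflow the upper limit, negative the lower.
        direction = -1 if value < 0 else 1
        if overflow is None:
            raise OverflowError("int does not fit in the buffer")
        overflow[0] = direction  # assumption: buf is left untouched
        return -1
    if overflow is not None:
        overflow[0] = 0
    # Python builds a temporary bytes object here; a C implementation
    # would write straight into the caller's buffer.
    buf[:] = value.to_bytes(len(buf), byteorder, signed=signed)
    return len(buf)  # assumption: success returns the number of bytes written
```

A caller would typically query the size first with `as_bytes(x, None, "little", True)`, allocate that many bytes, and call again with the buffer.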
Now I have some design questions.
1. How to encode the byte order?
a) 1 -- little endian, 0 -- big endian
b) 0 -- little endian, 1 -- big endian
c) -1 -- little endian, +1 -- big endian, 0 -- native endian.
Use an enum and do not use 0 as a valid value to make mistakes easier to detect. I think you are right to have big endian, little endian and native endian. I do not think the numeric values of the enum matter (apart from avoiding 0).
Do we need to reserve some values for mixed endianness?
What is mixed endian? I would guess that its use would be application-specific, so I assume you would not need to support it.
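Barry's suggestion of an enum that never uses 0 could look something like the following, sketched in Python rather than C. `ByteOrder` and `resolve` are hypothetical names, and the member values 1, 2 and 3 are arbitrary, as he notes; the point is that an accidental 0 is rejected instead of silently meaning one of the byte orders.

```python
import enum
import sys


class ByteOrder(enum.IntEnum):
    """Hypothetical byte-order enum; 0 is deliberately not a member."""
    LITTLE = 1
    BIG = 2
    NATIVE = 3


def resolve(order):
    """Map a byte-order value to the string int.to_bytes() expects."""
    order = ByteOrder(order)  # raises ValueError for 0 or unknown values
    if order is ByteOrder.NATIVE:
        return sys.byteorder  # "little" or "big" for the running machine
    return "little" if order is ByteOrder.LITTLE else "big"
```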
2. How to specify the reduction modulo 2**(8*size) (like in PyLong_AsUnsignedLongMask)?
Add yet another flag to PyLong_AsBytes()? Use a special value for the signed argument: 0 -- unsigned, 1 -- signed, 2 (or -1) -- modulo? Or use some combination of signed and overflow?
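Whatever the parameter encoding, the "modulo" behaviour itself is simple: keep the low 8*size bits, as PyLong_AsUnsignedLongMask() does for unsigned long, and optionally reinterpret them as a two's-complement signed value. A minimal Python sketch, where `reduce_modulo` is a hypothetical helper name:

```python
def reduce_modulo(value, size, signed):
    """Reduce value modulo 2**(8*size); optionally reinterpret as signed."""
    if size < 1:
        raise ValueError("size must be positive")
    bits = 8 * size
    # Python's & treats negative ints as infinite two's complement,
    # so this equals value % 2**bits.
    result = value & ((1 << bits) - 1)
    if signed and result >= 1 << (bits - 1):
        result -= 1 << bits  # two's-complement reinterpretation
    return result
```

The result always fits, so `reduce_modulo(x, size, signed).to_bytes(size, order, signed=signed)` never raises OverflowError.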
3. How to specify saturation (like in PyNumber_AsSsize_t())? I.e. values less than the lower limit are replaced with the lower limit, values greater than the upper limit are replaced with the upper limit.
Same options as for (2): a separate flag, encoding it in signed (but we would need two values here), or a combination of other parameters.
Maybe a single enum that has:
signed (modulo)
signed saturate
unsigned (modulo)
unsigned saturate
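The single-enum idea might be modelled as follows. This is only a sketch: `Conversion` and `convert` are hypothetical names, the output is fixed to little-endian for brevity, and it covers just the four listed modes, leaving open how they would combine with overflow checking. Saturation clamps to the limits the way PyNumber_AsSsize_t() does when its exception argument is NULL; the modulo modes keep the low 8*size bits.

```python
import enum


class Conversion(enum.Enum):
    """Hypothetical conversion modes from the list above."""
    SIGNED = enum.auto()             # signed, reduce modulo 2**(8*size)
    SIGNED_SATURATE = enum.auto()    # signed, clamp to the limits
    UNSIGNED = enum.auto()           # unsigned, reduce modulo 2**(8*size)
    UNSIGNED_SATURATE = enum.auto()  # unsigned, clamp to the limits


def convert(value, size, mode):
    """Encode value in size little-endian bytes according to mode."""
    if size < 1:
        raise ValueError("size must be positive")
    bits = 8 * size
    signed = mode in (Conversion.SIGNED, Conversion.SIGNED_SATURATE)
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if mode in (Conversion.SIGNED_SATURATE, Conversion.UNSIGNED_SATURATE):
        value = min(max(value, lo), hi)  # saturation
    else:
        value &= (1 << bits) - 1         # modulo reduction
        if signed and value > hi:
            value -= 1 << bits           # back into the signed range
    return value.to_bytes(size, "little", signed=signed)
```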
4. What exact names to use?
PyLong_FromByteArray/PyLong_AsByteArray, PyLong_FromBytes/PyLong_AsBytes, PyLong_FromBytes/PyLong_ToBytes?
Barry
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/V2EKXM...