On Sat, Aug 7, 2021 at 2:23 PM Serhiy Storchaka <storchaka@gmail.com> wrote:

Python integers have arbitrary precision. For serialization and
interoperability with other programs and libraries we need to represent
them as fixed-width integers (little- and big-endian, signed and
unsigned). In Python, we can use struct, array, memoryview and ctypes
for some standard sizes, and the int methods int.to_bytes and
int.from_bytes for non-standard sizes. In C, there is the C API for
converting to/from the C types long, unsigned long, long long and unsigned
long long. For other C types (signed and unsigned char, short, int) we
need to use the C API for converting to long, and then truncate to the
destination type while checking for overflow. For integer type aliases
like pid_t we need to determine their size and signedness and use the
corresponding C API or a wrapper. For non-standard integers (e.g. 24-bit),
integers wider than long long, and arbitrary precision integers,
everything is much more complicated. There are private C API functions
_PyLong_AsByteArray and _PyLong_FromByteArray, but they are for internal
use only.
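
For reference, the Python-level tools mentioned above behave roughly as in
this small sketch. The value and the chosen sizes are illustrative only;
struct handles a standard 32-bit width, while int.to_bytes and
int.from_bytes handle a 24-bit width that struct has no format code for.

```python
import struct

value = -2

# Standard size: struct covers fixed-width C types, here a little-endian
# signed 32-bit integer ("<i").
packed = struct.pack("<i", value)
assert packed == b"\xfe\xff\xff\xff"
assert struct.unpack("<i", packed)[0] == value

# Non-standard size: int.to_bytes / int.from_bytes accept any length,
# here a 24-bit big-endian signed integer.
raw24 = value.to_bytes(3, byteorder="big", signed=True)
assert raw24 == b"\xff\xff\xfe"
assert int.from_bytes(raw24, byteorder="big", signed=True) == value

# A value that does not fit in the requested width raises OverflowError.
try:
    (1 << 23).to_bytes(3, byteorder="big", signed=True)
except OverflowError:
    fits = False
else:
    fits = True
```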

I am planning to add public analogs of these private functions, but more
powerful and convenient.

PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size,
int byteorder, int signed)

Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n,
int byteorder, int signed, int *overflow)

PyLong_FromBytes() returns the int object. It only fails in case of
memory error or incorrect arguments (e.g. buf is NULL).

PyLong_AsBytes() writes bytes to the specified buffer; it does not
allocate memory. If buf is NULL, it returns the minimum size of the
buffer needed to represent the integer. -1 is returned on error. If
overflow is NULL, then OverflowError is raised on overflow; otherwise
*overflow is set to +1 for overflowing the upper limit, -1 for
overflowing the lower limit, and 0 for no overflow.
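
To make the intended semantics concrete, here is a pure-Python model of the
overflow reporting and of the buf == NULL size query. It is only a sketch:
the names as_bytes_model and min_size_model are hypothetical, the proposed
function is a C API and not a Python one, and returning None for the data on
overflow and raising for a negative unsigned size query are my own
assumptions, since the proposal does not specify those cases.

```python
def as_bytes_model(value, n, byteorder="little", signed=True):
    """Hypothetical model of the proposed PyLong_AsBytes().

    Returns (data, overflow). overflow is +1 above the upper limit,
    -1 below the lower limit and 0 otherwise. data is None on overflow
    (an assumption; the proposal does not say what the buffer holds).
    """
    if signed:
        lower, upper = -(1 << (8 * n - 1)), (1 << (8 * n - 1)) - 1
    else:
        lower, upper = 0, (1 << (8 * n)) - 1
    if value > upper:
        return None, 1
    if value < lower:
        return None, -1
    return value.to_bytes(n, byteorder=byteorder, signed=signed), 0


def min_size_model(value, signed=True):
    """Model of the buf == NULL case: the smallest buffer size in bytes."""
    if signed:
        # Two's complement needs one extra bit for the sign.
        magnitude = value if value >= 0 else ~value
        bits = magnitude.bit_length() + 1
    else:
        if value < 0:
            # Assumption: no unsigned size can represent a negative value.
            raise OverflowError("negative value has no unsigned size")
        bits = value.bit_length()
    return max(1, (bits + 7) // 8)
```

For example, as_bytes_model(256, 1, signed=False) reports overflow +1, and
min_size_model(128) is 2 because a signed byte stops at 127.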

Now I have some design questions.

1. How to encode the byte order?

a) 1 -- little endian, 0 -- big endian
b) 0 -- little endian, 1 -- big endian
c) -1 -- little endian, +1 -- big endian, 0 -- native endian.

Do we need to reserve some values for mixed-endian byte orders?
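
As an illustration of option (c), the three codes could be resolved as
follows. This is a Python sketch of the mapping only; resolve_byteorder is
a hypothetical helper, not part of any existing API, and it rejects every
other value so that codes stay free for later use.

```python
import sys


def resolve_byteorder(code):
    """Map option (c) codes to the names int.to_bytes() expects."""
    if code == -1:
        return "little"
    if code == 1:
        return "big"
    if code == 0:
        return sys.byteorder  # native order of the running interpreter
    # Any other value is rejected, leaving room to reserve codes later.
    raise ValueError(f"unsupported byte order code: {code!r}")


little = (0x0102).to_bytes(2, resolve_byteorder(-1))
big = (0x0102).to_bytes(2, resolve_byteorder(1))
native = (0x0102).to_bytes(2, resolve_byteorder(0))
```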

2. How to specify the reduction modulo 2**(8*size) (like in
PyLong_AsUnsignedLongMask)?

Add one more flag to PyLong_AsBytes()? Use a special value for the signed
argument? 0 -- unsigned, 1 -- signed, 2 (or -1) -- modulo. Or use some
combination of signed and overflow?
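
Whichever way the mode is spelled, the behavior asked about in (2) amounts
to masking before conversion. A minimal Python sketch, where as_bytes_mask
is a hypothetical name used only for illustration:

```python
def as_bytes_mask(value, n, byteorder="little"):
    """Reduce value modulo 2**(8*n), then emit exactly n bytes."""
    # For negative values the mask yields the two's complement bit pattern.
    reduced = value & ((1 << (8 * n)) - 1)
    return reduced.to_bytes(n, byteorder=byteorder, signed=False)
```

With this, as_bytes_mask(-1, 2) gives b"\xff\xff" and
as_bytes_mask(0x12345, 2, "big") keeps only the low 16 bits, b"\x23\x45".
No overflow can be reported in this mode, which is why it interacts with
the overflow argument.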

3. How to specify saturation (like in PyNumber_AsSsize_t())? I.e. values
less than the lower limit are replaced with the lower limit, values
greater than the upper limit are replaced with the upper limit.

Same options as for (2): a separate flag, an encoding in signed (but we
need two values here), or a combination of other parameters.
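
For comparison with the modulo mode, saturation clamps the value to the
representable range before conversion. Again a Python sketch under the
assumption that both signed and unsigned saturation are wanted;
as_bytes_saturate is a hypothetical name.

```python
def as_bytes_saturate(value, n, byteorder="little", signed=True):
    """Clamp value to the n-byte range, then emit exactly n bytes."""
    if signed:
        lower, upper = -(1 << (8 * n - 1)), (1 << (8 * n - 1)) - 1
    else:
        lower, upper = 0, (1 << (8 * n)) - 1
    clamped = max(lower, min(upper, value))
    return clamped.to_bytes(n, byteorder=byteorder, signed=signed)
```

Here as_bytes_saturate(1000, 1) gives b"\x7f" (127) and
as_bytes_saturate(-1000, 1) gives b"\x80" (-128), so the same input yields
different bytes than in the modulo mode.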

4. What exact names to use?

PyLong_FromByteArray/PyLong_AsByteArray,
PyLong_FromBytes/PyLong_AsBytes, PyLong_FromBytes/PyLong_ToBytes?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/V2EKXMKSQV25BMRPMDH47IM2OYCLY2TF/