[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

Aug. 8, 2021


      
...
On 7 Aug 2021, at 19:22, Serhiy Storchaka <storchaka@gmail.com> wrote:
Python integers have arbitrary precision. For serialization and
interoperation with other programs and libraries we need to represent
them as fixed-width integers (little- and big-endian, signed and
unsigned). In Python, we can use struct, array, memoryview and ctypes
for some standard sizes, and the int methods int.to_bytes and
int.from_bytes for non-standard sizes. In C, there is a C API for
converting to/from the C types long, unsigned long, long long and unsigned
long long. For other C types (signed and unsigned char, short, int) we
need to use the C API for converting to long, and then truncate to the
destination type, checking for overflow. For integer type aliases
like pid_t we need to determine their size and signedness and use the
corresponding C API or wrapper. For non-standard integers (e.g. 24-bit),
integers wider than long long, and arbitrary precision integers, everything
is much more complicated. There are private C API functions
_PyLong_AsByteArray and _PyLong_FromByteArray, but they are for internal
use only.
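For illustration only, the "convert to long, then truncate with an overflow check" step for a narrower type such as short could look like the sketch below. The helper name narrow_long_to_short is hypothetical, and the Python C API is left out so the sketch compiles on its own; in a real extension the long would come from PyLong_AsLong() and the failure branch would raise OverflowError.

```c
#include <limits.h>

/* Hypothetical helper: narrow a long (e.g. the result of PyLong_AsLong())
   to short. Returns 0 and stores the value in *out on success; returns -1
   if the value does not fit, leaving *out untouched. */
static int narrow_long_to_short(long value, short *out)
{
    if (value < SHRT_MIN || value > SHRT_MAX) {
        return -1;  /* a real extension would raise OverflowError here */
    }
    *out = (short)value;
    return 0;
}
```

Every non-standard width needs its own copy of this check, which is part of what a generic bytes-based API would remove.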
I am planning to add public analogs of these private functions that are
more powerful and more convenient.
PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size,
                          int byteorder, int is_signed);
Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n,
                         int byteorder, int is_signed, int *overflow);
PyLong_FromBytes() returns the int object. It fails only in case of a
memory error or invalid arguments (e.g. buf is NULL).
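To make the decoding side concrete, here is a minimal sketch of what PyLong_FromBytes() would compute for a buffer of 1 to 8 bytes, including a 24-bit value. It is not the proposed implementation: the function names are hypothetical, the result is a C integer rather than a Python int object, and byte order is assumed to be passed as 1 for little endian and 0 for big endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble 1..8 bytes into an unsigned value.
   little_endian: 1 = little endian, 0 = big endian (assumed encoding). */
static uint64_t sketch_from_bytes_unsigned(const unsigned char *buf, size_t size,
                                           int little_endian)
{
    uint64_t u = 0;
    for (size_t i = 0; i < size; i++) {
        /* Visit bytes from most significant to least significant. */
        unsigned char byte = little_endian ? buf[size - 1 - i] : buf[i];
        u = (u << 8) | byte;
    }
    return u;
}

/* Same bytes read as a signed two's-complement value of 8*size bits. */
static int64_t sketch_from_bytes_signed(const unsigned char *buf, size_t size,
                                        int little_endian)
{
    uint64_t u = sketch_from_bytes_unsigned(buf, size, little_endian);
    if (size < 8 && ((u >> (8 * size - 1)) & 1))
        u |= ~(uint64_t)0 << (8 * size);   /* sign-extend from the top bit */
    int64_t s;
    memcpy(&s, &u, sizeof s);              /* reinterpret the 64-bit pattern */
    return s;
}
```

For example, the big-endian bytes FF FF FE decode to 16777214 when read as unsigned and to -2 when read as a signed 24-bit integer.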
PyLong_AsBytes() writes bytes to the specified buffer; it does not
allocate memory. If buf is NULL, it returns the minimum size of the
buffer needed to represent the integer. -1 is returned on error. If
overflow is NULL, then OverflowError is raised; otherwise *overflow is
set to +1 for overflowing the upper limit, -1 for overflowing the lower
limit, and 0 for no overflow.
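The AsBytes semantics described above (a size query when buf is NULL, the +1/-1/0 overflow report, and an error when overflow is NULL) can be sketched for int64_t input as follows. This is a hypothetical stand-in, not the proposed function: sketch_as_bytes is an invented name, the byte-order encoding (1 = little, 0 = big) is assumed, and two details the proposal leaves open are filled in by assumption: on overflow with a non-NULL overflow pointer the low-order bytes are still written, and on success the function returns n.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed PyLong_AsBytes(), restricted to
   int64_t input. little_endian: 1 = little endian, 0 = big endian (assumed). */
static ptrdiff_t sketch_as_bytes(int64_t v, unsigned char *buf, ptrdiff_t n,
                                 int little_endian, int is_signed,
                                 int *overflow)
{
    int ovf = 0;
    ptrdiff_t need = 1;

    if (!is_signed && v < 0) {
        ovf = -1;          /* below the unsigned lower limit (0) */
        need = -1;         /* no buffer size can represent it */
    } else if (is_signed) {
        /* Smallest need with -2**(8*need-1) <= v < 2**(8*need-1). */
        while (need < 8) {
            int64_t half = (int64_t)1 << (8 * need - 1);
            if (v >= -half && v < half)
                break;
            need++;
        }
    } else {
        /* Smallest need with v < 2**(8*need). */
        while (need < 8 && ((uint64_t)v >> (8 * need)) != 0)
            need++;
    }

    if (buf == NULL)
        return need;       /* size query; -1 if unrepresentable */

    if (ovf == 0 && need > n)
        ovf = (v < 0) ? -1 : +1;
    if (ovf != 0 && overflow == NULL)
        return -1;         /* the real function would raise OverflowError */
    if (overflow != NULL)
        *overflow = ovf;

    /* Write the n low-order bytes of the two's-complement value (assumed
       behaviour on overflow), sign-extending if n > 8. */
    uint64_t u = (uint64_t)v;
    for (ptrdiff_t i = 0; i < n; i++) {
        unsigned char byte = (i < 8) ? (unsigned char)(u >> (8 * i))
                                     : (unsigned char)(v < 0 ? 0xFF : 0x00);
        buf[little_endian ? i : n - 1 - i] = byte;
    }
    return n;
}
```

A caller that does not know the width in advance would call it once with buf == NULL, allocate that many bytes, and call it again.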
Now I have some design questions.
1. How to encode the byte order?
a) 1 -- little endian, 0 -- big endian
b) 0 -- little endian, 1 -- big endian
c) -1 -- little endian, +1 -- big endian, 0 -- native endian.
Use an enum and do not use 0 as a valid value to make mistakes easier to detect.
I think you are right to have big endian, little endian and native endian.
I do not think the numeric values of the enum matter (apart from avoiding 0).
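A byte-order enum along the lines suggested here, with 0 left unused, might look like the sketch below. The constant names, their values and sketch_resolve_byteorder are hypothetical; native order is resolved at run time by inspecting the first byte of a known 16-bit value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-order constants; 0 is deliberately unused so that a
   zero-initialised or forgotten argument is rejected rather than silently
   accepted as one of the valid orders. */
typedef enum {
    SKETCH_LITTLE_ENDIAN = 1,
    SKETCH_BIG_ENDIAN = 2,
    SKETCH_NATIVE_ENDIAN = 3
} sketch_byteorder;

/* Map a requested order to a concrete one, or return -1 if it is invalid. */
static int sketch_resolve_byteorder(int order)
{
    switch (order) {
    case SKETCH_LITTLE_ENDIAN:
    case SKETCH_BIG_ENDIAN:
        return order;
    case SKETCH_NATIVE_ENDIAN: {
        /* The first byte of the value 1 is 1 only on little-endian hosts. */
        uint16_t probe = 1;
        unsigned char first;
        memcpy(&first, &probe, 1);
        return first == 1 ? SKETCH_LITTLE_ENDIAN : SKETCH_BIG_ENDIAN;
    }
    default:
        return -1;  /* includes 0; a real API would presumably raise an error */
    }
}
```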
...
Do we need to reserve some values for mixed-endian orders?
What is mixed endian? I would guess that its use would be
application-specific, so I assume you would not need to support it.
...
2. How to specify the reduction modulo 2**(8*size) (like in
PyLong_AsUnsignedLongMask)?
Add yet another flag to PyLong_AsBytes()? Use a special value for the
signed argument (0 -- unsigned, 1 -- signed, 2 (or -1) -- modulo)? Or use
some combination of signed and overflow?
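Reduction modulo 2**(8*size) amounts to keeping the low-order size bytes of the two's-complement representation, which is how PyLong_AsUnsignedLongMask() wraps out-of-range values for unsigned long. A sketch for int64_t input, with hypothetical function names, showing both the unsigned and the signed reading of the wrapped bytes:

```c
#include <stdint.h>

/* Hypothetical helper: v modulo 2**(8*size) as an unsigned value, 1 <= size <= 8. */
static uint64_t sketch_reduce_unsigned(int64_t v, int size)
{
    uint64_t u = (uint64_t)v;              /* v modulo 2**64 */
    if (size >= 8)
        return u;
    return u & (((uint64_t)1 << (8 * size)) - 1);   /* keep the low bytes */
}

/* Hypothetical helper: the same low bytes read as signed two's complement. */
static int64_t sketch_reduce_signed(int64_t v, int size)
{
    if (size >= 8)
        return v;                          /* int64_t already has 8 bytes */
    uint64_t u = sketch_reduce_unsigned(v, size);
    uint64_t sign_bit = (uint64_t)1 << (8 * size - 1);
    /* Values with the top bit set wrap into the negative half of the range. */
    if (u & sign_bit)
        return (int64_t)u - (int64_t)((uint64_t)1 << (8 * size));
    return (int64_t)u;
}
```

For example, 300 reduced to one unsigned byte is 44, and 200 reduced to one signed byte is -56.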
3. How to specify saturation (like in PyNumber_AsSsize_t())? I.e. values
less than the lower limit are replaced with the lower limit, values
greater than the upper limit are replaced with the upper limit.
Same options as for (2): a separate flag, encoding it in signed (but we
would need two values here), or a combination of other parameters.
Maybe a single enum that has:
signed (modulo)
signed saturate
unsigned (modulo)
unsigned saturate
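The single-enum idea could be sketched as below, covering both the modulo reduction from (2) and the saturation from (3). The enum names, their values and sketch_convert are hypothetical, and size is limited to 1..7 bytes so that all arithmetic stays within int64_t.

```c
#include <stdint.h>

/* Hypothetical conversion modes, one per combination listed above. */
typedef enum {
    SKETCH_SIGNED_MODULO = 1,
    SKETCH_SIGNED_SATURATE = 2,
    SKETCH_UNSIGNED_MODULO = 3,
    SKETCH_UNSIGNED_SATURATE = 4
} sketch_mode;

/* Convert v into the range of a size-byte field (1 <= size <= 7). */
static int64_t sketch_convert(int64_t v, int size, sketch_mode mode)
{
    int64_t span = (int64_t)1 << (8 * size);       /* 2**(8*size) */
    int is_signed = (mode == SKETCH_SIGNED_MODULO ||
                     mode == SKETCH_SIGNED_SATURATE);
    int64_t lo = is_signed ? -span / 2 : 0;
    int64_t hi = lo + span - 1;

    if (mode == SKETCH_SIGNED_SATURATE || mode == SKETCH_UNSIGNED_SATURATE) {
        /* Saturate: out-of-range values are replaced by the nearest limit. */
        return v < lo ? lo : (v > hi ? hi : v);
    }
    /* Modulo: pick the value in [lo, hi] congruent to v mod span. */
    int64_t r = v % span;          /* -span < r < span, so no overflow below */
    r = (r - lo) % span;
    if (r < 0)
        r += span;
    return r + lo;
}
```

One possible advantage of a single mode argument is that it cannot express contradictory flag combinations; a mode that raises OverflowError instead would presumably need a value of its own.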
...
4. What exact names to use?
PyLong_FromByteArray/PyLong_AsByteArray,
PyLong_FromBytes/PyLong_AsBytes, PyLong_FromBytes/PyLong_ToBytes?
Barry
...
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/V2EKXM...
Code of Conduct: http://python.org/psf/codeofconduct/

Barry Scott