08.08.21 07:08, Stephen J. Turnbull writes:
Serhiy Storchaka writes:
Python integers have arbitrary precision. For serialization and interoperation with other programs and libraries we need to represent them [...]. [In the case of non-standard precisions,] [t]here are private C API functions _PyLong_AsByteArray and _PyLong_FromByteArray, but they are for internal use only.
I am planning to add public analogs of these private functions, but more powerful and convenient.
PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size, int byteorder, int is_signed)
Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n, int byteorder, int is_signed, int *overflow)
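A minimal usage sketch of the proposed PyLong_AsBytes(); the function does not exist yet, and the conventions assumed here (byteorder 1 meaning little-endian, is_signed 1 meaning two's complement, *overflow being set instead of an exception being raised) are placeholders for whatever is finally agreed:

    #include <Python.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical use of the *proposed* PyLong_AsBytes(); all
     * conventions here are assumptions, not settled API. */
    static int
    get_int32(PyObject *o, int32_t *result)
    {
        unsigned char buf[4];
        int overflow = 0;
        Py_ssize_t n = PyLong_AsBytes(o, buf, sizeof(buf),
                                      /*byteorder=*/1, /*is_signed=*/1,
                                      &overflow);
        if (n < 0) {
            return -1;                  /* error already set */
        }
        if (overflow) {
            PyErr_SetString(PyExc_OverflowError,
                            "Python int does not fit in int32_t");
            return -1;
        }
        memcpy(result, buf, sizeof(*result));  /* little-endian host assumed */
        return 0;
    }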
I don't understand why such a complex API is useful as a public facility.
There are several goals:

1. Support conversion to/from all C integer types (char, short, int, long, long long, intN_t, intptr_t, intmax_t, wchar_t, wint_t and the corresponding unsigned types), POSIX integer types (pid_t, uid_t, off_t, etc.) and other platform- or library-specific integer types (like Tcl_WideInt in libtcl). Currently the only supported types are long, unsigned long, long long, unsigned long long, ssize_t and size_t. For other types you have to choose the most appropriate supertype (long or long long, sometimes providing several variants) and handle overflow manually, as in the sketch after this list. There are requests for PyLong_AsShort(), PyLong_AsInt32(), PyLong_AsMaxInt(), etc. It is better to provide a single universal function than to extend the API by several dozen functions.

2. Support different options for overflow handling. Different options are present in PyLong_AsLong(), PyLong_AsLongAndOverflow(), PyLong_AsUnsignedLongMask() and PyNumber_AsSsize_t(), but not all options are available for all types. There is no *AndOverflow() variant for unsigned types, size_t or ssize_t, and saturation is only available for ssize_t.

3. Support serialization of arbitrary-precision integers. This is used in pickle and random, and can be used to support other binary data formats.

All these goals can be achieved by a few universal functions.
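To make point 1 concrete, here is the kind of boilerplate currently needed for a type without a direct converter. PyLong_AsLong() and the error-handling calls are real CPython API; the helper name as_short is mine:

    #include <Python.h>
    #include <limits.h>

    /* Status quo: go through the nearest supertype and check the
     * range by hand, separately for every such target type. */
    static int
    as_short(PyObject *o, short *result)
    {
        long value = PyLong_AsLong(o);
        if (value == -1 && PyErr_Occurred()) {
            return -1;
        }
        if (value < SHRT_MIN || value > SHRT_MAX) {
            PyErr_SetString(PyExc_OverflowError,
                            "Python int too large to convert to C short");
            return -1;
        }
        *result = (short)value;
        return 0;
    }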
So I might want PyLong_AsGMPInt and PyLong_AsGMPRatio as well as the corresponding functions for MP, and maybe even PyLong_AsGMPFloat. The obvious way to write those is <library constructor>(str(python_integer)), I think.
PyLong_AsGMPInt() cannot be added unless GMP is included in the Python interpreter, which is very unlikely. Converting via the decimal representation is a very inefficient way, especially for very long integers (it is roughly quadratic in the size of the integer). I think GMP supports more efficient conversions.
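For comparison, the decimal detour suggested above looks like this in C. PyObject_Str(), PyUnicode_AsUTF8() and GMP's mpz_set_str() are all real API; rop must already be initialized with mpz_init():

    #include <Python.h>
    #include <gmp.h>

    /* The <library constructor>(str(python_integer)) route: every
     * conversion pays for a base-2 -> base-10 -> base-2 round trip. */
    static int
    pylong_to_mpz_via_str(PyObject *o, mpz_t rop)
    {
        PyObject *s = PyObject_Str(o);
        if (s == NULL) {
            return -1;
        }
        const char *digits = PyUnicode_AsUTF8(s);
        if (digits == NULL) {
            Py_DECREF(s);
            return -1;
        }
        int rc = mpz_set_str(rop, digits, 10);  /* 0 on success */
        Py_DECREF(s);
        return rc == 0 ? 0 : -1;
    }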
In the unlikely event that an application needs to squeeze out that tiny bit of performance, I guess the library constructors all accept buffers of bytes, too, probably with a similarly complex API that can handle whatever the Python ABI throws at them.
To use library constructors that accept buffers of bytes, we first need buffers of bytes. The proposed functions would provide the only such interface for converting Python integers to/from a buffer of bytes (see the sketch below).
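A sketch of that bytes-based path into GMP, with the same assumed conventions as in the earlier sketch, plus one more assumption that is not settled API: that PyLong_AsBytes() returns the number of bytes required, so an undersized call can be used to size the buffer. mpz_import() and mpz_set_ui() are real GMP API, and rop must already be initialized:

    #include <Python.h>
    #include <gmp.h>
    #include <stdlib.h>

    static int
    pylong_to_mpz(PyObject *o, mpz_t rop)
    {
        int overflow = 0;
        /* Assumed convention: a call with n = 0 returns the number
         * of bytes required. */
        Py_ssize_t n = PyLong_AsBytes(o, NULL, 0, 1, 0, &overflow);
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            mpz_set_ui(rop, 0);
            return 0;
        }
        unsigned char *buf = malloc((size_t)n);
        if (buf == NULL) {
            PyErr_NoMemory();
            return -1;
        }
        if (PyLong_AsBytes(o, buf, n, 1, 0, &overflow) < 0) {
            free(buf);
            return -1;
        }
        /* order -1: least-significant byte first; word size 1, so the
         * endian argument does not matter; nails 0: use all 8 bits. */
        mpz_import(rop, (size_t)n, -1, 1, 0, 0, buf);
        free(buf);
        return 0;
    }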
In which case why not just expose the internal functions?
If you mean _PyLong_FromByteArray/_PyLong_AsByteArray, it is because we should polish them before exposing them. They currently do not provide different options for overflow handling, and I think a more convenient spelling may be possible for the common case of native byte order. The names of the functions and the number and order of their parameters can be discussed; it is for such discussion that I opened this thread. If you have alternative propositions, please show them.
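For reference, the private declarations being discussed look like this (as of CPython 3.10, from longobject.h):

    PyObject *_PyLong_FromByteArray(const unsigned char *bytes, size_t n,
                                    int little_endian, int is_signed);
    int _PyLong_AsByteArray(PyLongObject *v, unsigned char *bytes, size_t n,
                            int little_endian, int is_signed);

Neither takes an overflow parameter: _PyLong_AsByteArray simply sets OverflowError and returns -1 when the value does not fit in n bytes.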
Is it at all likely that that representation would ever change?
They do not rely on the internal representation. They are for an implementation-independent representation.