On Sun, 28 Jun 2020 at 11:22, M.-A. Lemburg <mal@egenix.com> wrote:
> As you may remember, I wasn't happy with the deprecations of the APIs in PEP 393, since there are no C API alternatives for the encoding APIs deprecated in the PEP which allow direct encoding provided by these important codecs.
AFAIK, the situation hasn't changed since then.
I would prefer to analyze the list on a case-by-case basis. I don't think that it's useful to expose every single encoding supported by Python as a C function. I would prefer to only have a fast path for the most common encodings: ASCII, Latin1, UTF-8 and the Windows ANSI code page. That's all.

For any other encoding, the general PyUnicode_AsEncodedString() and PyUnicode_Decode() functions are good enough. If someone expects that passing the encoding name as a string adds a significant overhead, please prove it with a benchmark. IMO a small overhead is acceptable for rare encodings.

Note: PyUnicode_AsEncodedString() and PyUnicode_Decode() also have fast paths for the most common encodings: ASCII, UTF-8, "mbcs" (the Python alias of the Windows ANSI code page) and Latin1, but also UTF-16 and UTF-32. I'm not sure if these last two are really worth it, but they were cheap to add :-)
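To make that concrete, here is a minimal sketch of the generic encoding path; the helper name encode_example and the choice of "utf-7" as the rare codec are just illustrative assumptions, not anything from the PEP:

#include <Python.h>

/* Encode str -> bytes through the codec registry. The encoding is
   an ordinary C string, so any codec shipped with Python (or
   registered by a third party) is reachable without a dedicated
   per-codec C function. */
static PyObject *
encode_example(PyObject *text)
{
    /* Returns a new bytes object, or NULL with an exception set. */
    return PyUnicode_AsEncodedString(text, "utf-7", "strict");
}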
> We can't just remove access to one half of a codec (the decoding part) without at least providing an alternative for C extensions to use.
I disagree; we can. The alternative has existed since Python 2: PyUnicode_AsEncodedString() and PyUnicode_Decode().
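The decoding half works the same way; a minimal sketch, with "utf-7" again standing in for any rare codec (decode_example is a hypothetical helper):

#include <Python.h>

/* Decode bytes -> str through the codec registry: one generic entry
   point covers every codec, so no per-codec decode function is
   required. */
static PyObject *
decode_example(const char *buf, Py_ssize_t size)
{
    /* Returns a new str object, or NULL with an exception set. */
    return PyUnicode_Decode(buf, size, "utf-7", "strict");
}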
> Given PEP 393, this would be APIs which use wchar_t instead of Py_UNICODE.
Using wchar_t is inefficient on all platforms where wchar_t is only 16 bits wide, since surrogate pairs need a special code path. For example, PyUnicode_FromWideChar() has to scan the string twice: a first pass counts the surrogate pairs so that a buffer of the exact size can be allocated.
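For illustration, a rough sketch of that first counting pass, assuming 16-bit wchar_t; this is a simplified stand-in, not CPython's actual implementation:

#include <stddef.h>

/* Count the Unicode code points in a UTF-16 wchar_t buffer: each
   valid high/low surrogate pair collapses into one code point,
   which gives the exact allocation size for the second pass. */
static size_t
count_code_points(const wchar_t *s, size_t len)
{
    size_t pairs = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF       /* high surrogate */
            && i + 1 < len
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {  /* low */
            pairs++;
            i++;  /* the pair consumes two wchar_t units */
        }
    }
    return len - pairs;
}

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.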