On Thu, Jul 2, 2020 at 5:20 AM M.-A. Lemburg wrote:
> The reasoning here is the same as for decoding: you have the original data you want to process available in some array and want to turn this into the Python object.
> The path Victor suggested requires always going via a Python Unicode object, but that is very expensive and not really an appropriate way to address the use case.
But the current PyUnicode_Encode* APIs call `PyUnicode_FromWideChar` internally anyway; they are no longer direct APIs. Additionally, pyodbc, the only known user of the encoder APIs, did `PyUnicode_EncodeUTF16(PyUnicode_AsUnicode(unicode), ...)`. That is very inefficient: Unicode object -> Py_UNICODE* -> temporary Unicode object -> bytes object. And as many others have already said, most of the C world uses UTF-8 for its Unicode representation, not wchar_t. So I don't want to un-deprecate the current API.
> As an example application, think of a database module which provides the Unicode data as Py_UNICODE buffer.
Py_UNICODE is deprecated. So I assume you are talking about wchar_t.
> You want to write this as UTF-8 data to a file or a socket, so you have the PyUnicode_EncodeUTF8() API encode this for you into a bytes object which you can then write out using the Python C APIs for this.
PyUnicode_FromWideChar + PyUnicode_AsUTF8AndSize is better than PyUnicode_EncodeUTF8.

PyUnicode_EncodeUTF8 allocates a temporary Unicode object anyway, so it needs to allocate both a Unicode object *and* a char* buffer for the UTF-8 data. On the other hand, PyUnicode_AsUTF8AndSize can simply expose the internal data when the string is plain ASCII. Since ASCII strings are very common, this is an effective optimization.
Regards,
--
Inada Naoki