
On Mon, Jun 29, 2020 at 12:17 AM Inada Naoki <songofacandy@gmail.com> wrote:
> More aggressive idea: override the current PyUnicode_EncodeXXX() APIs.
> Change from `Py_UNICODE *object` to `PyObject *unicode`.

This is a list of PyUnicode_EncodeXXX usages in the top 4000 packages:
https://gist.github.com/methane/0f97391c9dbf5b53a818aa39a8285a29

scandir uses PyUnicode_EncodeMBCS only in a
`#if PY_MAJOR_VERSION < 3 && defined(MS_WINDOWS)` block, so it is a
false positive.

Cython has prototypes of these APIs.

pyodbc uses PyUnicode_EncodeUTF16 and PyUnicode_EncodeUTF8. But pyodbc
is converting a Unicode object into a bytes object, so the current API
is very inefficient for it.

That's all. Now I think it is safe to override the deprecated APIs with
the private APIs that accept a Unicode object:

* _PyUnicode_EncodeUTF7 -> PyUnicode_EncodeUTF7
* _PyUnicode_AsUTF8String -> PyUnicode_EncodeUTF8
* _PyUnicode_EncodeUTF16 -> PyUnicode_EncodeUTF16
* _PyUnicode_EncodeUTF32 -> PyUnicode_EncodeUTF32
* _PyUnicode_AsLatin1String -> PyUnicode_EncodeLatin1
* _PyUnicode_AsASCIIString -> PyUnicode_EncodeASCII
* _PyUnicode_EncodeCharmap -> PyUnicode_EncodeCharmap

-- 
Inada Naoki <songofacandy@gmail.com>
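
For illustration only, here is a rough sketch of what an encoder that takes a Unicode object looks like from the caller's side. It does not use the proposed signatures, which are not final. It calls the existing public functions PyUnicode_AsUTF8String, PyUnicode_AsLatin1String and PyUnicode_AsASCIIString through ctypes. The helper name `encode_via_c_api` and the lookup table are hypothetical names made up for this example.

```python
import ctypes

# Existing public C API functions that take a Unicode object (PyObject *)
# and return a new bytes object. The mapping itself is only for this sketch.
_OBJECT_ENCODERS = {
    "utf-8": "PyUnicode_AsUTF8String",
    "latin-1": "PyUnicode_AsLatin1String",
    "ascii": "PyUnicode_AsASCIIString",
}


def encode_via_c_api(text, encoding):
    """Encode `text` by passing the str object itself to the C API.

    Hypothetical helper: no Py_UNICODE buffer is extracted first, which is
    the kind of call a user such as pyodbc would want.
    """
    func = getattr(ctypes.pythonapi, _OBJECT_ENCODERS[encoding])
    func.argtypes = [ctypes.py_object]  # PyObject *unicode
    func.restype = ctypes.py_object     # new reference to a bytes object
    return func(text)


if __name__ == "__main__":
    print(encode_via_c_api("h\u00e9llo", "utf-8"))
```

The caller hands over the object and gets bytes back, with no intermediate `Py_UNICODE *` buffer and length pair. That is the inefficiency noted for pyodbc above.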