[Python-Dev] Timing for removing legacy Unicode APIs deprecated by PEP 393

INADA Naoki songofacandy at gmail.com
Fri Apr 13 09:27:16 EDT 2018


PEP 393 [1] deprecates some Unicode APIs relating to Py_UNICODE.
The PEP doesn't provide a schedule for removing them, but the
documentation marks them as "will be removed in 4.0".
Once they are removed, we can drop the `wchar_t *` member from the unicode object.
It takes 8 bytes on 64-bit platforms.

[1]: "Flexible String Representation" https://www.python.org/dev/peps/pep-0393/

I thought Python 4.0 would be the version after 3.9, but Guido has a different idea.
He said the following in the Zulip chat (which we are trying out for now):

> No, 4.0 is not just what comes after 3.9 -- the major number change would indicate some kind of major change somewhere (like possibly the Gilectomy, which changes a lot of the C APIs). If we have more than 10 3.x versions, we'll just live with 3.10, 3.11 etc.

And about these APIs, he said:

>> Unicode objects has some "Deprecated since version 3.3, will be removed in version 4.0" APIs (pep-393).
>> When removing them, we can reduce PyUnicode size about 8~12byte.
> We should be able to deprecate these sooner by updating the docs.

So I would like to reschedule the removal of these APIs.
Can we remove them in 3.8, 3.9, or 3.10?
I would prefer it to be as soon as possible.


Slightly off topic: there is a 4-byte alignment gap in the unicode object
on 64-bit platforms.

typedef struct {
    // PyObject_HEAD, length and hash come first (omitted here).
    struct {
        unsigned int interned:2;
        unsigned int kind:3;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
        unsigned int :24;
    } state;  // 4 bytes

    // implicit 4 bytes gap here.

    wchar_t *wstr;  // 8 bytes
} PyASCIIObject;

So I think we can save 12 bytes instead of 8 when removing `wstr`.
Or we can save 4 bytes right away by moving `wstr` before `state`.

Of course, this requires siphash to support 4-byte-aligned data instead of 8-byte-aligned data.

INADA Naoki  <songofacandy at gmail.com>
