[Python-Dev] Timing for removing legacy Unicode APIs deprecated by PEP 393
INADA Naoki
songofacandy at gmail.com
Fri Apr 13 09:27:16 EDT 2018
Hi,
PEP 393 [1] deprecates some Unicode APIs related to Py_UNICODE.
The PEP doesn't provide a schedule for removing them, but the APIs are
marked "will be removed in 4.0" in the documentation.
Once they are removed, we can drop the `wchar_t *` member of the unicode object.
It takes 8 bytes on 64-bit platforms.
[1]: "Flexible String Representation" https://www.python.org/dev/peps/pep-0393/
I thought Python 4.0 would be the version after 3.9, but Guido has a different idea.
He said the following in the Zulip chat (we're trying it out for now).
> No, 4.0 is not just what comes after 3.9 -- the major number change would indicate some kind of major change somewhere (like possibly the Gilectomy, which changes a lot of the C APIs). If we have more than 10 3.x versions, we'll just live with 3.10, 3.11 etc.
And he said about these APIs:
>> Unicode objects has some "Deprecated since version 3.3, will be removed in version 4.0" APIs (pep-393).
>> When removing them, we can reduce PyUnicode size about 8~12byte.
>
> We should be able to deprecate these sooner by updating the docs.
So I want to reschedule the removal of these APIs.
Can we remove them in 3.8? 3.9? Or 3.10?
I'd prefer as soon as possible.
---
Slightly off topic: there is a 4-byte alignment gap in the unicode object
on 64-bit platforms.
typedef struct {
    ....
    struct {
        unsigned int interned:2;
        unsigned int kind:3;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
        unsigned int :24;
    } state;        // 4 bytes
    // implicit 4 bytes gap here.
    wchar_t *wstr;  // 8 bytes
} PyASCIIObject;
So I think we can save 12 bytes instead of 8 bytes when removing wstr.
Or we can save 4 bytes sooner by moving `wstr` before `state`.
Of course, that requires siphash to support 4-byte aligned data instead of 8-byte aligned data.
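The layout arithmetic above can be checked with a small C sketch. This is not CPython code: `MockHead` is a hypothetical stand-in for the fields elided by "....", assumed here to be four pointer-sized members, and `MockState`, `LayoutCurrent`, `LayoutReordered` and `LayoutNoWstr` are illustrative names. The sketch also assumes the character data may start right after the last field rather than at the padded `sizeof`, which is why the hash would have to accept 4-byte aligned input.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical stand-in for the elided leading fields:
   assumed to be four pointer-sized members. */
typedef struct {
    intptr_t refcnt;
    void *type;
    intptr_t length;
    intptr_t hash;
} MockHead;

/* Same bit-fields as the `state` member quoted above. */
typedef struct {
    unsigned int interned:2;
    unsigned int kind:3;
    unsigned int compact:1;
    unsigned int ascii:1;
    unsigned int ready:1;
    unsigned int :24;
} MockState;

/* Current order: state, then wstr. */
typedef struct { MockHead head; MockState state; wchar_t *wstr; } LayoutCurrent;
/* Proposed reorder: wstr, then state. */
typedef struct { MockHead head; wchar_t *wstr; MockState state; } LayoutReordered;
/* After removing wstr entirely. */
typedef struct { MockHead head; MockState state; } LayoutNoWstr;

/* Offset at which data could start: the end of the last field,
   ignoring any trailing padding of the struct. */
size_t current_data_offset(void) {
    return offsetof(LayoutCurrent, wstr) + sizeof(wchar_t *);
}

size_t reordered_data_offset(void) {
    return offsetof(LayoutReordered, state) + sizeof(MockState);
}

size_t no_wstr_data_offset(void) {
    return offsetof(LayoutNoWstr, state) + sizeof(MockState);
}

/* Padding the compiler inserts between state and wstr in the current order. */
size_t implicit_gap(void) {
    return offsetof(LayoutCurrent, wstr)
         - (offsetof(LayoutCurrent, state) + sizeof(MockState));
}

void print_layout(void) {
    printf("gap after state:      %zu\n", implicit_gap());
    printf("data offset, current: %zu\n", current_data_offset());
    printf("data offset, reorder: %zu\n", reordered_data_offset());
    printf("data offset, no wstr: %zu\n", no_wstr_data_offset());
}
```

Calling `print_layout()` from a `main` on a typical 64-bit platform should show a 4-byte gap, with the data offset shrinking by 4 bytes for the reorder and by 12 bytes when `wstr` is removed. On 32-bit platforms there is no gap, so only the pointer itself is saved.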
Regards,
--
INADA Naoki <songofacandy at gmail.com>