[Python-Dev] Timing for removing legacy Unicode APIs deprecated by PEP 393

Serhiy Storchaka storchaka at gmail.com
Wed Apr 18 15:16:23 EDT 2018

13.04.18 16:27, INADA Naoki wrote:
> Then, I want to reschedule the removal of these APIs.
> Can we remove them in 3.8? 3.9? or 3.10?
> I prefer as soon as possible.

I suppose that many users will start porting to Python 3 only in 2020,
after the 2.7 EOL. After that we no longer need to support compatibility
with 2.7 and can start emitting deprecation warnings at runtime. One or
two releases later we can make the corresponding public APIs always fail
and remove the private APIs and data fields.

> Slightly off topic, there is a 4-byte alignment gap in the unicode object
> on 64-bit platforms.
> typedef struct {
> .....
>      struct {
>          unsigned int interned:2;
>          unsigned int kind:3;
>          unsigned int compact:1;
>          unsigned int ascii:1;
>          unsigned int ready:1;
>          unsigned int :24;
>      } state;  // 4 bytes
>      // implicit 4 bytes gap here.
>      wchar_t *wstr;  // 8 bytes
> } PyASCIIObject;
> So, I think we can save 12 bytes instead of 8 bytes when removing wstr.
> Or we can save 4 bytes soon by moving `wstr` before `state`.
> Of course, this needs siphash to support 4-byte-aligned data instead of
> 8-byte-aligned data.

There are other functions that expect data to be aligned to
sizeof(long) or to 8 bytes.

Siphash hashing is special because it is called not just for strings and 
bytes, but for memoryview, which doesn't guarantee any alignment.
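To illustrate what "doesn't guarantee any alignment" means for a hash that
consumes 64-bit words, here is a minimal sketch of an alignment-independent
load. The helper name load_le64 is hypothetical and this is not CPython's
siphash code; it only shows one common way to read 8 bytes from an arbitrary
address without casting the pointer to uint64_t*.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes starting at p as a little-endian
   64-bit value. It touches one byte at a time, so p may have any
   alignment (for example, an odd offset into a memoryview buffer). */
uint64_t load_le64(const unsigned char *p)
{
    assert(p != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        /* Byte i contributes bits [8*i, 8*i + 7] of the result. */
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}
```

A caller could pass buf + 1 for some byte buffer buf and still get a
well-defined result, whereas dereferencing (uint64_t *)(buf + 1) would be
undefined behaviour in C and can fault on strict-alignment platforms.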

Note that after removing the wchar_t* field the gap will not go away,
because the size of the structure should be a multiple of the alignment
of the first field (which is a pointer).
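The effect can be checked with a small sketch. The structs below are
simplified stand-ins, not the real CPython definitions: struct head is an
assumed 4-word header (reference count, type pointer, length, hash), and
with_wstr / without_wstr are hypothetical names for the layout before and
after dropping the field.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumed stand-in for the fields that precede `state`. */
struct head {
    ptrdiff_t refcnt;
    void *type;
    ptrdiff_t length;
    ptrdiff_t hash;
};

/* Same bit-fields as in the quoted struct. */
struct state_bits {
    unsigned int interned:2;
    unsigned int kind:3;
    unsigned int compact:1;
    unsigned int ascii:1;
    unsigned int ready:1;
    unsigned int :24;
};

struct with_wstr {
    struct head head;
    struct state_bits state;
    wchar_t *wstr;              /* preceded by padding on 64-bit */
};

struct without_wstr {
    struct head head;
    struct state_bits state;    /* followed by tail padding on 64-bit */
};

/* The size of a struct is always a multiple of its alignment. */
static_assert(sizeof(struct without_wstr) % _Alignof(struct without_wstr) == 0,
              "struct size is rounded up to its alignment");

/* Padding between `state` and `wstr` in the current layout. */
size_t gap_before_wstr(void)
{
    return offsetof(struct with_wstr, wstr)
         - offsetof(struct with_wstr, state)
         - sizeof(struct state_bits);
}

/* Padding left after `state` once `wstr` is gone. */
size_t tail_padding_without_wstr(void)
{
    return sizeof(struct without_wstr)
         - offsetof(struct without_wstr, state)
         - sizeof(struct state_bits);
}

/* Bytes actually saved by dropping the field. */
size_t saved_by_removing_wstr(void)
{
    return sizeof(struct with_wstr) - sizeof(struct without_wstr);
}
```

On a typical 64-bit ABI with 4-byte unsigned int and 8-byte pointers, I
would expect both padding functions to report 4 and the saving to be 8
rather than 12: the gap simply moves from before wstr to the end of the
structure.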
