[Python-Dev] PEP 393: Flexible String Representation
"Martin v. Löwis"
martin at v.loewis.de
Thu Jan 27 22:16:54 CET 2011
> Repetition of "11"; I'm guessing that the 2byte/UCS-2 should read "10",
> so that they give the width of the char representation.
Thanks, fixed.
>> 00 => null pointer
>
> Naturally this assumes that all pointers are at least 4-byte aligned (so
> that they can be masked off). I assume that this is sane on every
> platform that Python supports, but should it be spelled out explicitly
> somewhere in the PEP?
I'll change the PEP to move the type indicator into the state field, so
that issue becomes irrelevant.
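
To illustrate the alignment concern raised above, here is a minimal sketch of packing a type indicator into the two low-order bits of a pointer. Addresses are modelled as plain Python integers. The names `tag_pointer` and `untag_pointer` are hypothetical, and the codes for the 1-byte and 4-byte kinds are assumptions; only "00 => null pointer" and "10" for 2-byte/UCS-2 come from the discussion.

```python
TAG_MASK = 0b11  # the two low-order bits, free only if pointers are 4-byte aligned

KIND_NULL = 0b00   # from the thread: 00 => null pointer
KIND_1BYTE = 0b01  # assumed code for the 1-byte representation
KIND_2BYTE = 0b10  # from the thread: 2-byte/UCS-2
KIND_4BYTE = 0b11  # assumed code for the 4-byte representation


def tag_pointer(address, kind):
    """Store the kind in the low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not 4-byte aligned")
    return address | kind


def untag_pointer(tagged):
    """Return (address, kind) by masking the low bits back off."""
    return tagged & ~TAG_MASK, tagged & TAG_MASK


address, kind = untag_pointer(tag_pointer(0x1000, KIND_2BYTE))
print(hex(address), bin(kind))
```

The scheme breaks as soon as an address has a nonzero low bit, which is why keeping the indicator in the state field sidesteps the alignment question entirely.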
>> The string is null-terminated (in its respective representation).
>> - hash, state: same as in Python 3.2
>> - utf8_length, utf8: UTF-8 representation (null-terminated)
> If this is to share its buffer with the "str" representation for the
> Latin-1 case, then I take it this ptr will typically be (str & ~3) ?
> i.e. only "str" has the low-order-bit type info.
Yes, the other pointers are aligned. Notice that the case in which
sharing occurs is only ASCII, though (for Latin-1, some characters
require two bytes in UTF-8).
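
The ASCII-only restriction can be checked from Python itself. The sketch below is only an illustration at the codec level, not the proposed C implementation; the helper name `utf8_can_share_buffer` is hypothetical, and it assumes its argument is encodable as Latin-1.

```python
def utf8_can_share_buffer(text):
    """True when the Latin-1 and UTF-8 encodings are byte-for-byte identical."""
    return text.encode("latin-1") == text.encode("utf-8")


# Pure ASCII: both encodings produce the same bytes, so one buffer could serve both.
print(utf8_can_share_buffer("abc"))

# U+00E9 is one byte in Latin-1 but two bytes in UTF-8, so sharing is impossible.
print(utf8_can_share_buffer("caf\xe9"))
print(len("\xe9".encode("latin-1")), len("\xe9".encode("utf-8")))
```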
> Spelling out the meaning of "optional":
> does this mean that the relevant ptr is NULL; if so, if utf8 is null,
> is utf8_length undefined, or is it some dummy value?
I've clarified this: I propose that utf8_length is undefined when utf8
is NULL (unless there is a good reason to clear it).
>> If the string is created directly with the canonical representation
>> (see below), this representation doesn't take a separate memory block,
>> but is allocated right after the PyUnicodeObject struct.
>
> Is the idea to do pointer arithmetic when deleting the PyUnicodeObject
> to determine if the ptr is in that location, and not delete it if it is,
> or is there some other way of determining whether the pointers need
> deallocating?
Correct.
> If the former, is this embedding an assumption that the
> underlying allocator couldn't have allocated a buffer directly adjacent
> to the PyUnicodeObject? I know that GNU libc's malloc/free
> implementation has gaps of two machine words between each allocation;
> off the top of my head I'm not sure if the optimized Objects/obmalloc.c
> allocator enforces such gaps.
No, it doesn't... So I guess I'll reserve another bit in the state for that.
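
A small model of why the adjacency test is ambiguous and how an explicit state bit resolves it. Addresses are plain integers here; `HEADER_SIZE`, `looks_embedded`, and `FakeString` are hypothetical names for illustration only, not part of the PEP or of CPython.

```python
HEADER_SIZE = 64  # hypothetical size of the object header, in bytes


def looks_embedded(obj_addr, data_addr):
    """Pointer-arithmetic test: does the data start right after the header?"""
    return data_addr == obj_addr + HEADER_SIZE


class FakeString:
    """Stand-in for a string object that records how its data was allocated."""

    def __init__(self, obj_addr, data_addr, embedded):
        self.obj_addr = obj_addr
        self.data_addr = data_addr
        self.embedded = embedded  # the explicit state bit

    def must_free_data(self):
        """The data block needs its own deallocation unless it is embedded."""
        return not self.embedded


# A separately allocated buffer that an allocator without gaps happened to
# place directly behind the object header.
s = FakeString(obj_addr=0x1000, data_addr=0x1000 + HEADER_SIZE, embedded=False)

print(looks_embedded(s.obj_addr, s.data_addr))  # the address test is fooled
print(s.must_free_data())                       # the state bit is not
```

With the address test alone, the separately allocated buffer in this scenario would be mistaken for an embedded one and never freed.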
> GDB Debugging Hooks
> -------------------
> Tools/gdb/libpython.py contains debugging hooks that embed knowledge
> about the internals of CPython's data types, including PyUnicodeObject
> instances. It will need to be slightly updated to track the change.
Thanks, added.
Regards,
Martin