Surrogate pairs in new flexible string representation
christian at python.org
Fri Mar 29 23:05:12 CET 2013
On 29.03.2013 07:22, Ian Kelly wrote:
> Since the PEP specifically mentions ParseTuple string conversion, I am
> thinking that this is probably the motivation for caching it. A
> string that is passed into a C function (that uses one of the various
> UTF-8 char* format specifiers) is perhaps likely to be passed into
> that function again at some point, so the UTF-8 representation is kept
> around to avoid the need to recompose it on each call.
It's not just about caching but also about memory management. The
additional utf8 member is required for backward compatibility. The APIs
expect a pointer to an existing, shared block of memory. They don't
take ownership of that block and therefore never free() it, so the
string object itself has to keep the UTF-8 data alive.
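
A rough way to see this from Python, assuming CPython 3.3 or later and
reaching the C API through ctypes (the helper name utf8_pointer and the
sample string are only illustrative): ask for the UTF-8 pointer of the
same str twice and compare the addresses. If the caller owned the
result, each call would need a fresh allocation; here both calls should
hand back the buffer stored on the object.

```python
import ctypes

# PyUnicode_AsUTF8 is a CPython C API function (3.3+); this sketch assumes
# CPython and will not work on other implementations.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]
# Use c_void_p so ctypes returns the raw address instead of copying to bytes.
_as_utf8.restype = ctypes.c_void_p


def utf8_pointer(text):
    """Return the address of the UTF-8 buffer CPython hands out for text."""
    return _as_utf8(text)


# Non-ASCII sample, so the UTF-8 form differs from the internal representation.
sample = "surrogate \U0001F600 pair"

first = utf8_pointer(sample)
second = utf8_pointer(sample)

# Both calls point at the same block, which the str object owns.
same_buffer = first == second

# The block is NUL-terminated UTF-8; the caller reads it but never frees it.
decoded = ctypes.string_at(first).decode("utf-8")

print(same_buffer, decoded == sample)
```

The pointer is only valid while the str object is alive, which is why
the sketch keeps a reference to sample for the whole time.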