[Python-Dev] Question about the current implementation of str

Serhiy Storchaka storchaka at gmail.com
Sat Apr 9 05:00:30 EDT 2016

On 09.04.16 10:52, Victor Stinner wrote:
> On 9 Apr 2016 03:04, "Larry Hastings" <larry at hastings.org
> <mailto:larry at hastings.org>> wrote:
> > Although the str object is immutable from Python's perspective, the C
> > object itself is mutable.  For example, for dynamically-created strings
> > the hash field may be lazy-computed and cached inside the object.
> Yes, the hash is computed once on demand. It doesn't matter how you
> build the string.
> > I was wondering if there were other fields like this.  For example,
> > are there similar lazy-computed cached objects for the different encoded
> > versions (utf8 utf16) of the str?
> The utf8 form is only cached when you call the C functions that fill this
> cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it.
> On Windows, there is a cache for wchar_t* which is utf16. This format is
> used by all C functions of the Windows API (Python should only use the
> Unicode flavor of the Windows API).
> I don't recall other caches.
> > What would really help is an exhaustive list of the fields of a str
> > object that may ever change after the object's initial creation.
> I don't recall exactly what happens if a cache is created and then the
> string is modified. If I recall correctly, the cache is invalidated.

You must remember, some bugs with desynchronized utf8 and wchar_t*
caches were fixed just a few months ago.
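
The utf8 cache behaviour described in the quoted text can be observed from
Python, at least roughly. The sketch below is not from the thread: it
assumes CPython, assumes that sys.getsizeof() counts the cached utf8
buffer, and uses ctypes.pythonapi to call the C function
PyUnicode_AsUTF8(), which is one of the C functions that fill the cache.
The helper name utf8_cache_sizes is made up for illustration.

```python
import ctypes
import sys


def utf8_cache_sizes(text):
    """Return sys.getsizeof(text) at three points:
    before anything, after str.encode(), and after PyUnicode_AsUTF8()."""
    before = sys.getsizeof(text)

    # str.encode() returns a new bytes object; per the thread it does not
    # fill the utf8 cache stored inside the str object.
    text.encode("utf-8")
    after_encode = sys.getsizeof(text)

    # PyUnicode_AsUTF8() is a C-API function that caches the utf8 form in
    # the str object (assumption: CPython only, via ctypes.pythonapi).
    as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
    as_utf8.argtypes = [ctypes.py_object]
    as_utf8.restype = ctypes.c_char_p
    as_utf8(text)
    after_c_call = sys.getsizeof(text)

    return before, after_encode, after_c_call


# A non-ASCII string is used because ASCII strings need no separate cache.
sample = "".join(["\u00e9"] * 100)
print(utf8_cache_sizes(sample))
```

If the assumptions hold, the first two numbers should be equal and the
third should be larger, while the string itself compares equal throughout.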

> But the hash is used as a heuristic to decide if a string is
> "immutable" or not, the refcount is also used by the heuristic. If the
> string is immutable, an operation like resize must create a new string.
> You can document the PEP 393 in Include/unicodeobject.h.

In the normal case, a string object can be mutated only at creation time.
But CPython uses some tricks that modify already-created strings if
they have no external references and are not interned. For example, "a +=
b" or "a = a + b" can resize the "a" string.
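
A small sketch of what this looks like from Python code. It is not from
the thread, and the function name is made up for illustration. Whether the
resize really happens in place depends on the CPython version and on the
memory allocator, so the id() comparison is only reported, not relied on.
What is guaranteed is the visible behaviour: a string that has another
reference is never changed.

```python
def append_with_and_without_alias(n=1000):
    """Append to a string with one reference, then with two references."""
    a = "x" * n
    id_before = id(a)      # an int, so it does not keep a reference to a

    # Only one reference to the string: CPython may resize it in place.
    a += "y"
    same_object = id(a) == id_before   # implementation detail, may be False

    # Now a second reference exists, so the old string must stay intact
    # and the concatenation has to produce a new object.
    b = a
    a += "z"

    return same_object, a, b


same_object, a, b = append_with_and_without_alias()
print("resized in place:", same_object)
print(a is b, len(a), len(b))
```

Because "b" still refers to the string when "z" is appended, "b" keeps its
old value and "a" ends up as a different, longer string.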
