[Python-Dev] Question about the current implementation of str
Victor Stinner
victor.stinner at gmail.com
Sat Apr 9 03:52:24 EDT 2016
On Apr 9, 2016 03:04, "Larry Hastings" <larry at hastings.org> wrote:
> Although the str object is immutable from Python's perspective, the C
> object itself is mutable. For example, for dynamically-created strings the
> hash field may be lazy-computed and cached inside the object.
Yes, the hash is computed once on demand. It doesn't matter how you build
the string.
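
A minimal sketch of how the lazy hash could be observed from Python, assuming CPython and assuming the str struct starts with the object header followed by a Py_ssize_t length and then the hash field (the PEP 393 layout). The helper names cached_hash and demo are made up for illustration, and the offset arithmetic is an assumption about the build, not a supported API:

```python
import ctypes

# Assumption (CPython only): the str struct begins with the generic object
# header, then a Py_ssize_t length, then the hash field.
_HASH_OFFSET = object.__basicsize__ + ctypes.sizeof(ctypes.c_ssize_t)


def cached_hash(s):
    """Read the raw hash field of a str; -1 means 'not computed yet'."""
    return ctypes.c_ssize_t.from_address(id(s) + _HASH_OFFSET).value


def demo(n=50):
    s = "spam" * n             # built at run time, so nothing has hashed it yet
    before = cached_hash(s)    # hash field before any hash() call
    h = hash(s)                # computes the hash and stores it in the object
    after = cached_hash(s)     # hash field after the hash() call
    return before, h, after


if __name__ == "__main__":
    print(demo())
```

On a build that matches the assumed layout, the first value is the "not computed" marker and the last one equals hash(s).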
> I was wondering if there were other fields like this. For example, are
> there similar lazy-computed cached objects for the different encoded
> versions (utf8 utf16) of the str?
The UTF-8 form is only cached when you call the C functions that fill this
cache. The Python str.encode('utf8') doesn't fill the cache, but it uses it.
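
A rough way to watch this from Python, assuming CPython: sys.getsizeof() counts the UTF-8 cache once it exists, and PyUnicode_AsUTF8AndSize() (reached here through ctypes) is one of the C functions that fills it. The helper name measure_utf8_cache is hypothetical, and the string is non-ASCII on purpose, since ASCII strings need no separate UTF-8 buffer:

```python
import ctypes
import sys

# CPython only: PyUnicode_AsUTF8AndSize() returns the cached UTF-8 buffer,
# creating the cache on first use.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_utf8.restype = ctypes.c_char_p


def measure_utf8_cache(n=100):
    s = "\u00e9" * n                  # non-ASCII string built at run time
    initial = sys.getsizeof(s)
    encoded = s.encode("utf8")        # Python-level encode: no cache is filled
    after_encode = sys.getsizeof(s)
    size = ctypes.c_ssize_t()
    _as_utf8(s, ctypes.byref(size))   # C-level call: fills the UTF-8 cache
    after_c_call = sys.getsizeof(s)
    return initial, after_encode, after_c_call, len(encoded), size.value


if __name__ == "__main__":
    print(measure_utf8_cache())
```

The reported size should stay the same after str.encode() and grow only after the C call.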
On Windows, there is a cache for wchar_t* which is utf16. This format is
used by all C functions of the Windows API (Python should only use the
Unicode flavor of the Windows API).
I don't recall other caches.
> What would really help is an exhaustive list of the fields of a str object
> that may ever change after the object's initial creation.
I don't recall exactly what happens if a cache is created and then the
string is modified. If I recall correctly, the cache is invalidated.
But the hash is used as a heuristic to decide whether a string is
"immutable" or not; the refcount is also used by the heuristic. If the
string is immutable, an operation like resize must create a new string.
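
To make the heuristic concrete, here is a toy model in Python. It is not the CPython code: the names looks_modifiable and resize are invented for illustration, a list of characters stands in for the string buffer, and only the two conditions mentioned above (refcount and cached hash) are modelled. Any further checks the real implementation performs are left out.

```python
HASH_NOT_COMPUTED = -1  # marker for "hash was never requested"


def looks_modifiable(refcount, cached_hash):
    """Toy heuristic: only one reference exists and the hash was never computed."""
    return refcount == 1 and cached_hash == HASH_NOT_COMPUTED


def resize(buffer, new_length, refcount, cached_hash):
    """Resize a list of characters, in place only if the heuristic allows it."""
    if looks_modifiable(refcount, cached_hash):
        del buffer[new_length:]                           # shrink in place
        buffer.extend("\0" * (new_length - len(buffer)))  # or pad in place
        return buffer
    # Treated as immutable: leave the original alone and build a new buffer.
    copy = buffer[:new_length]
    copy.extend("\0" * (new_length - len(copy)))
    return copy
```

With refcount 1 and no cached hash the same list comes back; in every other case the caller gets a new list and the original is untouched.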
You can document PEP 393 in Include/unicodeobject.h.
Victor