[Python-Dev] Question about the current implementation of str
Nick Coghlan
ncoghlan at gmail.com
Sat Apr 9 03:18:10 EDT 2016
On 9 April 2016 at 10:56, Larry Hastings <larry at hastings.org> wrote:
>
>
> I have a straightforward question about the str object, specifically the
> PyUnicodeObject. I've tried reading the source to answer the question
> myself but it's nearly impenetrable. So I was hoping someone here who
> understands the current implementation could answer it for me.
>
> Although the str object is immutable from Python's perspective, the C object
> itself is mutable. For example, for dynamically-created strings the hash
> field may be lazy-computed and cached inside the object. I was wondering if
> there were other fields like this. For example, are there similar
> lazy-computed cached objects for the different encoded versions (utf8, utf16)
> of the str? What would really help is an exhaustive list of the fields of a
> str object that may ever change after the object's initial creation.
https://www.python.org/dev/peps/pep-0393/#specification should have
most of the relevant details.
Aside from the hash and the interned-or-not flag in the state, most
things should be locked once the string is ready, except that
generating the utf-8 and wchar_t forms is deferred until they're
needed if they're not the same as the canonical form.
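
As a rough illustration of the deferred utf-8 form, here is a minimal
sketch that pokes at it from Python using ctypes. It assumes CPython,
and it assumes that PyUnicode_AsUTF8() caches its result in the object
and that sys.getsizeof() counts that cached buffer; both are
implementation details that may differ between versions. The names
as_utf8, before and after are just for this example.

```python
import ctypes
import sys

# PyUnicode_AsUTF8() is part of the CPython C API. Assumption: it stores
# the encoded bytes inside the str object the first time it is called.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
as_utf8.restype = ctypes.c_char_p

# Build a non-ASCII string at runtime, so its canonical form is not
# already valid utf-8 and the cached form has to be generated separately.
s = "".join(["caf", "\u00e9"])

before = sys.getsizeof(s)   # size before the utf-8 form is requested
encoded = as_utf8(s)        # ask the C API for the utf-8 form
after = sys.getsizeof(s)    # size after; may now include the cached buffer

print(encoded, before, after)
```

If the assumptions hold, the second size should be no smaller than the
first, even though the str has not changed from Python's point of view.
Note that s.encode('utf-8') builds a separate bytes object and is not
expected to fill this cache.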
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia