[Python-Dev] Disabling string interning for null and single-char causes segfaults

Sat Mar 2 21:32:02 CET 2013

On 3/2/2013 10:08 AM, Nick Coghlan wrote:
> On Sat, Mar 2, 2013 at 1:24 AM, Stefan Bucur <stefan.bucur at gmail.com> wrote:
>> Hi,
>>
>> I'm working on an automated bug finding tool that I'm trying to apply on the
>> Python interpreter code (version 2.7.3). Because of early prototype
>> limitations, I needed to disable string interning in stringobject.c. More
>> precisely, I modified the PyString_FromStringAndSize and PyString_FromString
>> to no longer check for the null and single-char cases, and create instead a
>> new string every time (I can send the patch if needed).
>>
>> However, after applying this modification, when running "make test" I get a
>> segfault in the test___all__ test case.
>>
>> Before digging deeper into the issue, I wanted to ask here if there are any
>> implicit assumptions about string identity and interning throughout the
>> interpreter implementation. For instance, are two single-char strings having
>> the same content supposed to be identical objects?
>>
>> I'm assuming that it's either this, or some refcount bug in the interpreter
>> that manifests only when certain strings are no longer interned and thus
>> have a higher chance to get low refcount values.
>
> In theory, interning is supposed to be a pure optimisation, but it
> wouldn't surprise me if there are cases that assume the described
> strings are always interned (especially the null string case). Our
> test suite would never detect such bugs, as we never disable the
> interning.

Since it required patching functions rather than a configuration switch, 
it literally seems not be a supported option. If so, I would not 
consider it a bug for CPython to use the assumption of interning to run 
faster and I don't think it should be slowed down if that would be 
necessary to remove the assumption. (This is all assuming that the 
problem is not just a ref count bug.)

Stefan's question was about 2.7. I am just curious: does 3.3 still 
intern (some) unicode chars? Did the 256 interned bytes of 2.x carry 
over to 3.x?

> Whether or not we're interested in fixing such bugs would depend on
> the size of the patches needed to address them. From our point of
> view, such bugs are purely theoretical (as the assumption is always
> valid in an unpatched CPython build), so if the problem is too hard to
> diagnose or fix, we're more likely to declare that interning of at
> least those kinds of string values is required for correctness when
> creating modified versions of CPython.

-- 
Terry Jan Reedy