[New-bugs-announce] [issue28406] Possible PyUnicode_InternInPlace() edge-case bug

Larry Hastings report at bugs.python.org
Mon Oct 10 11:41:22 EDT 2016


New submission from Larry Hastings:

A visual inspection of PyUnicode_InternInPlace() suggests there might be a rare edge-case bug lurking there.

Specifically, the bug has to do with "interned mortal" strings.  Interned mortal strings are stored in the static "interned" dictionary, with their key and value set to the same string.  (If "x" is interned, then in the interned dictionary "x" is set to "x".)  Under normal circumstances, these two references would keep the string alive forever.  However, in order for the string to be *mortal*, the unicode object then DECREFs the string twice so the references in the "interned" dict no longer "count".  When the refcnt reaches 0, and the string is being destroyed, unicode_dealloc() temporarily resurrects the object, bumping its reference count up to 3.  It then removes the string from the interned dict and destroys it as normal.

The actual intern substitution happens in a function called PyUnicode_InternInPlace().  This function looks the string up in the "interned" dictionary, and if the string is there it substitutes in the interned version from the dictionary.  There's a curious comment in that function:

    /* It might be that the GetItem call fails even
       though the key is present in the dictionary,
       namely when this happens during a stack overflow. */
    Py_ALLOW_RECURSION
    t = PyDict_GetItem(interned, s);
    Py_END_ALLOW_RECURSION

If t is NULL, then PyUnicode_InternInPlace() goes on to set t in the interned dictionary, reduces the refcnt by 2, etc etc.

The problem: if t is in the dictionary, but PyDict_GetItem() fails (during a stack overflow?), then the subsequent PyDict_SetItem() will *overwrite* the existing interned string.  Which means the "interned" dict will drop its two references, which means two REFCNTs.  If there were only 1 or 2 extant references to this string, it means the string will be immediately dealloc'd, *even though the string was still in use*.

I suppose this is already such a speculative condition that it's probably not worrying over.  If PyDict_GetItem fails due to stack overflow, perhaps the Python process is doomed to fail soon anyway.  Perhaps PyDict_SetItem has no chance of working.  But it *seems* like this code is intended to function correctly even when PyDict_GetItem fails due to stack overflow, and I suspect it won't.

----------
components: Interpreter Core
messages: 278421
nosy: larry
priority: low
severity: normal
stage: test needed
status: open
title: Possible PyUnicode_InternInPlace() edge-case bug
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28406>
_______________________________________


More information about the New-bugs-announce mailing list