[Python-Dev] Interning string subtype instances

Hrvoje Nikšić hrvoje.niksic at avl.com
Mon Feb 12 18:21:48 CET 2007


I propose modifying PyString_InternInPlace to better cope with string
subtype instances.

The current implementation of PyString_InternInPlace returns without
doing anything when passed an instance of a subtype of PyString_Type.
This is a problem for applications that need to support string subtypes
but must also intern the strings for faster equivalence testing.  Such
an application, upon receiving a string subtype instance, will silently
fail to work, because the string it goes on to compare by identity was
never actually interned.
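
To illustrate the kind of application meant here, below is a rough
sketch at the Python level rather than in C.  It assumes a current
Python 3 interpreter, where sys.intern plays the role of the interning
entry point; note that sys.intern raises TypeError for a subtype
instance instead of returning silently as PyString_InternInPlace does.
The names build and Tagged are made up for the example.

```python
import sys


def build(parts):
    # Build the string at run time so it is not a compile-time constant.
    return "".join(parts)


a = sys.intern(build(["attribute", "_name"]))
b = sys.intern(build(["attribute", "_name"]))

# After interning, equal strings are the same object, so an identity
# check (a pointer comparison in C) is enough to test equivalence.
same_object = a is b


class Tagged(str):
    """Hypothetical string subtype, used only for illustration."""


try:
    sys.intern(Tagged("attribute_name"))
    subtype_rejected = False
except TypeError:
    # sys.intern refuses subtype instances outright.
    subtype_rejected = True
```

Code that relies on identity comparison is only correct if every string
it handles really went through the intern table, which is why a
silently skipped subtype instance is dangerous.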

There is a good reason for PyString_InternInPlace not to accept string
subtypes: since a subtype can have modified behavior, interning it can
cause problems for other users of the interned string.  I agree with the
reasoning, but propose a different solution: when interning an instance
of a string subtype, PyString_InternInPlace could simply intern a copy
that is an exact PyString with the same contents.
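
The same idea can be sketched at the Python level.  This is only an
analogue of the proposed C change, assuming a current Python 3
interpreter and sys.intern; intern_any and Loud are hypothetical names
introduced for the example.

```python
import sys


def intern_any(s):
    """Hypothetical helper: intern exact strings as-is, subtypes by copy."""
    if type(s) is not str:
        # str.__str__ bypasses any __str__ override on the subtype and
        # returns an exact str with the same contents.
        s = str.__str__(s)
    return sys.intern(s)


class Loud(str):
    """Hypothetical subtype with modified behavior."""

    def __str__(self):
        return str.upper(self)


key = intern_any(Loud("key"))
plain = intern_any("".join(["k", "ey"]))
```

Here key is an exact str and is the same object as plain, so the
subtype's modified behavior never reaches the intern table, while the
caller still ends up holding an interned string.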

This should be a fully backward compatible change because: 1) code that
passes PyString instances (99.99% of cases) will work as before, and 2)
code that passes something else silently failed to intern the string
anyway.  For PyString instances, speed should be the same as before,
with the added benefit that interning PyString subtype instances now
does something, but without the problems that interning the actual
instance can produce.

The patch could look like this.  If there is interest in this, I can
produce a complete patch.

@@ -5,10 +5,6 @@
 	PyObject *t;
 	if (s == NULL || !PyString_Check(s))
 		Py_FatalError("PyString_InternInPlace: strings only please!");
-	/* If it's a string subclass, we don't really know what putting
-	   it in the interned dict might do. */
-	if (!PyString_CheckExact(s))
-		return;
 	if (PyString_CHECK_INTERNED(s))
 		return;
 	if (interned == NULL) {
@@ -25,6 +21,20 @@
 		*p = t;
 		return;
 	}
+	/* Make sure we don't intern a string subclass, since we don't
+	   really know what putting it in the interned dict might do. */
+	if (!PyString_CheckExact(s)) {
+		PyObject *copy;
+		copy = PyString_FromStringAndSize(PyString_AS_STRING(*p),
+		                                  PyString_GET_SIZE(*p));
+		if (!copy) {
+			PyErr_Clear(); /* Don't leave an exception */
+			return;
+		}
+		Py_DECREF(*p);
+		*p = copy;
+		s = (PyStringObject *) copy;
+	}
 
 	if (PyDict_SetItem(interned, (PyObject *)s, (PyObject *)s) < 0) {
 		PyErr_Clear();



