[Python-Dev] Alternative implementation of interning, take 2
Tim Peters
tim.one@comcast.net
Fri, 12 Jul 2002 13:22:18 -0400
[M.-A. Lemburg]
> If you could spell out what exactly you mean by "indirect interning"
> that would help.
Actually, I don't think it would -- the issue is whether the possibility for
the ob_sinterned member of a PyStringObject not to *be* the string object
itself ever saves time in your extensions, and it's darned hard to guess
that. If you apply the attached patch to current CVS, though, it will tell
you whenever your code benefits from it.
AFAICT, there are only 3 routines where it *might* save cycles (but note
that checking for the possibility costs cycles whether or not it pays; it's
a net loss when it doesn't pay):
+ PyDict_SetItem: I believe this is the only real possibility for gain. If
it ever helps you here, the patch arranges to print
ii paid on a setitem
to stderr whenever it does pay. I haven't yet seen that get printed.
+ PyString_InternInPlace: Whenever it pays here, the patch spits
ii paid on an InternInPlace
That triggers 6 times in the Python test suite, all from test_descr. Since
this one is an optimization *of* setting ob_sinterned, it's a
snake-eating-its-tail kind of thing -- it's of no real benefit unless
ob_sinterned pays off somewhere else too.
+ string_hash: The patch spits
ii paid on a hash???
The question marks are there because I don't see how it's possible for this
to get printed.
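
For readers who haven't looked at the string internals, here is a toy model of what "indirect interning" means. It is only a sketch: ToyString, toy_hash, toy_intern, and the ii_paid counter are made-up names for illustration, the struct is not the real PyStringObject layout, and a linear table stands in for the interned-strings dict. The point is that sinterned can be NULL (never interned), the string itself (direct), or a different, equal string that got interned first (indirect), and the "ii paid" case is exactly the one where the link is followed to a different object.

```c
#include <string.h>

/* Toy stand-in for PyStringObject; NOT the real CPython layout. */
typedef struct ToyString {
    const char *sval;            /* character data */
    long shash;                  /* cached hash, -1 if not yet computed */
    struct ToyString *sinterned; /* NULL, self (direct), or canonical copy (indirect) */
} ToyString;

#define TABLE_SIZE 16
static ToyString *intern_table[TABLE_SIZE]; /* hypothetical interned-strings table */
static int n_interned = 0;
static int ii_paid = 0;  /* counts the times the indirect link saved a lookup */

/* Compute and cache a hash. The result is masked to be non-negative,
 * so it can never collide with the -1 "not computed" marker. */
static long toy_hash(ToyString *s)
{
    unsigned long x = 5381;
    const char *p;

    if (s->shash != -1)
        return s->shash;
    for (p = s->sval; *p != '\0'; p++)
        x = x * 33 + (unsigned char)*p;
    s->shash = (long)(x & 0x7fffffffUL);
    return s->shash;
}

/* Return the canonical interned string equal to s. */
static ToyString *toy_intern(ToyString *s)
{
    int i;
    long h;

    if (s->sinterned != NULL) {
        if (s->sinterned != s)
            ii_paid++;           /* indirect link followed: table search skipped */
        return s->sinterned;
    }
    h = toy_hash(s);             /* the hash is always set before the link is */
    for (i = 0; i < n_interned; i++) {
        ToyString *t = intern_table[i];
        if (t->shash == h && strcmp(t->sval, s->sval) == 0) {
            s->sinterned = t;    /* indirect: point at the earlier equal string */
            return t;
        }
    }
    if (n_interned == TABLE_SIZE)
        return s;                /* table full: leave s uninterned */
    s->sinterned = s;            /* direct: s becomes the canonical copy */
    intern_table[n_interned++] = s;
    return s;
}
```

Note that in this model a string gets its hash cached before its sinterned link is ever set, which is the same reasoning behind the question marks on the string_hash case: by the time the link exists, the cached-hash check has already returned.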
> What I do need and rely on is the fact that the
> Python compiler interns all constant strings and identifiers in
> Python programs. This makes switching like so:
Ya, while that's evil, it's not affected by indirect interning.
[Attachment: ii.txt]
Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.126
diff -c -c -r2.126 dictobject.c
*** Objects/dictobject.c 13 Jun 2002 20:32:57 -0000 2.126
--- Objects/dictobject.c 12 Jul 2002 17:14:19 -0000
***************
*** 512,517 ****
--- 512,519 ----
mp = (dictobject *)op;
if (PyString_CheckExact(key)) {
if (((PyStringObject *)key)->ob_sinterned != NULL) {
+ if (key != ((PyStringObject *)key)->ob_sinterned)
+ fprintf(stderr, "ii paid on a setitem\n");
key = ((PyStringObject *)key)->ob_sinterned;
hash = ((PyStringObject *)key)->ob_shash;
}
Index: Objects/stringobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.169
diff -c -c -r2.169 stringobject.c
*** Objects/stringobject.c 11 Jul 2002 06:23:50 -0000 2.169
--- Objects/stringobject.c 12 Jul 2002 17:14:20 -0000
***************
*** 925,933 ****
if (a->ob_shash != -1)
return a->ob_shash;
! if (a->ob_sinterned != NULL)
return (a->ob_shash =
((PyStringObject *)(a->ob_sinterned))->ob_shash);
len = a->ob_size;
p = (unsigned char *) a->ob_sval;
x = *p << 7;
--- 925,940 ----
if (a->ob_shash != -1)
return a->ob_shash;
! if (a->ob_sinterned != NULL) {
! if ((PyObject *)a != a->ob_sinterned)
! /* This shouldn't be possible? 'a' would have
! * had its ob_shash set as part of a->ob_sinterned
! * getting set.
! */
! fprintf(stderr, "ii paid on a hash???\n");
return (a->ob_shash =
((PyStringObject *)(a->ob_sinterned))->ob_shash);
+ }
len = a->ob_size;
p = (unsigned char *) a->ob_sval;
x = *p << 7;
***************
*** 3829,3834 ****
--- 3836,3842 ----
if ((t = s->ob_sinterned) != NULL) {
if (t == (PyObject *)s)
return;
+ fprintf(stderr, "ii paid on an InternInPlace\n");
Py_INCREF(t);
*p = t;
Py_DECREF(s);