[Patches] [ python-Patches-593627 ] Static names
noreply@sourceforge.net
noreply@sourceforge.net
Thu, 15 Aug 2002 11:09:01 -0700
Patches item #593627, was opened at 2002-08-11 06:41
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=593627&group_id=5470
Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Oren Tirosh (orenti)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Static names
Initial Comment:
This patch creates static string objects for all built-in
names and interns then on initialization. The macro
PyNAME is be used to access static names.
PyNAME(__spam__) is equivalent to
PyString_InternFromString("__spam__") but is a
constant expression. It requires the name to be one of
the built-in names. A linker error will be generated if it
isn't.
Most conversions of C strings into temporary string
objects can be eliminated (PyString_FromString,
PyString_InternFromString). Most string comparisons at
runtime can also be eliminated.
Instead of :
if (strcmp(PyString_AsString(name), "__spam__")) ...
This code can be used:
PyString_INTERN(name)
if (name == PyNAME(__spam__)) ...
Where PyString_INTERN is a fast inline check if the
string is already interned (and it usually is). To prevent
unbounded accumulation of interned strings the mortal
interned string patch should also be applied.
The patch converts most of the builtin module to this
new mode as an example.
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-15 14:09
Message:
Logged In: YES
user_id=6380
> I'm surprised that there is *any* speed increase
> because I barely changed any code to make use
> this. This is very encouraging.
Don't get too excited. Speedups and slowdowns in
the order of 1% are usually random cache effects
having to do with common portions of the VM main
loop having a cache line conflict; I've seen a
case where adding an *unreachable* printf() call
predictably changed the pystone speed by 1%.
> The localization and forced recomplication
> issues you raise are not really relevant because
> this MUST NOT be used for anything but builtin
> names and builtins are not added so
> frequently. Even standard modules should not
> declare static names.
Then why do I see all signal names in your list?
And all exception names?
> Actually, the macro PyNAME is not required any
> more and the actual symbol name can be used. I
> used the macro to do typecasting but it's no
> longer necessary because I found a way to make
> the static names real PyObjects (probably the
> only place where something is actually defined
> as a PyObject!)
But the string is still more helpful in the code
than the symbol name.
Sorry, but none of this changes my position;
you'll hve to find another champion.
----------------------------------------------------------------------
Comment By: Oren Tirosh (orenti)
Date: 2002-08-15 13:45
Message:
Logged In: YES
user_id=562624
The code size increase is not surprising - all names appear
twice in the executable: once as C strings and again as
static PyStringObjects. This duplication can be eliminated.
I'm surprised that there is *any* speed increase because I
barely changed any code to make use this. This is very
encouraging.
The localization and forced recomplication issues you raise
are not really relevant because this MUST NOT be used for
anything but builtin names and builtins are not added so
frequently. Even standard modules should not declare static
names.
The interning of static strings must be done before the
interpreter is initialized to ensure that the static name is the
interned name. If you intern a static name after the same
name has already been interned elsewhere the static object
will not be the one true interned version and static references
to it will be incorrect.
Actually, the macro PyNAME is not required any more and
the actual symbol name can be used. I used the macro to do
typecasting but it's no longer necessary because I found a
way to make the static names real PyObjects (probably the
only place where something is actually defined as a
PyObject!)
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-15 11:58
Message:
Logged In: YES
user_id=6380
Strangely, I measured a code size *increase*. Strangely,
because most object files didn't increase in text size, but
the resulting binary did, adding about 20K text and 17K
data. The only object file that changed sizes at all
(according to "size */*.o" on Linux) was
Python/bltinmodule.o, which grew less than 500 bytes in text
size. The only new object file, Python/staticnames.o, has 48
bytes text and 17K bytes data. Maybe the added text size
could be because of more cross-file references added by the
linker??? (Files that referenced a local static char string
constant now reference a static object in Python/staticnames.o.)
I do see about a 1% speed increase for pystone.
But I agree with Martin's comments on the "readability"
issue. There's also a localization property that's lost:
whenever a new name is added, you must update staticnames.h,
staticnames.c, *and* the file where it is used. That's not
nice (and not just because it forces a recompilation of the
world because a header file was touched thst everybody
includes).
*If* this were ever accepted, the mechanism to (re)generate
staticnames.h automatically should be checked in as well.
In general, I've found that string literals hidden inside
macros using stringification (#) are a detriment to code
maintainability -- I've often had the situation where I
*knew* there had to be a string literal for some name
somewhere, but I couldn't find it because of this. Same for
name concatenation (##); it often means that you know
there's a function name somewhere but a grep through the
sources won't find it. Very painful when tracking down
problems. I deployed a bunch of tricks like this in early
versions of typeobject.c, and ended up expanding almost all
of them: a little more typing perhaps, but explicit is
better than implicit, and a search for slot_nb_add will at
least find the macro that defines it; ditto a search for
"__add__" (with the quotes) will find where it is used.
I guess that's about a -0.5 from me. Unless someone else
steps up to champion this soon, it's dead. :-)
PS. I used cvs add for new files and then cvs diff -c -N to
create diffs that include new files; cvs produces output
looking like a diff against /dev/null, and patch understands
those (see the fixed patch I uploaded). But maybe if you
only have anonymous CVS the cvs add won't work, and cvs diff
won't give you diffs for files it knows nothing about. IOW
YMMV. :-)
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-15 11:22
Message:
Logged In: YES
user_id=6380
Just for kicks I produced a forward diff, also adding the
necessary changes to Makefile.pre.in that were mysteriously
missing from the original, and fixing this for the very
latest CVS (up to and including Michael Hudson's set_lineno
patch).
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-08-12 08:48
Message:
Logged In: YES
user_id=21627
It is difficult to see how this patch achieves the goal of
readability:
- many variables are not used at all, e.g. ArithmethicError;
it is not clear how using them would improve readability.
- the change from
SETBUILTIN("None",
Py_None);
to
SETBUILTIN(PyNAME(None),
Py_None);
makes it more difficult to read, not easier. Furthermore,
the name "None" isn't used anywhere except this initialisation.
- likewise, the changes from
{"abs",
builtin_abs, METH_O, abs_doc},
to
{PyNAMEC(abs),
builtin_abs, METH_O, abs_doc},
make the code harder to read.
----------------------------------------------------------------------
Comment By: Oren Tirosh (orenti)
Date: 2002-08-12 08:10
Message:
Logged In: YES
user_id=562624
If these changes are applied throughout the interpreter I
expect a significant speedup but that is not my immediate
goal. I am looking for redability, reduction of code size (both
source and binary) and reliability (less things to check or
forget to check).
I am trying to get rid of code like
if (docstr == NULL) {
docstr= PyString_InternFromString("__doc__");
if (docstr == NULL)
return NULL;
And replace it with just PyNAME(__doc__)
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-08-12 03:43
Message:
Logged In: YES
user_id=21627
What is the rationale for this patch? If it is for
performance, what real speed improvements can you report?
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-11 11:54
Message:
Logged In: YES
user_id=33168
I'd like to see the releasing interned string patch applied.
I think it's almost ready, isn't it? It would make
patching easier and seems to be a good idea.
For me, the easiest way to produce patches is to use cvs.
You can keep multiple cvs trees around easy enough (for
having multiple overlapping/independant patches). To create
patches with cvs:
cvs diff -C 5 [file1] [file2] ....
----------------------------------------------------------------------
Comment By: Oren Tirosh (orenti)
Date: 2002-08-11 11:44
Message:
Logged In: YES
user_id=562624
Ok, I'll fix the problems with the patch. What's the best way
to produce a patch that adds new files?
Static string objects cannot be released, of course. This
patch will eventually depend on on the mortal interned strings
patch to fix it but in the meantime I just disabled releasing
interned strings because I want to keep the two patches
independent.
The next version will add a new PyArg_ParseTuple format for
an interned string to make it easier to use.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-11 10:46
Message:
Logged In: YES
user_id=33168
Couple of initial comments:
* this is a reverse patch
* it seems like there are other changes in here
- int ob_shash -> long
- releasing interned strings?
* dictobject.c is removed?
* including python headers should use "" not <>
Oren, could you generate a new patch with only the changes
to support PyNAME? Thanks!
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=593627&group_id=5470