[Patches] [ python-Patches-593627 ] Static names

noreply@sourceforge.net noreply@sourceforge.net
Thu, 15 Aug 2002 11:09:01 -0700


Patches item #593627, was opened at 2002-08-11 06:41
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=593627&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Oren Tirosh (orenti)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Static names

Initial Comment:
This patch creates static string objects for all built-in 
names and interns them on initialization. The macro 
PyNAME is used to access static names.

PyNAME(__spam__) is equivalent to 
PyString_InternFromString("__spam__") but is a 
constant expression. It requires the name to be one of 
the built-in names. A linker error will be generated if it 
isn't.

Most conversions of C strings into temporary string 
objects can be eliminated (PyString_FromString, 
PyString_InternFromString). Most string comparisons at 
runtime can also be eliminated. 

Instead of:
if (strcmp(PyString_AsString(name), "__spam__") == 0) ...

This code can be used:
PyString_INTERN(name)
if (name == PyNAME(__spam__)) ...

Where PyString_INTERN is a fast inline check if the 
string is already interned (and it usually is). To prevent 
unbounded accumulation of interned strings the mortal 
interned string patch should also be applied.

The patch converts most of the builtin module to this 
new mode as an example.
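
For illustration only, the idea can be sketched without the Python headers. Everything below (NameObj, NAME, intern, init_static_names, is_spam) is a hypothetical stand-in, not the code from the patch: each built-in name gets a static object, the objects are interned once at startup, and a name test then becomes a pointer comparison. In this single-file toy a misspelled name fails at compile time; the patch description says the real macro produces a linker error instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a string object; not the real PyStringObject. */
typedef struct {
    const char *text;
    int interned;          /* nonzero once the object is in the table */
} NameObj;

/* Static name objects, one per built-in name (hypothetical list). */
static NameObj name___spam__ = { "__spam__", 0 };
static NameObj name___doc__  = { "__doc__", 0 };

/* Constant expression yielding the static object. An unknown name
   refers to an undeclared symbol, so the build fails. */
#define NAME(n) (&name_##n)

#define MAX_INTERNED 64
static NameObj *intern_table[MAX_INTERNED];
static size_t intern_count = 0;

/* Return the canonical object for obj's text, registering obj if new. */
static NameObj *intern(NameObj *obj)
{
    if (obj->interned)
        return obj;                      /* fast path: already canonical */
    for (size_t i = 0; i < intern_count; i++)
        if (strcmp(intern_table[i]->text, obj->text) == 0)
            return intern_table[i];
    assert(intern_count < MAX_INTERNED);
    obj->interned = 1;
    intern_table[intern_count++] = obj;
    return obj;
}

/* Intern every static name once, before any other string is interned. */
static void init_static_names(void)
{
    intern(NAME(__spam__));
    intern(NAME(__doc__));
}

/* After interning, a name test is a single pointer comparison. */
static int is_spam(NameObj *name)
{
    return intern(name) == NAME(__spam__);
}
```

A caller would run init_static_names() once at startup and then use is_spam() wherever a strcmp() against "__spam__" was needed before.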


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-15 14:09

Message:
Logged In: YES 
user_id=6380

> I'm surprised that there is *any* speed increase
> because I barely changed any code to make use
> of this. This is very encouraging.

Don't get too excited.  Speedups and slowdowns in
the order of 1% are usually random cache effects
having to do with common portions of the VM main
loop having a cache line conflict; I've seen a
case where adding an *unreachable* printf() call
predictably changed the pystone speed by 1%.

> The localization and forced recompilation
> issues you raise are not really relevant because
> this MUST NOT be used for anything but builtin
> names and builtins are not added so
> frequently. Even standard modules should not
> declare static names.

Then why do I see all signal names in your list?
And all exception names?

> Actually, the macro PyNAME is not required any
> more and the actual symbol name can be used. I
> used the macro to do typecasting but it's no
> longer necessary because I found a way to make
> the static names real PyObjects (probably the
> only place where something is actually defined
> as a PyObject!)

But the string is still more helpful in the code
than the symbol name.

Sorry, but none of this changes my position;
you'll have to find another champion.


----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-08-15 13:45

Message:
Logged In: YES 
user_id=562624

The code size increase is not surprising - all names appear 
twice in the executable: once as C strings and again as 
static PyStringObjects. This duplication can be eliminated.

I'm surprised that there is *any* speed increase because I 
barely changed any code to make use of this. This is very 
encouraging.

The localization and forced recompilation issues you raise 
are not really relevant because this MUST NOT be used for 
anything but builtin names and builtins are not added so 
frequently. Even standard modules should not declare static 
names.

The interning of static strings must be done before the 
interpreter is initialized to ensure that the static name is the 
interned name. If you intern a static name after the same 
name has already been interned elsewhere the static object 
will not be the one true interned version and static references 
to it will be incorrect.
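
The ordering hazard can be shown with a toy intern table. This is a sketch under stated assumptions: Name, intern and static_is_canonical are hypothetical names, and the linear table only mimics the rule that the first object registered for a given text becomes the canonical one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *text; } Name;

#define TABLE_SIZE 16
static Name *table[TABLE_SIZE];
static size_t used = 0;

/* Return the first object registered for this text (the canonical one). */
static Name *intern(Name *obj)
{
    for (size_t i = 0; i < used; i++)
        if (strcmp(table[i]->text, obj->text) == 0)
            return table[i];
    assert(used < TABLE_SIZE);
    table[used++] = obj;
    return obj;
}

static Name static_doc = { "__doc__" };   /* the would-be static name */

/* Returns 1 if the static object ends up canonical, 0 otherwise. */
static int static_is_canonical(int intern_static_first)
{
    Name runtime_doc = { "__doc__" };     /* same text, created at run time */
    int ok;

    used = 0;                             /* start from an empty table */
    if (intern_static_first) {
        intern(&static_doc);
        ok = (intern(&runtime_doc) == &static_doc);
    } else {
        intern(&runtime_doc);
        ok = (intern(&static_doc) == &static_doc);
    }
    used = 0;           /* drop the pointer to the local before returning */
    return ok;
}
```

With the static name interned first, every later lookup resolves to the static object. With the order reversed, the run-time object is canonical and a pointer comparison against the static object silently fails.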

Actually, the macro PyNAME is not required any more and 
the actual symbol name can be used. I used the macro to do 
typecasting but it's no longer necessary because I found a 
way to make the static names real PyObjects (probably the 
only place where something is actually defined as a 
PyObject!)


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-15 11:58

Message:
Logged In: YES 
user_id=6380

Strangely, I measured a code size *increase*. Strangely,
because most object files didn't increase in text size, but
the resulting binary did, adding about 20K text and 17K
data. The only object file that changed sizes at all
(according to "size */*.o" on Linux) was
Python/bltinmodule.o, which grew less than 500 bytes in text
size. The only new object file, Python/staticnames.o, has 48
bytes text and 17K bytes data. Maybe the added text size
could be because of more cross-file references added by the
linker??? (Files that referenced a local static char string
constant now reference a static object in Python/staticnames.o.)

I do see about a 1% speed increase for pystone.

But I agree with Martin's comments on the "readability"
issue. There's also a localization property that's lost:
whenever a new name is added, you must update staticnames.h,
staticnames.c, *and* the file where it is used. That's not
nice (and not just because it forces a recompilation of the
world because a header file was touched that everybody
includes).

*If* this were ever accepted, the mechanism to (re)generate
staticnames.h automatically should be checked in as well.

In general, I've found that string literals hidden inside
macros using stringification (#) are a detriment to code
maintainability -- I've often had the situation where I
*knew* there had to be a string literal for some name
somewhere, but I couldn't find it because of this. Same for
name concatenation (##); it often means that you know
there's a function name somewhere but a grep through the
sources won't find it. Very painful when tracking down
problems. I deployed a bunch of tricks like this in early
versions of typeobject.c, and ended up expanding almost all
of them: a little more typing perhaps, but explicit is
better than implicit, and a search for slot_nb_add will at
least find the macro that defines it; ditto a search for
"__add__" (with the quotes) will find where it is used.
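
A small self-contained illustration of the grep problem, with hypothetical names throughout (DEFINE_SLOT, SLOT_ENTRY, find_slot; this is not the typeobject.c code): one slot is written out explicitly and one is generated with # and ##.

```c
#include <assert.h>
#include <string.h>

typedef int (*binop)(int, int);

/* Explicit style: the function name and the string literal both appear
   verbatim, so a search for slot_add or "__add__" finds them. */
static int slot_add(int a, int b) { return a + b; }

/* Macro style: ## builds the identifier slot_sub and # builds the string
   "__sub__". Outside this comment, neither spelling appears in the file. */
#define DEFINE_SLOT(op, expr) \
    static int slot_##op(int a, int b) { return expr; }
#define SLOT_ENTRY(op) { "__" #op "__", slot_##op }

DEFINE_SLOT(sub, a - b)

static const struct { const char *name; binop func; } slots[] = {
    { "__add__", slot_add },
    SLOT_ENTRY(sub),
};

/* Look up a slot function by its Python-level name; NULL if absent. */
static binop find_slot(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof slots / sizeof slots[0]; i++)
        if (strcmp(slots[i].name, name) == 0)
            return slots[i].func;
    return NULL;
}
```

Both entries behave the same at run time; the difference is only in what a text search over the sources can find.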

I guess that's about a -0.5 from me. Unless someone else
steps up to champion this soon, it's dead. :-)

PS. I used cvs add for new files and then cvs diff -c -N to
create diffs that include new files; cvs produces output
looking like a diff against /dev/null, and patch understands
those (see the fixed patch I uploaded). But maybe if you
only have anonymous CVS the cvs add won't work, and cvs diff
won't give you diffs for files it knows nothing about. IOW
YMMV. :-)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-15 11:22

Message:
Logged In: YES 
user_id=6380

Just for kicks I produced a forward diff, also adding the
necessary changes to  Makefile.pre.in that were mysteriously
missing from the original, and fixing this for the very
latest CVS (up to and including Michael Hudson's set_lineno
patch).



----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-08-12 08:48

Message:
Logged In: YES 
user_id=21627

It is difficult to see how this patch achieves the goal of
readability:

- many variables are not used at all, e.g. ArithmeticError;
it is not clear how using them would improve readability.

- the change from
        SETBUILTIN("None", Py_None);
   to
        SETBUILTIN(PyNAME(None), Py_None);

   makes it more difficult to read, not easier. Furthermore,
the name "None" isn't used anywhere except this initialisation.

- likewise, the changes from
        {"abs", builtin_abs, METH_O, abs_doc},
   to
        {PyNAMEC(abs), builtin_abs, METH_O, abs_doc},

make the code harder to read.


----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-08-12 08:10

Message:
Logged In: YES 
user_id=562624

If these changes are applied throughout the interpreter I 
expect a significant speedup but that is not my immediate 
goal. I am looking for readability, reduction of code size (both 
source and binary) and reliability (fewer things to check or 
forget to check).

I am trying to get rid of code like

         if (docstr == NULL) {
                docstr = PyString_InternFromString("__doc__");
                if (docstr == NULL)
                        return NULL;
         }

And replace it with just PyNAME(__doc__)


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-08-12 03:43

Message:
Logged In: YES 
user_id=21627

What is the rationale for this patch? If it is for
performance, what real speed improvements can you report?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-11 11:54

Message:
Logged In: YES 
user_id=33168

I'd like to see the releasing interned string patch applied.
 I think it's almost ready, isn't it?  It would make
patching easier and seems to be a good idea.

For me, the easiest way to produce patches is to use cvs. 
You can keep multiple cvs trees around easily enough (for
having multiple overlapping/independent patches).  To create
patches with cvs:

  cvs diff -C 5 [file1] [file2] ....

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-08-11 11:44

Message:
Logged In: YES 
user_id=562624

Ok, I'll fix the problems with the patch.  What's the best way 
to produce a patch that adds new files? 

Static string objects cannot be released, of course. This 
patch will eventually depend on the mortal interned strings 
patch to fix it but in the meantime I just disabled releasing 
interned strings because I want to keep the two patches 
independent.

The next version will add a new PyArg_ParseTuple format for 
an interned string to make it easier to use.


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-11 10:46

Message:
Logged In: YES 
user_id=33168

Couple of initial comments:
 * this is a reverse patch
 * it seems like there are other changes in here
    - int ob_shash -> long
    - releasing interned strings?
 * dictobject.c is removed?
 * including python headers should use "" not <>

Oren, could you generate a new patch with only the changes
to support PyNAME?  Thanks!

----------------------------------------------------------------------
