I've made some simple measurements of how long opcodes take to execute and how long it takes to go around the mainloop, using the Pentium timestamp counter, which measures processor cycles. The results aren't particularly surprising, but they provide some empirical validation of what we've believed all along. I don't have time to go into all the gory details here, though I plan to at Spam 10 developers day next week. I put together a few Web pages that summarize the data I've collected on some simple benchmarks:

http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/PerformanceMeasu...

Comments and questions are welcome. I've got a little time to do more measurement and analysis before devday.

Jeremy
Jeremy> I've made some simple measurements of how long opcodes take to
Jeremy> execute and how long it takes to go around the mainloop ...

Jeremy> Comments and questions are welcome. I've got a little time to
Jeremy> do more measurement and analysis before devday.

Interesting results. I've been working on my {TRACK,UNTRACK}_GLOBAL opcode implementations. I have an optimizer filter that sets up tracking for all LOAD_GLOBAL,{LOAD_ATTR}* combinations. It's still not quite working and will only be a proof of concept by devday if I do get it working, but I expect most of these expensive opcode combinations to collapse into a LOAD_FAST, with the addition of a TRACK_GLOBAL/UNTRACK_GLOBAL pair executed at function start and end, respectively.

Skip
"SM" == Skip Montanaro <skip@pobox.com> writes:
SM> Interesting results. I've been working on my
SM> {TRACK,UNTRACK}_GLOBAL opcode implementations. I have an
SM> optimizer filter that sets up tracking for all
SM> LOAD_GLOBAL,{LOAD_ATTR}* combinations. It's still not quite
SM> working and will only be a proof of concept by devday if I do
SM> get it working, but I expect most of these expensive opcode
SM> combinations to collapse into a LOAD_FAST, with the addition of
SM> a TRACK_GLOBAL/UNTRACK_GLOBAL pair executed at function start
SM> and end, respectively.

I won't have any implementation done at all, but I should have finished the design for LOAD_FAST-style access to globals and module attributes. I also have some ideas about a Python bytecode specializer that would work essentially like a JIT but generate specialized bytecode instead of machine code.

Jeremy

PS Skip-- Sorry the PEP isn't clear, but the only dictionary lookups that need to occur are at function creation time. MAKE_FUNCTION would need to look up the offsets of the globals used by the function, so that a LOAD_FAST_GLOBAL opcode would take an int argument.
On Thu, Jan 31, 2002 at 05:14:28AM -0500, Jeremy Hylton wrote:
PS Skip-- Sorry the PEP isn't clear, but the only dictionary lookups that need to occur are at function creation time. MAKE_FUNCTION would need to look up the offsets of the globals used by the function, so that a LOAD_FAST_GLOBAL opcode would take an int argument.
So how does this work for mutually-recursive functions?

    def f(x):
        if x == 1: return 1
        return g(x)

    def g(x):
        return x * f(x-1)

can f not optimize the load of the global g into a LOAD_FAST_GLOBAL?

Jeff

PS Which PEP? I only see 266
"JE" == Jeff Epler <jepler@unpythonic.dhs.org> writes:
JE> On Thu, Jan 31, 2002 at 05:14:28AM -0500, Jeremy Hylton wrote:
PS Skip-- Sorry the PEP isn't clear, but the only dictionary lookups that need to occur are at function creation time. MAKE_FUNCTION would need to look up the offsets of the globals used by the function, so that a LOAD_FAST_GLOBAL opcode would take an int argument.
JE> So how does this work for mutually-recursive functions?

JE>     def f(x):
JE>         if x == 1: return 1
JE>         return g(x)

JE>     def g(x):
JE>         return x * f(x-1)

JE> can f not optimize the load of the global g into a
JE> LOAD_FAST_GLOBAL?

JE> PS Which PEP? I only see 266

PEP 267. (They gesture at each other.)

So you've got a module with two globals f() and g(). They're stored in slots 0 and 1 of the module globals array. When f() and g() are compiled, the symbol table for the module can note the location of f() and g() and that f() and g() contain references to globals. Instead of emitting LOAD_GLOBAL "f" in g(), you can emit LOAD_GLOBAL 0 ("f").

The complication here is that a code object isn't tied to a single module. It would be possible to exec f.func_code in some other environment where "g" was not stored in the module global array. The dictionary lookups may occur in MAKE_FUNCTION in order to verify that the code object and the module object agree on the layout of the globals array.

Jeremy
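To make the slot scheme concrete, here is a minimal pure-Python sketch (invented names, not the PEP's actual implementation) of index-based global access. It also shows why mutual recursion is unproblematic: each indexed load happens at call time, so g's slot may still be empty when f is created.

    # Toy model of PEP 267-style globals: a compile-time name->slot table
    # plus a runtime array, so LOAD_GLOBAL becomes an array index instead
    # of a dict probe.
    module_names = {"f": 0, "g": 1}   # what the module symbol table records
    module_slots = [None, None]       # the module globals array

    def store_global(name, value):    # simulates STORE_GLOBAL <slot>
        module_slots[module_names[name]] = value

    def load_global(slot):            # simulates LOAD_GLOBAL <slot>
        return module_slots[slot]

    # Neither function needs the other to exist until the indexed load runs.
    store_global("f", lambda x: 1 if x == 1 else load_global(1)(x))  # slot 1 = g
    store_global("g", lambda x: x * load_global(0)(x - 1))           # slot 0 = f

    print(load_global(1)(5))          # 120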
On Thu, Jan 31, 2002 at 06:02:17AM -0500, Jeremy Hylton wrote:
JE> can f not optimize the load of the global g into a
JE> LOAD_FAST_GLOBAL?
So you've got a module with two globals f() and g(). They're stored in slots 0 and 1 of the module globals array. When f() and g() are compiled, the symbol table for the module can note the location of f() and g() and that f() and g() contain references to globals. Instead of emitting LOAD_GLOBAL "f" in g(), you can emit LOAD_GLOBAL 0 ("f").
But isn't what happens in this module something like

    LOAD_CONST <code1>
    MAKE_FUNCTION
    STORE_GLOBAL 0 (f)
    LOAD_CONST <code2>
    MAKE_FUNCTION
    STORE_GLOBAL 1 (g)

so if you convert LOAD_GLOBAL into LOAD_FAST_GLOBAL when you MAKE_FUNCTION on code1, there is not yet a "g" in the dlict. Are you populating the "names" part of the dlict as an earlier "pass" of module compilation, then?

So the optimization doesn't apply if I create the globals from within a function? (Of course, in that case it would work if I set the attributes to 'None' in the module scope, right?):

    def make_fg():
        global f, g
        def f(x): pass
        def g(x): pass

Jeff
"JE" == jepler <jepler@unpythonic.dhs.org> writes:
JE> But isn't what happens in this module something like

JE>     LOAD_CONST <code1>
JE>     MAKE_FUNCTION
JE>     STORE_GLOBAL 0 (f)
JE>     LOAD_CONST <code2>
JE>     MAKE_FUNCTION
JE>     STORE_GLOBAL 1 (g)

JE> so if you convert LOAD_GLOBAL into LOAD_FAST_GLOBAL when you
JE> MAKE_FUNCTION on code1, there is not yet a "g" in the dlict.
JE> Are you populating the "names" part of the dlict as an earlier
JE> "pass" of module compilation, then?

Yes. The compiler can do a pretty good job of establishing all the globals in a module at compile time. When a module is loaded, the interpreter would allocate space for all the expected globals.

JE> So the optimization doesn't apply if I create the globals from
JE> within a function?

It still applies. The function is compiled at the same time as the module, so the module symbol table can account for globals assigned to only in functions.

The harder cases are much more dynamic -- exec using a module's globals, assignment to create new attributes on an imported module, etc. Example:

    import foo
    assert not hasattr(foo, 'bar')  # just to illustrate the example
    foo.bar = 12

There's no way for the compiler to know that foo will have a bar attribute when it compiles foo.

Jeremy
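The claim that a module's globals are knowable at compile time can be checked with the stdlib symtable module, which exposes the same symbol tables the compiler builds. A small sketch of my own (not from the PEP), using Jeff's make_fg example:

    # Collect every name that ends up in the module namespace: names bound
    # at module scope plus names declared "global" inside nested functions.
    import symtable

    source = """
    def make_fg():
        global f, g
        def f(x): pass
        def g(x): pass
    """

    def module_globals(table, found):
        for sym in table.get_symbols():
            if sym.is_declared_global() or (
                    table.get_type() == "module" and sym.is_assigned()):
                found.add(sym.get_name())
        for child in table.get_children():
            module_globals(child, found)

    names = set()
    module_globals(symtable.symtable(source, "<example>", "exec"), names)
    print(sorted(names))   # ['f', 'g', 'make_fg']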
Hi. A question about PEP 267: does the PEP mechanism address only

    import a
    use a.x

cases? How does it handle things like

    import a.b
    use a.b.x

Thanks, Samuele Pedroni.
"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:
SP> Hi. A question about PEP 267: does the PEP mechanism address only

SP>     import a
SP>     use a.x

SP> cases? How does it handle things like

SP>     import a.b
SP>     use a.b.x

You're a smart guy, can you tell me? :-). Seriously, I haven't gotten that far.

import mod.sub creates a binding for "mod" in the global namespace. The compiler can detect that the import statement is a package import -- and mark "mod.sub" as a candidate for optimization. A use of "mod.sub.attr" in a function should be treated just as "mod.attr".

The globals array (dict-list hybrid, technically) has the publicly visible binding for "mod" but also has an internal binding for "mod.sub" and "mod.sub.attr". Every module or submodule attribute in a function gets an internal slot in the globals. The internal slot gets initialized the first time it is used and then shared by all the functions in the module.

So I think this case isn't special enough to need a special case.

Jeremy
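A rough sketch of that hybrid in pure Python (all names invented for illustration). Note that invalidation -- what happens when mod.sub is rebound -- is deliberately ignored here; that is exactly the hole Samuele pokes at below.

    import types

    class HybridGlobals:
        """Public slots for real globals, hidden slots for dotted paths."""
        def __init__(self, public, dotted):
            self.index = {name: i for i, name in enumerate(public + dotted)}
            self.slots = [None] * len(self.index)

        def load_dotted(self, slot, path):
            # LOAD_GLOBAL_ATTR <slot>: walk the dotted path once, then reuse
            # the cached object on every later load from any function.
            if self.slots[slot] is None:
                obj = self.slots[self.index[path[0]]]
                for attr in path[1:]:
                    obj = getattr(obj, attr)
                self.slots[slot] = obj
            return self.slots[slot]

    g = HybridGlobals(public=["mod"], dotted=["mod.sub.attr"])
    sub = types.SimpleNamespace(attr=42)
    g.slots[g.index["mod"]] = types.SimpleNamespace(sub=sub)  # "import mod.sub"

    slot = g.index["mod.sub.attr"]
    print(g.load_dotted(slot, ("mod", "sub", "attr")))  # 42: resolved once
    print(g.load_dotted(slot, ("mod", "sub", "attr")))  # 42: slot hit, no getattr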
From: Jeremy Hylton <jeremy@alum.mit.edu>
"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:
SP> Hi. A question about PEP 267: does the PEP mechanism address only
SP>     import a
SP>     use a.x
SP> cases? How does it handle things like
SP>     import a.b
SP>     use a.b.x
You're a smart guy, can you tell me? :-). Seriously, I haven't gotten that far.
import mod.sub creates a binding for "mod" in the global namespace
The compiler can detect that the import statement is a package import -- and mark "mod.sub" as a candidate for optimization. A use of "mod.sub.attr" in a function should be treated just as "mod.attr".
The globals array (dict-list hybrid, technically) has the publicly visible binding for "mod" but also has an internal binding for "mod.sub" and "mod.sub.attr". Every module or submodule attribute in a function gets an internal slot in the globals. The internal slot gets initialized the first time it is used and then shared by all the functions in the module.
So I think this case isn't special enough to need a special case.
OK, I stated the wrong question. What happens if I do the following:

    import a.b

    def f():
        print a.b.x
        a.g()
        print a.b.x

    f()

Now a.g() changes a.b from a submodule to an object with an x attribute. Maybe this case does not make sense, but the point is that the PEP is quite vague about imported stuff.

Samuele (more puzzled than smart).
SP> cases. How does it handle things like
SP>     import a.b
SP>     use a.b.x

Jeremy> You're a smart guy, can you tell me? :-). Seriously, I haven't
Jeremy> gotten that far.

My stuff does handle this, as long as the first name is global. It just gobbles up all LOAD_GLOBALs and any immediately following LOAD_ATTRs. For instance, this trivial function:

    def f():
        return distutils.core.setup

compiles to:

     0 LOAD_GLOBAL      0 (distutils)
     3 LOAD_ATTR        1 (core)
     6 LOAD_ATTR        2 (setup)
     9 RETURN_VALUE
    10 LOAD_CONST       0 (None)
    13 RETURN_VALUE

My TrackGlobalOptimizer class currently transforms this to:

     0 SETUP_FINALLY   11 (to 14)
     3 TRACK_GLOBAL     3 (distutils.core.setup, distutils.core.setup)
     6 POP_BLOCK
     7 LOAD_CONST       0 (None)
    10 LOAD_FAST        0 (distutils.core.setup)
    13 RETURN_VALUE
    14 UNTRACK_GLOBAL   3 (distutils.core.setup, distutils.core.setup)
    17 END_FINALLY
    18 LOAD_CONST       0 (None)
    21 RETURN_VALUE
which is obviously not an improvement because distutils.core.setup is only accessed once. As people make more use of packages, such multiple attribute loads might become more common. Skip
On 31 Jan 2002 at 6:12, Jeremy Hylton wrote:
import mod.sub creates a binding for "mod" in the global namespace
The compiler can detect that the import statement is a package import -- and mark "mod.sub" as a candidate for optimization. A use of "mod.sub.attr" in a function should be treated just as "mod.attr".
How can the compiler tell it's a package import? It's bad practice, but people write "import mod.attr" all the time. Heck, Marc-Andre tricks import so that pkg.mod is really pkg.attr, where the attr turns into a mod when accessed. No problem, since it's only import that cares what it is. By the time it's used it's always global.attr.attr....

-- Gordon
http://www.mcmillan-inc.com/
"GM" == Gordon McMillan <gmcm@hypernet.com> writes:
GM> On 31 Jan 2002 at 6:12, Jeremy Hylton wrote:
import mod.sub creates a binding for "mod" in the global namespace
The compiler can detect that the import statement is a package import -- and mark "mod.sub" as a candidate for optimization. A use of "mod.sub.attr" in a function should be treated just as "mod.attr".
GM> How can the compiler tell it's a package import?

I'm assuming it can guess based on import statements and that a runtime check in LOAD_GLOBAL_ATTR (or whatever it's called) can verify this assumption. I haven't thought this part through fully, because I'm not aware of the full perversity of what people do with import hooks.

GM> It's bad practice, but people write "import mod.attr" all the
GM> time.

I write it all the time when attr is a module in a package. And I know I can't do it for an actual attr of a module.

GM> Heck, Marc-Andre tricks import so that pkg.mod is really
GM> pkg.attr where the attr turns into a mod when accessed. No
GM> problem, since it's only import that cares what it is. By the
GM> time it's used it's always global.attr.attr....

Not sure I understand what Marc-Andre is doing. (That's probably true in general <wink>.) A client of his code types "import foo.bar". foo is a module? a package? When the "bar" attribute is loaded (LOAD_ATTR) it turns into another module?

Jeremy
On 31 Jan 2002 at 20:13, Jeremy Hylton wrote:
GM> How can the compiler tell it's a package import?
I'm assuming it can guess based on import statements and that a runtime check in LOAD_GLOBAL_ATTR (or whatever it's called) can verify this assumption. I haven't thought this part through fully, because I'm not aware of the full perversity of what people do with import hooks.
Import hooks are effectively dead. People play namespace games almost exclusively now.
GM> It's bad practice, but people write "import mod.attr" all the
GM> time.
I write it all the time when attr is a module in a package. And I know I can't do it for an actual attr of a module.
import os.path works even though there's no module named path. import pkg.attr always works.
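Gordon's os.path example is easy to verify; in CPython, os binds a platform-specific module to the name "path" and registers it under "os.path", so the import works even though there is no module literally named path:

    import os.path
    import sys
    print(os.path.__name__)          # 'posixpath' (or 'ntpath' on Windows)
    print("os.path" in sys.modules)  # True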
GM> Heck, Marc-Andre tricks import so that pkg.mod is really
GM> pkg.attr where the attr turns into a mod when accessed. No
GM> problem, since it's only import that cares what it is. By the
GM> time it's used it's always global.attr.attr....
Not sure I understand what Marc-Andre is doing. (That's probably true in general <wink>.) A client of his code types "import foo.bar". foo is a module? a package? When the "bar" attribute is loaded (LOAD_ATTR) it turns into another module?
foo is a package. The __init__.py creates an instance of LazyModule named bar. Doing anything with foo.bar triggers an import, and replacement of the name "bar" in foo with module bar. That one's clean. Now turn your eye on the shenanigans in PyXML.

-- Gordon
http://www.mcmillan-inc.com/
It just occurred to me that my LOAD_GLOBAL/LOAD_ATTR eliding scheme can't work, since LOAD_ATTR calls PyObject_GetAttr, which can wind up calling __getattr__, which is free to inflict all sorts of side effects on the attribute lookup. PEP 267 doesn't appear to be similarly affected, assuming it can conclude that LOAD_GLOBAL is actually loading a module object. (Can it?) LOAD_GLOBAL alone shouldn't be a problem, since all that does is call PyDict_GetItem for globals and builtins.

damn...

Skip
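Skip's point is easy to demonstrate with a toy class (invented here purely for illustration): two loads of the same attribute need not agree, so collapsing them is unsafe.

    class Chatty:
        def __init__(self):
            self.n = 0
        def __getattr__(self, name):   # only runs when normal lookup fails
            self.n += 1
            return "%s #%d" % (name, self.n)

    c = Chatty()
    print(c.foo)   # foo #1
    print(c.foo)   # foo #2 -- eliding the second load would change behavior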
It just occurred to me that my LOAD_GLOBAL/LOAD_ATTR eliding scheme can't work, since LOAD_ATTR calls PyObject_GetAttr, which can wind up calling __getattr__, which is free to inflict all sorts of side effects on the attribute lookup. PEP 267 doesn't appear to be similarly affected, assuming it can conclude that LOAD_GLOBAL is actually loading a module object. (Can it?) LOAD_GLOBAL alone shouldn't be a problem, since all that does is call PyDict_GetItem for globals and builtins.
The approach I'm working on would have to check that the object is a module on each use, but that's relatively cheap compared to the layers of function calls we have now. It's a pretty safe assumption because it would only be made for objects bound by an import statement.

I also wanted to answer Samuele's question briefly, because I'm going to be busy with other things most of today. The basic idea, which I need to flesh out by next week, is that the internal binding for "mod.attr" that a module keeps is just a hint. The compiler notices that function f() uses "mod.attr" and that mod is imported at the module level. The "mod.attr" binding must include a pointer to the location where mod is stored and the pointer it found when the "mod.attr" binding was updated. When "mod.attr" is used, the interpreter must check that mod is still bound to the same object. If so, the "mod.attr" binding is still valid. Note that the "mod.attr" binding is a PyObject ** -- a pointer to the location where "attr" is bound in "mod".

Jeremy
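Here is one way to model the hint check in pure Python (invented names, and with one simplification: this caches the attribute's value, whereas the real PyObject ** hint points at the binding location, so rebinding mod.attr in place would also be seen):

    import math

    class AttrHint:
        """Cache for "mod.attr", valid only while the name "mod" is unchanged."""
        def __init__(self, namespace, mod_name, attr_name):
            self.namespace = namespace     # where the name "mod" is bound
            self.mod_name = mod_name
            self.attr_name = attr_name
            self.seen_mod = None           # module object the hint came from
            self.value = None

        def load(self):
            mod = self.namespace[self.mod_name]
            if mod is not self.seen_mod:   # hint stale: "mod" was rebound
                self.value = getattr(mod, self.attr_name)  # slow path
                self.seen_mod = mod
            return self.value              # fast path: one identity check

    ns = {"mod": math}
    hint = AttrHint(ns, "mod", "pi")
    print(hint.load())    # slow path: resolves and caches math.pi
    print(hint.load())    # fast path: identity check only

    import cmath
    ns["mod"] = cmath     # rebinding "mod" invalidates the hint
    print(hint.load())    # detected; re-resolves to cmath.pi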
First, thanks for the answer :). Here is my input on the topic. [Obviously I won't be present at developer day.]

From: Jeremy Hylton <jeremy@alum.mit.edu>
The approach I'm working on would have to check that the object is a module on each use, but that's relatively cheap compared to the layers of function calls we have now. It's a pretty safe assumption because it would only be made for objects bound by an import statement.
I also wanted to answer Samuele's question briefly, because I'm going to be busy with other things most of today. The basic idea, which I need to flesh out by next week, is that the internal binding for "mod.attr" that a module keeps is just a hint. The compiler notices that function f() uses "mod.attr" and that mod is imported at the module level. The "mod.attr" binding must include a pointer to the location where mod is stored and the pointer it found when the "mod.attr" binding was updated. When "mod.attr" is used, the interpreter must check that mod is still bound to the same object. If so, the "mod.attr" binding is still valid. Note that the "mod.attr" binding is a PyObject ** -- a pointer to the location where "attr" is bound in "mod".
I see. BTW, I asked primarily because the PEP as it stands is vague, not because I believed the idea cannot fly. [For Jython the issue is more complicated; PyObject ** is not something easily expressed in Java <wink>.]

I think it is worth pointing out that what you propose is a special/ad-hoc version of what other Smalltalk-like dynamic languages typically do, together with jitting, although the approach is orthogonal to that. Namely, for every send site they have a send-site cache:

    if send-site-cache-still-applies:               # (1)
        dispatch based on site-cache contents       # (2)
    else:
        normal send lookup and update send-site-cache

In Python more or less the same could be applied to load_* instead of sends. Your approach deals with a part of those; these need (only) module-level caches. The extended/general approach could work too and give some benefit. But it is clear that the complexity and overhead of (1) and (2), and the space demand for the caches, depend on how homogeneous the system's object layouts and behaviors are. And Python, with modules, data-objects, class/instances, types, etc., is quite a zoo :(.

Pushing the class/type unification further is an aspect to consider here, IMHO.

If those things were already all known, sorry for the boring post.

regards.
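A minimal monomorphic version of that pseudocode, written in Python for illustration (a real interpreter would hang this state off the bytecode itself, and this toy ignores instance-dict shadowing -- the complication discussed later in the thread):

    class CallSite:
        """One cache line: remembers the last receiver class and the method
        it resolved to, so repeated sends to that class skip the lookup."""
        def __init__(self, name):
            self.name = name
            self.klass = None
            self.meth = None

        def send(self, obj, *args):
            klass = type(obj)
            if klass is not self.klass:                # (1) cache check
                self.klass = klass                     # miss: normal lookup,
                self.meth = getattr(klass, self.name)  # then update the cache
            return self.meth(obj, *args)               # (2) cached dispatch

    site = CallSite("upper")
    for s in ["spam", "eggs"]:
        print(site.send(s))   # one lookup for str, then cache hits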
"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:
SP> But it is clear that the complexity and overhead of (1) and (2),
SP> and the space demand for the caches, depend on how homogeneous
SP> the system's object layouts and behaviors are.

Good point! It's important to try to extract the general principles at work and see how they can be applied systematically. The general notion I have is that dictionaries are not an efficient way to implement namespaces. The way most namespaces are used -- in particular, that most names are known statically -- allows a more efficient implementation.

SP> And Python, with modules, data-objects, class/instances, types,
SP> etc., is quite a zoo :(.

And, again, this is a problem. The same sorts of techniques apply to all namespaces. It would be good to try to make the approach general, but some namespaces are more dynamic than others. Python's classes, lack of declarations, and separate compilation of modules mean class/instance namespaces are hard to do right. Need to defer a lot of final decisions to runtime and keep an extra dictionary around just in case.

SP> Pushing the class/type unification further is an aspect to
SP> consider here, IMHO.

SP> If those things were already all known, sorry for the boring
SP> post.

Thanks for good questions and suggestions. Too bad you can't come to dev day. I'll try to post slides before or after the talk -- and update the PEP.

Jeremy
[Jeremy Hylton]
Thanks for good questions and suggestions. Too bad you can't come to dev day. I'll try to post slides before or after the talk -- and update the PEP.
Here are some more wild ideas, probably more thought provoking than useful, but this is really an area where only the profiler knows the truth <wink>.
SP> And Python, with modules, data-objects, class/instances, types,
SP> etc., is quite a zoo :(.
And, again, this is a problem. The same sorts of techniques apply to all namespaces. It would be good to try to make the approach general, but some namespaces are more dynamic than others. Python's classes, lack of declarations, and separate compilation of modules mean class/instance namespaces are hard to do right. Need to defer a lot of final decisions to runtime and keep an extra dictionary around just in case.
* instance namespaces

As I said, what eventually happens with class/type unification plays a role.

1. __slots__ are obviously a good thing here :)

2. old-style instances, and in general instances with a dict: one can try to guess the slots of a class by looking for the "self.attr" pattern at compile time in a more or less clever way. The set of compile-time guessed attrs would be passed to MAKE_CLASS, which would construct the runtime guess using the union of the superclasses' guesses and the compile-time guess for the class. This information can be used to lay out a dlict.

* zoo problem

[Yes, as I said, this whole inline cache thing is supposed to trade memory for speed. And the fact that Python internal objects are so inhomogeneous/polymorphic <wink> does not help to keep the amount small; for example, having only new-style classes would help.]

Ideally one can assign to each bytecode in a code object whose behavior depends/dispatches on the concrete object "type" a "cache line" (or many; polymorphic inline caches in modern Smalltalk impls do that in the context of the jit). (As long as the GIL is there we do not need per-thread versions of the caches.) The first entries in the cache line could contain the PyObject type and then a function pointer, so that we would have common logic like:

    if PyObjectType(obj) == cache_line.type:
        cache_line.onType()
    else:
        ...

Then the per-type code could use the rest of the space in the cache line polymorphically to contain type-specific cached "dispatch" info -- e.g. the index of a dict entry for the load_attr/set_attr logic on an instance.

Abstractly, one can think of a cache line for a bytecode as the streamlined version, in terms of values and/or code pointers, of the path taken through that bytecode last time, plus values to check whether the very same path still makes sense.

1. in practice these ideas can perform very poorly
2. this tries to address things/internals as they are
3. yup, anything on the object layout/behavior side that simplifies this picture is probably a step in the right direction.

regards, Samuele.
"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:
SP> * instance namespaces

SP> As I said, what eventually happens with class/type unification
SP> plays a role.

SP> 1. __slots__ are obviously a good thing here :)

SP> 2. old-style instances, and in general instances with a dict:
SP> one can try to guess the slots of a class by looking for the
SP> "self.attr" pattern at compile time in a more or less clever
SP> way. The set of compile-time guessed attrs would be passed to
SP> MAKE_CLASS, which would construct the runtime guess using the
SP> union of the superclasses' guesses and the compile-time guess
SP> for the class. This information can be used to lay out a dlict.

Right! There's another step necessary to take advantage, though. When you execute a method you don't know the receiver type (self.__class__). So you need to specialize the bytecode to a particular receiver the first time the method is called. Since this could be relatively expensive and you don't know how often the method will be executed, you need to decide dynamically when to do it. Just like HotSpot.

We probably have to worry about a class or instance being modified in a way that invalidates the dlict offsets computed. (Not sure here, but I think that's the case.) If so, we probably need a different object -- call it a template -- that represents the concrete layout and is tied to an unmodified concrete class. When objects or classes are modified in dangerous ways, we'd need to invalidate the template pointer for the affected instances.

Jeremy
From: Jeremy Hylton <jeremy@zope.com>
"SP" == Samuele Pedroni <pedronis@bluewin.ch> writes: ... SP> one can try to guess the slots of a class looking for the SP> "self.attr" pattern at compile time in a more or less clever SP> way. The set of compile-time guessed attrs will be passed to SP> MAKE_CLASS which will construct the runtime guess using the SP> union of the super-classes guesses and the compile time guess SP> for the class. This information can be used to layout a dlict.
Right! There's another step necessary to take advantage though. When you execute a method you don't know the receiver type (self.__class__). So you need to specialize the bytecode to a particular receiver the first time the method is called. Since this could be relatively expensive and you don't know how often the method will be executed, you need to decide dynamically when to do it. Just like HotSpot.
Right, because with multiple inheritance you cannot make the layout of a subclass compatible with that of *all* superclasses, so simple monomorphic inline caches will not work :(. OTOH you can use polymorphic inline caches, meaning a bunch of class->index lines for each bytecode; or you could leave the bytecode unspecialized but (insane idea) choose on method entry a different bunch of cache lines based on the class of self.
We probably have to worry about a class or instance being modified in a way that invalidates the dlict offsets computed. (Not sure here, but I think that's the case.) If so, we probably need a different object -- call it a template -- that represents the concrete layout and is tied to unmodified concrete class. When objects or classes are modified in dangerous ways, we'd need to invalidate the template pointer for the affected instances.
This would be similar to the Self VM map concept (although Python is type/class based, the very dynamic nature of its instances gives it problems similar to prototype-based languages). I don't know if we need that, or if it can be implemented effectively; I considered it too during my brainstorming.

AFAIK caching/memoization plays an important role in all high-performance dynamic object language implementations. Abstractly it seems effective for Python too, but it is unclear whether the complexity of the internal models will render it ineffective.

With caching you can probably simply timestamp classes: when a class is changed structurally you increment its timestamp and that of all direct and indirect subclasses; you don't touch instances. Then you compare the cached timestamp with that of the instance's class to check whether the entry is still valid. The tricky part is that in Python an instance attribute can be added at any point that shadows a class attribute. I don't know if there are open issues, but one approach would be, in that case, to increment the timestamp of the instance's class too.

The problem is that there are so many cases and situations; that's why the multi-staged cache-lines approach in theory makes some sense but could be totally ineffective in practice <wink>.

These are all interesting topics, although from these more or less informal discussions to results there is a lot of details and code :(. But already improving the global lookup thing would be a good step.

Hope this makes some kind of sense.

Samuele.
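The timestamp scheme can be sketched in a few lines of Python (invented API; a real implementation would bump versions automatically from the class's __setattr__ rather than by hand):

    versions = {}   # class -> version number

    def bump(cls):
        """Call after a structural change: bump cls and all its subclasses."""
        versions[cls] = versions.get(cls, 0) + 1
        for sub in cls.__subclasses__():
            bump(sub)   # covers direct and indirect subclasses

    class CachedLookup:
        def __init__(self, name):
            self.name, self.klass, self.version, self.value = name, None, -1, None

        def load(self, obj):
            klass = type(obj)
            if klass is self.klass and versions.get(klass, 0) == self.version:
                return self.value                   # hit: no dict probes
            self.klass, self.version = klass, versions.get(klass, 0)
            self.value = getattr(klass, self.name)  # miss: full lookup
            return self.value

    class A:
        def spam(self):
            return "A.spam"

    class B(A):
        pass

    site = CachedLookup("spam")
    b = B()
    print(site.load(b)(b))          # miss: caches A.spam as seen via B
    print(site.load(b)(b))          # hit
    A.spam = lambda self: "new"     # structural change to a superclass...
    bump(A)                         # ...invalidates B's cached entries too
    print(site.load(b)(b))          # version mismatch forces re-lookup: "new"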
Samuele Pedroni wrote:
From: Jeremy Hylton <jeremy@zope.com>
> "SP" == Samuele Pedroni <pedronis@bluewin.ch> writes: ... SP> one can try to guess the slots of a class looking for the SP> "self.attr" pattern at compile time in a more or less clever SP> way. The set of compile-time guessed attrs will be passed to SP> MAKE_CLASS which will construct the runtime guess using the SP> union of the super-classes guesses and the compile time guess SP> for the class. This information can be used to layout a dlict.
Right! There's another step necessary to take advantage though. When you execute a method you don't know the receiver type (self.__class__). So you need to specialize the bytecode to a particular receiver the first time the method is called. Since this could be relatively expensive and you don't know how often the method will be executed, you need to decide dynamically when to do it. Just like HotSpot.
Why not assume the general case is the most common, i.e., that the object is an instance of this class or one of its subclasses? That way you could do the specialization at compile time, and for the (presumably) few times that this isn't true, fall back to another technique, perhaps like HotSpot.

Also, doesn't calling a base class method as

    Base.method(self)   # in particular __init__()

vs.

    self.method()

create problems if you specialize for a specific class? Or does specialization necessarily mean for a subclass and all its base classes?
Right, because with multiple inheritance you cannot make the layout of a subclass compatible with that of *all* superclasses, so simple monomorphic inline caches will not work :(.
ISTM that it would be best to handle single inheritance first. Multiple inheritance could perhaps be handled for the class with the most commonly referenced attribute (assuming 2+ classes don't define the same attr), with a fallback technique for all other cases.
We probably have to worry about a class or instance being modified in a way that invalidates the dlict offsets computed. (Not sure here, but I think that's the case.) If so, we probably need a different
Right, if an attr is deleted, methods added/removed dynamically, etc.
object -- call it a template -- that represents the concrete layout and is tied to unmodified concrete class. When objects or classes are modified in dangerous ways, we'd need to invalidate the template pointer for the affected instances.
By using a template, doesn't that become a dict lookup again?
These are all interesting topics, although from these more or less informal discussions to results there is a lot of details and code :(.
I agree. Since we can't know what will be optimal, it seems safer to keep the existing functionality as a fallback case and try to improve things with small steps (e.g., single inheritance first).
But already improving the global lookup thing would be a good step.
Definitely. Neal
From: Neal Norwitz <neal@metaslash.com>
Why not assume the general case is the most common, i.e., that the object is an instance of this class or one of its subclasses? That way you could do the specialization at compile time. And for the (presumably) few times that this isn't true, fall back to another technique, perhaps like HotSpot.
Also, doesn't calling a base class method as

    Base.method(self)   # in particular __init__()

vs.

    self.method()
create problems if you specialize for a specific class? Or does specialization necessarily mean for a subclass and all its base classes?
Puzzled. In Python you could specialize at MAKE_CLASS time, which would mean rewriting all the direct and indirect superclasses' methods and the class's own methods under the assumption that self is of the built class. Doing so is probably too expensive; typically it makes more sense to specialize only when a given method is actually called. Btw, typical systems specialize and native-compile at the same time; if you subtract the native-compile part, your cost equation changes a lot.

Given that people can rebind self (although nobody does), you would also need data/control flow analysis; that's too bad:

    def a(self):
        self = 3
        return self+1

Also:

    def __add__(self, o):
        ...

You cannot do anything special for o :(.
[Jeremy] > Right, because with multiple inheritance you cannot make the layout
of a subclass compatible with that of *all* superclasses, so simple monomorphic inline caches will not work :(.
ISTM that it would be best to handle single inheritance first. Multiple inheritance could perhaps be handled for the class with the most commonly referenced attribute (assuming 2+ classes don't define the same attr), with a fallback technique for all other cases.
How do you decide which are the most commonly referenced attributes? <wink>
We probably have to worry about a class or instance being modified in a way that invalidates the dlict offsets computed. (Not sure here, but I think that's the case.) If so, we probably need a different
Right, if an attr is deleted, methods added/removed dynamically, etc.
It really depends on implementation details.
object -- call it a template -- that represents the concrete layout and is tied to unmodified concrete class. When objects or classes are modified in dangerous ways, we'd need to invalidate the template pointer for the affected instances.
By using a template, doesn't that become a dict lookup again?
The good thing about templates as an idea is that they could solve the zoo issue. You're right about lookup, but to see the utility you should bring per-bytecode-instruction caches into the picture:

    if obj.template == cache_line.template:
        use cache_line.cached_lookup_result
    else:
        lookup and update cache_line

[The Self VM used maps (read: templates) in such a way.]

There is really a huge hack/implementation space to play with. These comments are mainly informal; if the interest remains after the conference I will be pleased to participate in more focused and into-the-details discussions.

regards, Samuele.
Jeremy Hylton wrote:
"GM" == Gordon McMillan <gmcm@hypernet.com> writes: GM> Heck, Marc-Andre tricks import so that pkg.mod is really GM> pkg.attr where the attr turns into a mod when accessed. No GM> problem, since it's only import that cares what it is. By the GM> time it's used it's always global.attr.attr....
Not sure I understand what Marc-Andre is doing. (That's probably true in general <wink>.) A client of his code types "import foo.bar". foo is a module? a package? When the "bar" attribute is loaded (LOAD_ATTR) it turns into another module?
Take a look at e.g. mx.DateTime.__init__ and the included LazyModule module for more background. I don't really use that approach myself, but sometimes it can be handy to be able to reference modules in packages without requiring an import of them, e.g.

    import mx.DateTime
    date = mx.DateTime.Parser.DateTimeFromString('2002-02-01')

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
On Thu, Jan 31, 2002 at 12:37:16PM -0600, Skip Montanaro wrote:
Interesting results. I've been working on my {TRACK,UNTRACK}_GLOBAL opcode implementations. I have an optimizer filter that sets up tracking for all LOAD_GLOBAL,{LOAD_ATTR}* combinations. It's still not quite working and will only be a proof of concept by devday if I do get it working, but I expect most of these expensive opcode combinations to collapse into a LOAD_FAST, with the addition of a TRACK_GLOBAL/UNTRACK_GLOBAL pair executed at function start and end, respectively.
Won't there be code that this slows down? For instance, the code generated by

    print "f = lambda: 0"
    print "def g():"
    print "\tif f():"          # prevent optimization of 'if 0:'
    print "\t\tx = []"
    for i in range(10000):
        print "\t\tx.append(global_%d)" % i
    print "\t\treturn x"
    print "\treturn []"

(10001 TRACK_GLOBALs, one LOAD_GLOBAL)

Not to mention, will it even work? TRACK_GLOBAL will have to make special note of globals that didn't exist yet when the function prologue is executed, and either not subsequently execute the load as a LOAD_FAST or else have a special value that causes the same NameError "global name 'global_666' is not defined" message, not an UnboundLocalError... The latter sounds easy enough to solve, but how do you make sure that this optimization is never a pessimization (aside from sending programmers such as myself to the retraining camps of the PSU)?

Jeff

PS Hey, that's remarkable .. usually people get unexpectedly cut off when they try to mentio
Jeff> Won't there be code that this slows down? For instance, the
Jeff> code generated by ...

Sure, there will be code that slows down. That's why I said what I am working on is a proof of concept. Right now, the optimizer converts each function it operates on to the equivalent of:

    TRACK_GLOBAL x.foo
    TRACK_GLOBAL y.bar
    TRACK_GLOBAL z
    try:
        original function body using x.foo, y.bar and z
    finally:
        UNTRACK_GLOBAL z
        UNTRACK_GLOBAL y.bar
        UNTRACK_GLOBAL x.foo

There are no checks for obvious potential problems at the moment. Such problems include (but are not limited to):

* Only track globals that are accessed in loops. This would eliminate your corner case and should be easily handled (only work between SETUP_LOOP and its jump target).

* Only track globals when there are <= 256 globals (half an oparg -- the other half being an index into the fastlocals array). This would also cure your problem.

* Only track globals that are valid at the start of function execution, or defer tracking setup until they are. This can generally be avoided by not tracking globals that are written during the function's execution, but other safeguards will probably be necessary to ensure that it works properly.

Jeff> ... how do you make sure that this optimization is never a
Jeff> pessimization ...

I expect in the majority of cases either my idea or Jeremy's will be a net win, especially after seeing his timing data. I'm willing to accept that in some situations the code will run slower. I'm confident they will be a small minority. Tim Peters can construct cases where dicts perform badly. Does that mean Python shouldn't have dicts? ;-)

Skip
Jeremy Hylton wrote:
I've made some simple measurements of how long opcodes take to execute and how long it takes to go around the mainloop, using the Pentium timestamp counter, which measures processor cycles.
The results aren't particularly surprising, but they provide some empirical validation of what we've believed all along. I don't have time to go into all the gory details here, though I plan to at Spam 10 developers day next week.
My suggestion WRT SET_LINENO is to encourage the use of python -O and PYTHONOPTIMIZE.

--
--- Aahz (@pobox.com)
Hugs and backrubs -- I break Rule 6  <*>  http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista
We must not let the evil of a few trample the freedoms of the many.
"AM" == Aahz Maruch <aahz@rahul.net> writes:
AM> My suggestion WRT SET_LINENO is to encourage the use of python
AM> -O and PYTHONOPTIMIZE.

Vladimir submitted a patch long ago to dynamically recompile bytecode to add/remove SET_LINENO as needed. I find that approach much more appealing, because you don't have to pay the SET_LINENO penalty just because there's some chance you'd want to connect with a debugger. A long-running server process is the prime use case; it benefits from -O but may need to be debugged.

Jeremy
[Aahz]
My suggestion WRT SET_LINENO is to encourage the use of python -O and PYTHONOPTIMIZE.
What SET_LINENO does isn't even used in normal Python operation anymore (line numbers in tracebacks are obtained via a different means, the co_lnotab member of PyCodeObjects). They're needed now only to call back to user-supplied tracing routines, and that's rarely needed. The Python debugger is the most visible example of a tool that uses the line-tracing hook. There are other ways to get that to work, but they require real thought and effort to implement. There's a patch on SourceForge (IIRC, from Vladimir) that may have worked at one time, but nobody has picked it up (I tried to for 2.2, but couldn't make time for it then; I don't expect to have time for it for 2.3 either, alas).
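For reference, the user-supplied tracing hook in question is the documented sys.settrace interface that pdb is built on; a minimal example of what SET_LINENO exists to serve:

    import sys

    def tracer(frame, event, arg):
        if event == "line":
            print("line %d in %s" % (frame.f_lineno, frame.f_code.co_name))
        return tracer        # keep receiving 'line' events for this frame

    def demo():
        x = 1
        y = x + 1
        return y

    sys.settrace(tracer)     # every 'line' event below costs a callback
    demo()
    sys.settrace(None)       # always uninstall the hook when done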