[New-bugs-announce] [issue4753] Faster opcode dispatch on gcc

Antoine Pitrou report at bugs.python.org
Fri Dec 26 22:09:38 CET 2008


New submission from Antoine Pitrou <pitrou at free.fr>:

This patch implements what is usually called "threaded code" for the
ceval loop on compilers which support it (only gcc). The idea is that
there is a separate opcode dispatch epilog at the end of each opcode,
which allows the CPU to make much better use of its branch prediction
capabilities. The net result is a 15-20% average speedup on pybench and
pystone, with higher speedups on very tight loops (see below for the
full pybench result chart).
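
To illustrate the dispatch technique described above, here is a minimal
sketch on a toy stack machine. It is not taken from the patch: the opcode
names, the run() function and the DISPATCH() macro are all hypothetical.
It assumes gcc's "labels as values" extension (&&label and goto *ptr) and
falls back to a plain switch elsewhere.

```c
#include <assert.h>

/* Hypothetical toy opcodes for illustration; not CPython's real opcodes. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Run a tiny stack program and return the value left on top of the stack. */
static int run(const unsigned char *code)
{
    int stack[STACK_MAX];
    int sp = 0;
    const unsigned char *pc = code;

#if defined(__GNUC__)
    /* Jump table indexed by opcode, built with gcc's &&label extension. */
    static void *targets[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };

    /* Every opcode body ends with its own indirect jump, so the CPU can
       track a separate branch history for each one. */
#define DISPATCH() goto *targets[*pc++]

    DISPATCH();
do_push:
    assert(sp < STACK_MAX);
    stack[sp++] = *pc++;
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    return stack[sp - 1];
#undef DISPATCH
#else
    /* Portable fallback: a single shared dispatch point at the switch. */
    for (;;) {
        switch (*pc++) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = *pc++;
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        }
    }
#endif
}
```

Calling run() on the byte sequence OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4,
OP_MUL, OP_HALT evaluates (2 + 3) * 4. In the switch version every opcode
returns to one shared indirect branch, whereas the computed-goto version
gives each opcode its own.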

The opcode jump table is generated by a separate script which is called
as part of the Makefile (just as the asdl building script already is).

On compilers other than gcc, performance will of course be unchanged.


Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   100ms   107ms   -7.1%   101ms   110ms   -8.2%
           BuiltinMethodLookup:    76ms   106ms  -28.1%    78ms   106ms  -26.5%
                 CompareFloats:   108ms   141ms  -23.2%   108ms   141ms  -23.2%
         CompareFloatsIntegers:   171ms   188ms   -9.4%   173ms   204ms  -15.3%
               CompareIntegers:   165ms   213ms  -22.6%   168ms   224ms  -25.1%
        CompareInternedStrings:   127ms   169ms  -24.6%   127ms   169ms  -24.8%
                  CompareLongs:    95ms   124ms  -23.1%    95ms   126ms  -24.5%
                CompareStrings:   109ms   136ms  -20.2%   111ms   139ms  -19.9%
    ComplexPythonFunctionCalls:   131ms   150ms  -12.4%   136ms   151ms  -10.2%
                 ConcatStrings:   159ms   171ms   -6.9%   160ms   173ms   -7.4%
               CreateInstances:   148ms   157ms   -5.6%   150ms   158ms   -4.9%
            CreateNewInstances:   112ms   117ms   -4.3%   112ms   118ms   -4.6%
       CreateStringsWithConcat:   144ms   198ms  -27.3%   148ms   199ms  -25.7%
                  DictCreation:    90ms   104ms  -13.3%    90ms   104ms  -13.1%
             DictWithFloatKeys:   117ms   153ms  -23.7%   117ms   154ms  -24.0%
           DictWithIntegerKeys:   104ms   153ms  -32.3%   104ms   154ms  -32.5%
            DictWithStringKeys:    90ms   140ms  -35.7%    90ms   141ms  -36.3%
                      ForLoops:   100ms   161ms  -38.1%   100ms   161ms  -38.1%
                    IfThenElse:   123ms   170ms  -28.0%   125ms   171ms  -27.1%
                   ListSlicing:   142ms   141ms   +0.3%   142ms   142ms   +0.2%
                NestedForLoops:   135ms   190ms  -29.0%   135ms   190ms  -29.0%
          NormalClassAttribute:   249ms   281ms  -11.5%   249ms   281ms  -11.3%
       NormalInstanceAttribute:   110ms   153ms  -28.2%   111ms   154ms  -28.1%
           PythonFunctionCalls:   106ms   130ms  -18.7%   108ms   131ms  -17.2%
             PythonMethodCalls:   151ms   169ms  -10.1%   152ms   169ms   -9.8%
                     Recursion:   183ms   242ms  -24.7%   191ms   243ms  -21.4%
                  SecondImport:   142ms   138ms   +2.7%   144ms   139ms   +3.4%
           SecondPackageImport:   146ms   149ms   -2.3%   148ms   150ms   -1.5%
         SecondSubmoduleImport:   201ms   193ms   +3.9%   201ms   195ms   +3.4%
       SimpleComplexArithmetic:    90ms   112ms  -19.6%    90ms   112ms  -19.8%
        SimpleDictManipulation:   172ms   230ms  -25.2%   173ms   231ms  -25.0%
         SimpleFloatArithmetic:    98ms   133ms  -26.3%    99ms   137ms  -27.9%
      SimpleIntFloatArithmetic:   134ms   175ms  -23.6%   138ms   176ms  -21.6%
       SimpleIntegerArithmetic:   134ms   183ms  -26.8%   141ms   183ms  -23.1%
        SimpleListManipulation:    91ms   143ms  -36.3%    93ms   143ms  -35.1%
          SimpleLongArithmetic:    88ms   108ms  -17.9%    91ms   109ms  -16.2%
                    SmallLists:   127ms   162ms  -21.6%   129ms   164ms  -21.2%
                   SmallTuples:   149ms   177ms  -15.6%   151ms   178ms  -15.1%
         SpecialClassAttribute:   423ms   426ms   -0.7%   426ms   430ms   -0.9%
      SpecialInstanceAttribute:   110ms   154ms  -28.2%   111ms   154ms  -28.3%
                StringMappings:   428ms   443ms   -3.4%   432ms   449ms   -3.7%
              StringPredicates:   124ms   161ms  -23.1%   125ms   162ms  -22.7%
                 StringSlicing:   207ms   223ms   -7.1%   208ms   228ms   -8.7%
                     TryExcept:    72ms   166ms  -56.3%    73ms   166ms  -56.2%
                    TryFinally:    93ms   120ms  -22.9%    93ms   124ms  -25.2%
                TryRaiseExcept:    52ms    64ms  -19.2%    52ms    65ms  -19.2%
                  TupleSlicing:   177ms   195ms   -9.1%   178ms   198ms  -10.2%
                   WithFinally:   147ms   163ms  -10.2%   147ms   164ms  -10.1%
               WithRaiseExcept:   156ms   173ms  -10.1%   157ms   174ms   -9.7%
-------------------------------------------------------------------------------
Totals:                          6903ms  8356ms  -17.4%  6982ms  8443ms  -17.3%

----------
components: Interpreter Core
files: threadedceval1.patch
keywords: patch
messages: 78306
nosy: pitrou
priority: normal
severity: normal
stage: patch review
status: open
title: Faster opcode dispatch on gcc
type: performance
versions: Python 3.1
Added file: http://bugs.python.org/file12457/threadedceval1.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________

