[Python-Dev] RE: [Python-checkins] python/dist/src/Python ceval.c, 2.383, 2.384
Raymond Hettinger
python at rcn.com
Sat Mar 20 16:14:58 EST 2004
> Modified Files:
> ceval.c
> Log Message:
> A 2% speed improvement with gcc on low-endian machines.  My guess is
> that this new pattern for NEXTARG() is detected and optimized as a
> single (*short) loading.
It is possible to verify that guess by looking at the generated
assembler.

There are other possible reasons.  One is that the negative array
offsets don't compile well into a native addressing mode of
base+offset*wordsize; I have seen and proven that this is the case in
other parts of the code base.  The other possible reason for the speedup
is that the old pre-increment of the pointer prevented the lookups from
being done in parallel (i.e. it created a sequential dependency).
If the latter reason is a true cause, then part of the checkin is
counter-productive: the change to PREDICTED_WITH_ARG introduces a
pre-increment in addition to the post-increment.  Please run another
timing with and without the change to PREDICTED_WITH_ARG; I suspect the
old way ran faster.  Also, the old way will always be faster on
big-endian machines and would be faster on machines with less
sophisticated compilers (and the new way may even be slower on MSVC++ if
it doesn't automatically generate a short load).  Another consideration
is that loading a short may perform much differently on other
architectures, because even alignment occurs only half of the time.
Summary: +1 on the changes to NEXTARG() and EXTENDED_ARG;
         -1 on the change to PREDICTED_WITH_ARG.
Raymond Hettinger
> #define PREDICTED(op)		PRED_##op: next_instr++
> ! #define PREDICTED_WITH_ARG(op)	PRED_##op: oparg = (next_instr[2]<<8) + \
> ! 					next_instr[1]; next_instr += 3
>
> /* Stack manipulation macros */
> --- 660,664 ----
>
> #define PREDICTED(op)		PRED_##op: next_instr++
> ! #define PREDICTED_WITH_ARG(op)	PRED_##op: next_instr++; oparg = OPARG(); \
> ! 					next_instr += OPARG_SIZE