[Python-Dev] RE: [Python-checkins] python/dist/src/Python ceval.c, 2.383, 2.384
Raymond Hettinger
python at rcn.com
Sat Mar 20 16:14:58 EST 2004
> Modified Files:
> ceval.c
> Log Message:
> A 2% speed improvement with gcc on low-endian machines.  My guess is
> that this new pattern for NEXTARG() is detected and optimized as a
> single (*short) loading.
It is possible to verify that guess by looking at the generated
assembler.

There are other possible reasons.  One is that the negative array
offsets don't compile well into a native addressing mode of
base+offset*wordsize; I have seen and proven that this is the case in
other parts of the code base.  The other possible reason for the speedup
is that the old pre-increment of the pointer prevented the lookups from
being done in parallel (i.e. it created a sequential dependency).
If the latter reason is a true cause, then part of the checkin is
counter-productive: the change to PREDICTED_WITH_ARG introduces a
pre-increment in addition to the post-increment.  Please run another
timing with and without the change to PREDICTED_WITH_ARG; I suspect the
old way ran faster.  Also, the old way will always be faster on
big-endian machines and would be faster on machines with less
sophisticated compilers (and the new way may even be slower on MSVC++ if
it doesn't automatically generate a short load).  Another consideration
is that loading a short may perform much differently on other
architectures, because even alignment occurs only half of the time.
Summary: +1 on the changes to NEXTARG() and EXTENDED_ARG;
         -1 on the change to PREDICTED_WITH_ARG.
Raymond Hettinger
> #define PREDICTED(op)		PRED_##op: next_instr++
> ! #define PREDICTED_WITH_ARG(op)	PRED_##op: oparg = (next_instr[2]<<8) + \
> ! 					next_instr[1]; next_instr += 3
>
> /* Stack manipulation macros */
> --- 660,664 ----
>
> #define PREDICTED(op)		PRED_##op: next_instr++
> ! #define PREDICTED_WITH_ARG(op)	PRED_##op: next_instr++; oparg = OPARG(); \
> ! 					next_instr += OPARG_SIZE