[Patches] [Patch #100893] new EXTENDED_ARG opcode, eliminate 16-bit oparg limit

noreply@sourceforge.net
Thu, 27 Jul 2000 07:18:37 -0700


Patch #100893 has been updated. 

Project: 
Category: core (C code)
Status: Rejected
Summary: new EXTENDED_ARG opcode, eliminate 16-bit oparg limit

Follow-Ups:

Date: 2000-Jul-15 05:24
By: twouters

Comment:
Can you elaborate on the point of this patch? It removes the current 32k expression limit, but the cost is pretty high. Is the 32k limit really a problem? Would a 64k limit, achieved by fixing the current signed-byte problems, suffice? Also, the patch seems to rely on ints always being 32 bits, which is bound to be an unsafe assumption in the long run.


-------------------------------------------------------

Date: 2000-Jul-15 18:32
By: cgw

Comment:
I don't think it's really true that the cost of
this patch is high.  I did timing tests on 3
different machines; my (admittedly somewhat
unscientific) test was to run "python pystone.py"
100 times and collect statistics on the results,
for both the original and patched Python
interpreters.  Values cited are PyStones.

Here are the results:
=============================================
On an SGI IRIX64 6.5 IP27
 built with native compiler MIPSpro Compilers: Version 7.2.1
 Unpatched:  N=100 mean=2638.05 std. dev.=42.8
 Patched:    N=100 mean=2656.19 std. dev.=14.8
 Difference: +0.7%

 built with gcc version 2.95.2 19991024 (release)
 Unpatched:  N=100 mean=2171.77 std. dev.=8.69
 Patched:    N=100 mean=2192.73 std. dev.=9.80
 Difference: +1%
=============================================
On a SunOS 5.6 sun4u sparc Ultra-Enterprise
 built with native compiler WorkShop Compilers 4.2
 Unpatched:  N=100 mean=1962.32 std. dev.=29.79
 Patched:    N=100 mean=1913.84 std. dev.=8.705
 Difference: -2.5%

 built with gcc version 2.95.2 19991024 (release)
 Unpatched:  N=100 mean=1859.08 std. dev.=11.68
 Patched:    N=100 mean=1939.78 std. dev.=11.97
 Difference: +4.3%
=============================================
On Linux 2.2.16 SMP
 built with gcc version 2.95.2 19991024 (release)
 Unpatched:  N=100 mean=4498.78 std. dev.=102.61
 Patched:    N=100 mean=4592.40 std. dev.=102.38
 Difference: +2%
 
I think that looking ahead in the instruction stream
is not costly, because the bytecode at instruction n+2
is probably already in the CPU's level 1 or level 2
cache; if not, "prefetching" this instruction has no
adverse effect on performance, because the instruction
will then be available when it is needed (which is
usually almost immediately).  In many cases, actually,
the code with my patch runs a bit faster.  I've also
reduced the amount of arithmetic required on
little-endian machines: rather than bit-shifting and
adding to get a short out of two bytes, I simply use a
pointer cast.  Anyhow, try the patch out; I think the
speed difference is negligible.
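
(For illustration, here is a minimal self-contained sketch of the two
arg-fetch strategies described above; the function names and bytecode
layout are hypothetical, not the patch's actual macros:)

    /* Hypothetical sketch, not the patch's actual macros: two ways
       to fetch a 16-bit oparg from the bytecode stream. */
    #include <stdio.h>

    /* Portable: combine two bytes with shift-and-or (works for any
       host byte order, given a little-endian bytecode layout). */
    static int fetch_arg_shift(const unsigned char *p)
    {
        return p[0] | (p[1] << 8);
    }

    /* Little-endian shortcut: reinterpret the two bytes directly as
       an unsigned short via a pointer cast (assumes a little-endian
       host and tolerable unaligned access). */
    static int fetch_arg_cast(const unsigned char *p)
    {
        return *(const unsigned short *)p;
    }

    int main(void)
    {
        unsigned char code[] = { 0x34, 0x12 };  /* encodes oparg 0x1234 */
        printf("%#x %#x\n", fetch_arg_shift(code), fetch_arg_cast(code));
        return 0;
    }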

To answer your other points: yes, I think the 32K limit is
a problem; I've seen cases where machine-generated files are
longer than this.  And I'm not too happy with the idea of
replacing a 32K limit with a 64K limit.

Finally, I don't see where I'm assuming that ints are
always 32 bits.  I assume that a long is at least 32
bits, but unless I'm missing something this code will
work just fine on 64-bit machines.

Feedback is of course always welcome....


-------------------------------------------------------

Date: 2000-Jul-16 02:16
By: twouters

Comment:
I wasn't talking so much about speed cost as about complexity cost, but it's good to see that the speed cost is negligible. I'm a bit worried that the extra logic in NEXTARG() is too complex, but then I've never come close to the 32k limit, and I don't see a better solution short of rewriting the entire instruction-stream mechanism. Isn't it easier to adjust the code generator to add more groupings? I'm not sure that will work in all cases, though, as I don't have such a code generator ;)

The reason the patch is `dangerous' is that it relies on int values being small enough to fit in 4 bytes: you removed the overflow check, which means that on 64-bit machines, if you have more than 2**31 child nodes, you'll generate faulty bytecode.

I'd say Tim or Guido need to take a look at this, and I guess they're too busy packing (or, in Tim's case, unpacking stuff he doesn't want to take ;) for ORA's OSCON.


-------------------------------------------------------

Date: 2000-Jul-16 16:16
By: cgw

Comment:
It looks like I took out all the overflow checking, but in
fact the code is safe: com_addbyte in compile.c checks that
its (int) argument is in the range 0 <= x <= 255.

The checks against SHRT_MAX that my patch removes were only
added recently, and are unnecessary if EXTENDED_ARG is
available.
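
(As a minimal sketch of the kind of range check being described;
the real com_addbyte in compile.c differs in detail:)

    #include <assert.h>

    static unsigned char code[65536];
    static int next_instr = 0;

    /* Sketch only: refuse to emit anything that does not fit in a
       single bytecode byte. */
    static void addbyte_sketch(int byte)
    {
        assert(byte >= 0 && byte <= 255);
        code[next_instr++] = (unsigned char)byte;
    }

    int main(void)
    {
        addbyte_sketch(0x90);   /* some opcode byte */
        addbyte_sketch(0x02);   /* low byte of its oparg */
        addbyte_sketch(0x00);   /* high byte of its oparg */
        return 0;
    }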


-------------------------------------------------------

Date: 2000-Jul-27 07:18
By: gvanrossum

Comment:
This will be too slow -- it adds a lot of complexity to the decoding of *each* instruction, and that's the most time-sensitive code in all of ceval.c. I would consider a patch that introduces a new opcode that specifies the high two bytes of oparg as a prefix, repeats the opcode decoding inline, and then jumps to the top of the switch -- this should only cost when 32-bit opargs are needed (plus a little bit due to code rearrangements, but that's unavoidable). Pseudo-code:

label:
  switch (opcode) {
  case LONG_ARG:
    opcode = NEXTOP();
    oparg = oparg<<16 | NEXTARG();
    goto label;
  ...other cases as before...
  }

Thus, for a REAL_OP with a 32-bit oparg, the code generator should generate code like:

  LONG_ARG <high 16 bits of oparg>
  <REAL_OP> <low 16 bits of oparg>
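
(A minimal self-contained sketch of this prefix scheme; the opcode
numbers and the inline decode steps are illustrative, not ceval.c's
actual ones:)

    #include <stdio.h>

    enum { OP_STOP = 0, OP_LOAD = 1, OP_LONG_ARG = 2 };

    int main(void)
    {
        /* LONG_ARG 0x0001 then LOAD 0x0002 decodes to a single
           LOAD with the 32-bit oparg 0x10002. */
        unsigned char code[] = {
            OP_LONG_ARG, 0x01, 0x00,    /* high 16 bits (little-endian) */
            OP_LOAD,     0x02, 0x00,    /* low 16 bits */
            OP_STOP,
        };
        const unsigned char *pc = code;

        for (;;) {
            int opcode = *pc++;
            int oparg = 0;
            if (opcode >= OP_LOAD) {            /* opcodes with an arg */
                oparg = pc[0] | (pc[1] << 8);
                pc += 2;
            }
        dispatch:
            switch (opcode) {
            case OP_LONG_ARG:
                /* oparg holds the high 16 bits; fetch the real opcode
                   and its low 16 bits, then re-dispatch. */
                opcode = *pc++;
                oparg = oparg << 16 | (pc[0] | (pc[1] << 8));
                pc += 2;
                goto dispatch;
            case OP_LOAD:
                printf("LOAD oparg=%#x\n", oparg);
                break;
            case OP_STOP:
                return 0;
            }
        }
    }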
-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=100893&group_id=5470