[pypy-dev] CLI code generation

Mon Mar 20 21:32:21 CET 2006

Hi Armin,

Armin Rigo wrote:

 > I wonder how important this is at the moment.  Maybe the .NET JIT
 > compiler is good enough to remove all this.  How does the resulting
 > machine code look like?

I have not tried Microsoft's CLR yet; at this stage I'm using
mono under Linux, just because I'd like to spend as little time in
Windows as possible ;-).

BTW, mono doesn't seem smart enough to optimize the code; consider the
following IL methods: the first is generated by my compiler, the second
is written by hand:

.method static public int32 slow(int32 a_1, int32 b_1) il managed
{
     .locals (int32 v6, int32 v12)

block0:
     ldarg.s 'a_1'
     ldarg.s 'b_1'
     add
     stloc.s 'v12'
     ldloc.s 'v12'
     stloc.s 'v6'
     br.s block1

block1:
     ldloc.s 'v6'
     ret
}

.method static public int32 fast(int32 a_1, int32 b_1) il managed
{
     ldarg.s 'a_1'
     ldarg.s 'b_1'
     add
     ret
}

I used mono's ahead-of-time compiler with all optimizations enabled, 
then I disassembled the result with "objdump -d"; here is an extract of 
the output:

Disassembly of section .text:

000004f0 <methods>:
  4f0:   55                      push   %ebp
  4f1:   8b ec                   mov    %esp,%ebp
  4f3:   8b 45 08                mov    0x8(%ebp),%eax
  4f6:   03 45 0c                add    0xc(%ebp),%eax
  4f9:   c9                      leave
  4fa:   c3                      ret
  4fb:   90                      nop
  4fc:   8d 74 26 00             lea    0x0(%esi,1),%esi
  500:   55                      push   %ebp
  501:   8b ec                   mov    %esp,%ebp
  503:   57                      push   %edi
  504:   8b 45 08                mov    0x8(%ebp),%eax
  507:   8b f8                   mov    %eax,%edi
  509:   03 7d 0c                add    0xc(%ebp),%edi
  50c:   8b c7                   mov    %edi,%eax
  50e:   8d 65 fc                lea    0xfffffffc(%ebp),%esp
  511:   5f                      pop    %edi
  512:   c9                      leave
  513:   c3                      ret
  514:   8d 74 26 00             lea    0x0(%esi,1),%esi

I don't know x86 assembly very well (to be honest I don't know it at all
;-) but it seems that the 'fast' method spans from 4f0 to 4fc and the
'slow' method spans from 500 to 514, and I think that the first should
be more efficient than the second, shouldn't it?
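
To put numbers on the comparison, here is a rough Python sketch that
counts instructions and bytes per method in the listing above. The
function name method_sizes is made up for illustration, and the sketch
assumes that every method starts with 'push %ebp' and ends at its first
'ret', with anything after the 'ret' being alignment padding:

```python
import re

# The objdump extract from above, copied verbatim.
LISTING = """
  4f0:   55                      push   %ebp
  4f1:   8b ec                   mov    %esp,%ebp
  4f3:   8b 45 08                mov    0x8(%ebp),%eax
  4f6:   03 45 0c                add    0xc(%ebp),%eax
  4f9:   c9                      leave
  4fa:   c3                      ret
  4fb:   90                      nop
  4fc:   8d 74 26 00             lea    0x0(%esi,1),%esi
  500:   55                      push   %ebp
  501:   8b ec                   mov    %esp,%ebp
  503:   57                      push   %edi
  504:   8b 45 08                mov    0x8(%ebp),%eax
  507:   8b f8                   mov    %eax,%edi
  509:   03 7d 0c                add    0xc(%ebp),%edi
  50c:   8b c7                   mov    %edi,%eax
  50e:   8d 65 fc                lea    0xfffffffc(%ebp),%esp
  511:   5f                      pop    %edi
  512:   c9                      leave
  513:   c3                      ret
  514:   8d 74 26 00             lea    0x0(%esi,1),%esi
"""

# address, then one or more two-digit hex bytes, then the mnemonic
LINE = re.compile(r"\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s*(\S+)")

def method_sizes(listing):
    """Return (start_address, instruction_count, byte_count) per method."""
    sizes, current = [], None
    for line in listing.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        address, raw, mnemonic = match.groups()
        if current is None and mnemonic == "push":
            current = [int(address, 16), 0, 0]   # a new method begins
        if current is not None:
            current[1] += 1
            current[2] += len(raw.split())
            if mnemonic == "ret":                # method ends here
                sizes.append(tuple(current))
                current = None
    return sizes

print(method_sizes(LISTING))
# -> [(1264, 6, 11), (1280, 11, 20)]   (1264 == 0x4f0, 1280 == 0x500)
```

Under those assumptions, 'fast' is 6 instructions in 11 bytes and 'slow'
is 11 instructions in 20 bytes.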

I don't know how smart the JIT and AOT compilers shipped with the MS CLR
are, but perhaps it is worth the effort of trying to generate smarter
code, so that it can run efficiently even under mono.

Of course, this is not the highest-priority task.

 > It would probably make sense to write this as a function that takes a
 > single block and produces a list of "complex expression" objects -- to
 > be defined in a custom way, instead of trying to push this into the
 > existing flow graph model.

I agree, I think this is the simplest solution.
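
A minimal sketch of what such a function could look like, in Python.
Everything here is hypothetical: a block is simplified to a list of
(result, opname, args) triples rather than the real flow graph classes,
the names build_expressions, emit, compile_block and OPCODES are made up,
and all operations are assumed to be side-effect free so that folding a
temporary into its single use is safe:

```python
from collections import Counter

# Hypothetical mapping from operation names to CLI opcodes.
OPCODES = {"int_add": "add", "int_sub": "sub", "int_mul": "mul"}

def build_expressions(operations, return_var):
    """Fold single-use temporaries of one block into nested tuples.

    Returns (stores, return_tree): `stores` lists the (target, tree)
    pairs that still need a local variable, `return_tree` is the
    expression for the value the block returns.
    """
    uses = Counter(arg for _, _, args in operations for arg in args)
    uses[return_var] += 1
    pending = {}   # single-use temporaries not consumed yet: name -> tree
    stores = []
    for result, opname, args in operations:
        tree = (opname,) + tuple(pending.pop(arg, arg) for arg in args)
        if uses[result] == 1:
            pending[result] = tree      # will be inlined at its only use
        else:
            stores.append((result, tree))
    return stores, pending.pop(return_var, return_var)

def emit(tree, arg_names, out):
    """Append stack-based IL for `tree` to the list `out`."""
    if isinstance(tree, str):
        load = "ldarg.s" if tree in arg_names else "ldloc.s"
        out.append("%s '%s'" % (load, tree))
    else:
        for operand in tree[1:]:
            emit(operand, arg_names, out)
        out.append(OPCODES[tree[0]])

def compile_block(operations, return_var, arg_names):
    """Generate IL for a single block that ends by returning a value."""
    stores, return_tree = build_expressions(operations, return_var)
    out = []
    for target, tree in stores:
        emit(tree, arg_names, out)
        out.append("stloc.s '%s'" % target)
    emit(return_tree, arg_names, out)
    out.append("ret")
    return out

print(compile_block([("v12", "int_add", ["a_1", "b_1"])],
                    "v12", {"a_1", "b_1"}))
# -> ["ldarg.s 'a_1'", "ldarg.s 'b_1'", 'add', 'ret']
```

For the addition example this gives the same four instructions as the
hand-written 'fast' method; values used more than once still go through
a local variable.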

ciao Anto