[pypy-dev] Interpreter level array implementation

Fri Jul 2 22:12:19 CEST 2010

On Fri, Jul 2, 2010 at 2:59 PM, Hakan Ardo <hakan at debian.org> wrote:
> Hi,
> we got the simplest possible interpreter level implementation of an
> array-like object running (in the interplevel-array branch) and it
> executes my previous example about 2 times slower than optimized C.
> Attached is the trace generated by the following example:
>
>    img=array(640*480);   l=0;   i=0;
>    while i<640*480:
>        l+=img[i]
>        i+=1
>
> a simplified version of that trace is:
>
>   1. [p0, p1, p2, p3, i4, p5, p6, p7, p8, p9, p10, f11, i]
>   2. i14 = int_lt(i, 307200)
>   3.   guard_true(i14, descr=<Guard1>)
>   4.   guard_nonnull_class(p10, 145745952, descr=<Guard2>)
>   5. img = getfield_gc(p10, descr=<GcPtrFieldDescr 8>)
>   6. f17 = getarrayitem_gc(img, i, descr=<FloatArrayDescr>)
>   7. f18 = float_add(f11, f17)
>   8. i20 = int_add_ovf(i, 1)
>   9.   guard_no_overflow(, descr=<Guard3>) #
>  10. i23 = getfield_raw(149604768, descr=<SignedFieldDescr 0>)
>  11. i25 = int_add(i23, 1)
>  12. setfield_raw(149604768, i25, descr=<SignedFieldDescr 0>)
>  13. i28 = int_and(i25, -2131755008)
>  14. i29 = int_is_true(i28)
>  15.   guard_false(i29, descr=<Guard4>)
>  16. jump(p0, p1, p2, p3, 27, ConstPtr(ptr31), ConstPtr(ptr32),
>           ConstPtr(ptr33), p8, p9, p10, f18, i20)
>
> Do these operations more or less correspond to assembler
> instructions? I guess that the extra overhead here as compared to
> the C version would be lines 4, 5, 9 and 10-15. What are 10-15 all about?
> I guess that most of these additional operations would not affect the
> performance of more complicated loops as they will only occur once per
> loop (although combining the guard on line 9 with line 3 might be a
> possible optimization)? Line 4 will appear once for each array used in
> the loop and line 5 once for every array access, right?
>
> Can the array implementation be designed in some way that would not
> generate line 5 above? Or would it be possible to get rid of it by
> some optimization?
>
> --
> Håkan Ardö

In addition to the things you noted, I guess the int overflow check
(lines 8 and 9) can be optimized out: the guard on line 3 already
ensures i < 640*480, so i+=1 can never overflow.  I suppose that in
general this would require more dataflow analysis.
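
To make the reasoning concrete, here is a rough sketch of the check
such an analysis would have to make.  It is only an illustration: the
helper name add_can_overflow is made up and is not part of PyPy, and
it assumes a positive increment and a machine-word limit of
sys.maxsize.

```python
import sys

# Hypothetical helper, not part of PyPy: decides whether an
# int_add_ovf needs its guard_no_overflow, given an upper bound on
# the operand that an earlier guard has already established.
def add_can_overflow(upper_bound_exclusive, increment, max_int=sys.maxsize):
    """Return True if value + increment may exceed max_int, assuming
    value < upper_bound_exclusive and increment > 0."""
    largest_value = upper_bound_exclusive - 1
    return largest_value + increment > max_int

# Line 3 of the trace guards i < 307200 (= 640*480), so at line 8
# the largest possible i is 307199.
LOOP_BOUND = 640 * 480
needs_guard = add_can_overflow(LOOP_BOUND, 1)
```

With that bound, needs_guard comes out False, so lines 8 and 9 could
become a plain int_add.  Without a known bound on i, the guard has to
stay.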

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me