[pypy-dev] Interpreter level array implementation
Alex Gaynor
alex.gaynor at gmail.com
Fri Jul 2 22:12:19 CEST 2010
On Fri, Jul 2, 2010 at 2:59 PM, Hakan Ardo <hakan at debian.org> wrote:
> Hi,
> we got the simplest possible interpreter level implementation of an
> array-like object running (in the interplevel-array branch) and it
> executes my previous example about 2 times slower than optimized C.
> Attached is the trace generated by the following example:
>
> img=array(640*480); l=0; i=0;
> while i<640*480:
> l+=img[i]
> i+=1
>
> a simplified version of that trace is:
>
> 1. [p0, p1, p2, p3, i4, p5, p6, p7, p8, p9, p10, f11, i]
> 2. i14 = int_lt(i, 307200)
> 3. guard_true(i14, descr=<Guard1>)
> 4. guard_nonnull_class(p10, 145745952, descr=<Guard2>)
> 5. img = getfield_gc(p10, descr=<GcPtrFieldDescr 8>)
> 6. f17 = getarrayitem_gc(img, i, descr=<FloatArrayDescr>)
> 7. f18 = float_add(f11, f17)
> 8. i20 = int_add_ovf(i, 1)
> 9. guard_no_overflow(, descr=<Guard3>) #
> 10. i23 = getfield_raw(149604768, descr=<SignedFieldDescr 0>)
> 11. i25 = int_add(i23, 1)
> 12. setfield_raw(149604768, i25, descr=<SignedFieldDescr 0>)
> 13. i28 = int_and(i25, -2131755008)
> 14. i29 = int_is_true(i28)
> 15. guard_false(i29, descr=<Guard4>)
> 16. jump(p0, p1, p2, p3, 27, ConstPtr(ptr31), ConstPtr(ptr32),
> ConstPtr(ptr33), p8, p9, p10, f18, i20)
>
> Does these operation more or less correspond to assembler
> instructions? I guess that the extra overhead here as compared to the
> the C version would be line 4, 5, 9 and 10-15. What's 10-15 all about?
> I guess that most of these additional operation would not affect the
> performance of more complicated loops as they will only occur once per
> loop (although combining the guard on line 9 with line 3 might be a
> possible optimization)? Line 4 will appear once for each array used in
> the loop and line 5 once for every array access, right?
>
> Can the array implementation be designed in someway that would not
> generate line 5 above? Or would it be possible to get rid of it by
> some optimization?
>
> --
> Håkan Ardö
>
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>
In addition to the things you noted, I guess the int overflow check
can be optimized out, since i+=1 can never cause it to overflow given
that i is bounded at 640*480. I suppose in general that would require
more dataflow analysis.
Alex
--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
More information about the Pypy-dev
mailing list