On Wed, Feb 25, 2015 at 11:28 AM, Richard Plangger <rich@pasra.at> wrote:
Thx for your response.
Yes I would be interested in a rewritten loop.
After browsing the jittracer longer I found the following.
```
label(p0, f16, p2, p3, p55, i54, i59, p20, i18, p6, i24, p36, i34, i40, f32, i15, i23, i31, i39, i48, i58, i60, descr=TargetToken(41748336))
(numpy_call2: no get_printable_location)
f62 = raw_load(i15, i24, descr=<ArrayF 8>)
guard_not_invalidated(descr=<Guard0x246fe60>)
i63 = i24 + i23
f64 = raw_load(i31, i40, descr=<ArrayF 8>)
i65 = i40 + i39
f66 = f62 + f64
raw_store(i48, i59, f66, descr=<ArrayF 8>)
i67 = i54 + 1
i68 = i59 + i58
i69 = i67 >= i60
guard(i69 is false) show bridge (run 2799 times, ~0%)
(numpy_call2: no get_printable_location)
jump(p0, f62, p2, p3, p55, i67, i68, p20, i18, p6, i63, p36, i34, i65, f64, i15, i23, i31, i39, i48, i58, i60, descr=TargetToken(41748336))
```
Is this the trace that calculates a = b + c internally?
yes
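For readers following along, here is a rough Python model of what the trace above does. It is only a sketch: the helper names `raw_load`, `raw_store` and `add_loop` are hypothetical, the buffers stand in for the raw pointers i15, i31 and i48, and the mapping of trace variables to offsets, strides and counter in the comments is my reading of the trace, not something the JIT prints.

```python
import struct

def raw_load(buf, offset):
    # Read one 8-byte float at a byte offset (models raw_load with <ArrayF 8>).
    return struct.unpack_from("d", buf, offset)[0]

def raw_store(buf, offset, value):
    # Write one 8-byte float at a byte offset (models raw_store).
    struct.pack_into("d", buf, offset, value)

def add_loop(b, c, a, n, stride_b=8, stride_c=8, stride_a=8):
    # Assumes n >= 1: like the trace, the exit test runs after the body.
    off_b = off_c = off_a = 0          # i24, i40, i59
    i = 0                              # i54
    while True:
        x = raw_load(b, off_b)         # f62 = raw_load(i15, i24)
        off_b += stride_b              # i63 = i24 + i23
        y = raw_load(c, off_c)         # f64 = raw_load(i31, i40)
        off_c += stride_c              # i65 = i40 + i39
        raw_store(a, off_a, x + y)     # raw_store(i48, i59, f66)
        i += 1                         # i67 = i54 + 1
        off_a += stride_a              # i68 = i59 + i58
        if i >= n:                     # guard(i69 is false) fails -> leave loop
            break

b = struct.pack("3d", 1.0, 2.0, 3.0)
c = struct.pack("3d", 10.0, 20.0, 30.0)
a = bytearray(24)
add_loop(b, c, a, 3)
print(struct.unpack("3d", a))
```

Note that each buffer carries its own byte offset and stride, which is why the trace contains three separate additions besides the element counter.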
This looks much more like what I have had in mind.
Here are some questions that I have:
1) What does the prefix p/i mean in the label arguments? I guess it is a box of i -> integer, p -> pointer, but why does raw_load(i31, ...) operate on an i31 instead of p31?
p is a gc pointer, i is an integer or raw pointer (we should invent a new name for it, it's biting us in the JIT)
2) Why does every raw_load/raw_store have its own index calculation (e.g. i65 = i40 + i39) instead of just using one common index?
because we did not optimize it correctly ;-) it's about specializing the loop and detecting you can reuse the same indexing
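A minimal sketch of what "reusing the same indexing" could look like, assuming all three arrays have the same element stride and start at offset 0. The function name `add_shared_offset` is hypothetical; this illustrates the idea and is not how the PyPy optimizer is implemented.

```python
import struct

def add_shared_offset(b, c, a, n, stride=8):
    """a[k] = b[k] + c[k] with ONE byte offset shared by all three buffers.

    Only valid under the assumption that b, c and a use the same stride
    and the same starting offset; a specialised loop would check that
    once, before entering the loop.
    """
    off = 0
    for _ in range(n):
        x = struct.unpack_from("d", b, off)[0]
        y = struct.unpack_from("d", c, off)[0]
        struct.pack_into("d", a, off, x + y)
        off += stride  # the single index update per iteration

b = struct.pack("4d", 1.0, 2.0, 3.0, 4.0)
c = struct.pack("4d", 0.5, 0.5, 0.5, 0.5)
a = bytearray(32)
add_shared_offset(b, c, a, 4)
print(struct.unpack("4d", a))
```

Compared with the trace, the three offset additions collapse into one, at the price of a precondition on the array layouts.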
3) Are the numpy arrays or python arrays aligned in memory? If not would it be hard to change their alignment when they are allocated?
we control the alignment (I think they're 8 byte aligned now, but can be made 16). Remember that numpy arrays don't have to be contiguous so you have to be a bit careful
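To make the contiguity caveat concrete, here is a small sketch using only the standard library. A strided `memoryview` is used as a stand-in for a non-contiguous numpy view (that substitution is my assumption for illustration), and `is_aligned` and `describe` are hypothetical helpers.

```python
from array import array

def is_aligned(address, boundary):
    # True if the address is a multiple of the boundary (e.g. 8 or 16 bytes).
    return address % boundary == 0

def describe(view):
    # Summarise the layout of a 1-D buffer view.
    return {
        "contiguous": view.contiguous,
        "stride_bytes": view.strides[0],
        "items": view.shape[0],
    }

data = array("d", [float(i) for i in range(8)])
full = memoryview(data)
every_other = full[::2]          # same memory, but a stride of 16 bytes

address, _ = data.buffer_info()  # start address of the underlying buffer
print("16-byte aligned start:", is_aligned(address, 16))
print(describe(full))
print(describe(every_other))
```

Even if the start of the buffer is 16-byte aligned, the strided view cannot be loaded two doubles at a time, because neighbouring elements of the view are not adjacent in memory.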
Best, Richard
On 02/24/2015 11:17 PM, Maciej Fijalkowski wrote:
Hi Richard.
I will respond inline
On Tue, Feb 24, 2015 at 8:18 PM, Richard Plangger <rich@pasra.at> wrote:
hi,
...
(1) Is there a better way to get loops hot?
no, not really (you can lower the threshold though, see pypy --jit help for details, only global)
(2) I cannot really figure out what all those trace/loop parameters are. Obviously I can guess some, but most of them do not really make sense to me. Am I missing some settings to be able to display more information for the trace? In addition to that I do not really see any chance to generate a simd loop for (2). Every array access is (arguably) guarded by an array index overflow and I think to skip that check would be invalid.
Not really, no, you can't get better info, but I can probably explain
Some of the values can be easily promoted to constants (array sizes for multiplication for example). What you can do for array bound checks is to unroll this loop e.g. 4 times and do ONE array bound check instead of 4 (checking x + 4). Actually it's possible to have an optimization that removes all counters and has ONE number to track the iteration (it's not that hard if the arrays are similar, you can precompute if they match).
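The unrolling idea can be sketched in plain Python as follows. The function name `add_unrolled` is hypothetical, and Python itself still bounds-checks every list access, so this only shows the shape of the transformed loop: one check of `i + 4` against the length covers four element operations, and a scalar loop handles the remainder.

```python
def add_unrolled(b, c, a):
    """Compute a[k] = b[k] + c[k], unrolled 4 times with one bound check per block."""
    n = len(a)
    if len(b) != n or len(c) != n:
        # Precomputed once: the arrays match, so a single counter is enough.
        raise ValueError("arrays must have the same length")
    i = 0
    while i + 4 <= n:            # ONE check covers indices i .. i+3
        a[i] = b[i] + c[i]
        a[i + 1] = b[i + 1] + c[i + 1]
        a[i + 2] = b[i + 2] + c[i + 2]
        a[i + 3] = b[i + 3] + c[i + 3]
        i += 4
    while i < n:                 # remainder loop for the last n % 4 elements
        a[i] = b[i] + c[i]
        i += 1
    return a

b = [float(k) for k in range(10)]
c = [1.0] * 10
print(add_unrolled(b, c, [0.0] * 10))
```

The four statements in the unrolled body are independent of each other, which is the property a SIMD backend would need in order to replace them with vector loads, one vector add and a vector store.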
(3) There is no trace generated for this instruction? Does this internally call a c function?
No, the trace follows the loop and does not compile the + (but the + has its own loop anyway)
(4) What in your opinion makes sense to vectorize with simd instructions? Could you provide some sample loops/code (ranging from simple to complex loops)?
I can rewrite this loop in a vectorizable way if you want
I would really appreciate any help and/or push into the right direction.
Best, Richard
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev