Re: [pypy-dev] Vectorizing numpy traces

25 Feb 2015


      On Wed, Feb 25, 2015 at 11:28 AM, Richard Plangger <rich@pasra.at> wrote:
...
Thx for your response.
Yes I would be interested in a rewritten loop.
After browsing the jittracer longer I found the following.
```
label(p0, f16, p2, p3, p55, i54, i59, p20, i18, p6, i24, p36, i34, i40,
f32, i15, i23, i31, i39, i48, i58, i60, descr=TargetToken(41748336))
(numpy_call2: no get_printable_location)
f62 = raw_load(i15, i24, descr=<ArrayF 8>)
guard_not_invalidated(descr=<Guard0x246fe60>)
i63 = i24 + i23
f64 = raw_load(i31, i40, descr=<ArrayF 8>)
i65 = i40 + i39
f66 = f62 + f64
raw_store(i48, i59, f66, descr=<ArrayF 8>)
i67 = i54 + 1
i68 = i59 + i58
i69 = i67 >= i60
guard(i69 is false) show bridge  (run 2799 times, ~0%)
(numpy_call2: no get_printable_location)
jump(p0, f62, p2, p3, p55, i67, i68, p20, i18, p6, i63, p36, i34, i65,
f64, i15, i23, i31, i39, i48, i58, i60, descr=TargetToken(41748336))
```
Is this the trace that calculates a = b + c internally?
yes
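As a reading aid, here is a rough Python model of what one pass through this trace appears to do. The mapping of trace variables to roles (i15/i31/i48 as base addresses, i24/i40/i59 as byte offsets, i23/i39/i58 as strides, i54 as the counter, i60 as the limit) is my interpretation of the trace, and the function name `add_strided` is made up for illustration.

```python
import struct

def add_strided(b_buf, c_buf, out_buf, n, b_step=8, c_step=8, out_step=8):
    """Model of the traced loop: three byte offsets, each advanced by its own stride."""
    if n < 1:
        raise ValueError("the trace body runs at least once before the guard")
    b_off = c_off = out_off = 0                            # i24, i40, i59
    i = 0                                                  # i54
    while True:
        (x,) = struct.unpack_from("<d", b_buf, b_off)     # f62 = raw_load(i15, i24)
        b_off += b_step                                    # i63 = i24 + i23
        (y,) = struct.unpack_from("<d", c_buf, c_off)     # f64 = raw_load(i31, i40)
        c_off += c_step                                    # i65 = i40 + i39
        struct.pack_into("<d", out_buf, out_off, x + y)   # raw_store(i48, i59, f66)
        i += 1                                             # i67 = i54 + 1
        out_off += out_step                                # i68 = i59 + i58
        if i >= n:                                         # i69 = i67 >= i60, then the guard
            break

b = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)
c = struct.pack("<4d", 10.0, 20.0, 30.0, 40.0)
out = bytearray(32)
add_strided(b, c, out, 4)
```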
...
This looks much more like what I have had in mind.
Here some questions that I have:
1) What does the prefix p/i mean in the label arguments? I guess it is a
box of i -> integer, p -> pointer, but why does raw_load(i31, ...)
operate on an i31 instead of p31?
p is a gc pointer, i is an integer or raw pointer (we should invent
a new name for it, it's biting us in the JIT)
...
2) Why does every raw_load/raw_store have its own index calculation (e.g.
i65 = i40 + i39) instead of just using one common index?
because we did not optimize it correctly ;-) it's about specializing
the loop and detecting you can reuse the same indexing
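A minimal sketch of what reusing the same indexing could look like, assuming all three arrays share the same element size and stride so that a single byte offset serves every load and store. The name `add_shared_index` is hypothetical and this is not the JIT's actual output.

```python
import struct

def add_shared_index(b_buf, c_buf, out_buf, n, step=8):
    """Same computation as a = b + c, but with ONE offset instead of three."""
    off = 0
    for _ in range(n):
        (x,) = struct.unpack_from("<d", b_buf, off)    # load from b
        (y,) = struct.unpack_from("<d", c_buf, off)    # load from c, same offset
        struct.pack_into("<d", out_buf, off, x + y)    # store to a, same offset
        off += step                                    # the only index update

b = struct.pack("<3d", 1.5, 2.5, 3.5)
c = struct.pack("<3d", 0.5, 0.5, 0.5)
a = bytearray(24)
add_shared_index(b, c, a, 3)
```

The specialization is only valid when the strides are known to match, which is why it has to be detected per loop.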
...
3) Are the numpy arrays or python arrays aligned in memory? If not would
it be hard to change their alignment when they are allocated?
we control the alignment (I think they're 8 byte aligned now, but can
be made 16). Remember that numpy arrays don't have to be contiguous so
you have to be a bit careful
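To illustrate the contiguity caveat without relying on numpy, here is a small sketch using the standard library's `array` and `memoryview`. The helper `is_simd_friendly` and its 16-byte default are assumptions for illustration, not anything PyPy provides.

```python
from array import array

data = array("d", range(8))        # 8 doubles, 8 bytes each
full = memoryview(data)            # contiguous view, stride of 8 bytes
every_other = full[::2]            # strided view, stride of 16 bytes

def is_simd_friendly(view, address, alignment=16):
    """True only if the view is contiguous and starts on an aligned address."""
    return view.contiguous and address % alignment == 0

# buffer_info() returns (address, length in items) for the underlying storage.
address, _ = data.buffer_info()
print(full.strides, every_other.strides)
print(is_simd_friendly(full, address), is_simd_friendly(every_other, address))
```

A strided view like `every_other` fails the check regardless of where the buffer starts, so a packed SIMD load over it would read the wrong elements.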
...
Best,
Richard
On 02/24/2015 11:17 PM, Maciej Fijalkowski wrote:
...
Hi Richard.
I will respond inline
On Tue, Feb 24, 2015 at 8:18 PM, Richard Plangger <rich@pasra.at> wrote:
...
hi,
...
...
(1) Is there a better way to get loops hot?
no, not really (you can lower the threshold, though only globally; see
pypy --jit help for details)
...
(2) I cannot really figure out what all those trace/loop parameters are.
Obviously I can guess some, but most of them do not really make sense to
me. Am I missing some settings to be able to display more information
for the trace?
    In addition to that I do not really see any chance to generate a
simd loop for (2). Every array access is (arguably) guarded by an array
index overflow and I think to skip that check would be invalid.
Not really, no, you can't get better info, but I can probably explain.
Some of the values can be easily promoted to constants (array sizes
for multiplication for example). What you can do for array bound
checks is to unroll this loop e.g. 4 times and do ONE array bound
check instead of 4 (checking x + 4). Actually it's possible to have an
optimization that removes all counters and has ONE number to track the
iteration (it's not that hard if the arrays are similar, you can
precompute if they match).
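The unrolling idea can be sketched in plain Python as follows. This only shows the shape of the transformation (one check covering four elements, plus a scalar tail); Python itself still bounds-checks every list access, and `add_unrolled` is a made-up name.

```python
def add_unrolled(a, b, out):
    """out[i] = a[i] + b[i], unrolled 4x with one range check per group."""
    n = len(out)
    if len(a) < n or len(b) < n:       # precompute once that the arrays match
        raise ValueError("inputs shorter than output")
    i = 0
    while i + 4 <= n:                  # ONE check covers i, i+1, i+2, i+3
        out[i] = a[i] + b[i]
        out[i + 1] = a[i + 1] + b[i + 1]
        out[i + 2] = a[i + 2] + b[i + 2]
        out[i + 3] = a[i + 3] + b[i + 3]
        i += 4
    while i < n:                       # scalar tail for the leftover elements
        out[i] = a[i] + b[i]
        i += 1
    return out

xs = list(range(10))
ys = [10 * x for x in xs]
result = add_unrolled(xs, ys, [0] * 10)
```

The four statements in the unrolled body are what a vectorizer would try to merge into a single SIMD add.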
...
(3) There is no trace generated for this instruction? Does this
internally call a c function?
No, the trace follows the loop and does not compile the + (but the +
has its own loop anyway)
...
(4) What in your opinion makes sense to vectorize with simd instructions?
Could you provide some sample loops/code (ranging from simple to complex
loops)?
I can rewrite this loop in a vectorizable way if you want
...
I would really appreciate any help and/or push into the right direction.
Best,
Richard