[pypy-dev] Vectorizing numpy traces

Wed Feb 25 10:28:06 CET 2015

Thanks for your response.

Yes, I would be interested in a rewritten loop.

After browsing the jittracer a bit longer, I found the following.

```
label(p0, f16, p2, p3, p55, i54, i59, p20, i18, p6, i24, p36, i34, i40,
f32, i15, i23, i31, i39, i48, i58, i60, descr=TargetToken(41748336))

(numpy_call2: no get_printable_location)
f62 = raw_load(i15, i24, descr=<ArrayF 8>)
guard_not_invalidated(descr=<Guard0x246fe60>)
i63 = i24 + i23
f64 = raw_load(i31, i40, descr=<ArrayF 8>)
i65 = i40 + i39
f66 = f62 + f64
raw_store(i48, i59, f66, descr=<ArrayF 8>)
i67 = i54 + 1
i68 = i59 + i58
i69 = i67 >= i60
guard(i69 is false)  (run 2799 times, ~0%)
(numpy_call2: no get_printable_location)

jump(p0, f62, p2, p3, p55, i67, i68, p20, i18, p6, i63, p36, i34, i65,
f64, i15, i23, i31, i39, i48, i58, i60, descr=TargetToken(41748336))
```

Is this the trace that calculates a = b + c internally?
This looks much more like what I have had in mind.
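
To check my reading, here is how I would transcribe the trace into plain Python. This is only a sketch under my own assumptions: `raw_load`, `raw_store` and `traced_add` are hypothetical helper names built on the standard `struct` module, byte buffers stand in for the raw array memory, and the mapping to trace variables in the comments is my interpretation, not something the JIT reports.

```python
import struct


def raw_load(buf, offset):
    # Read one 8-byte float at a byte offset (my reading of descr=<ArrayF 8>).
    return struct.unpack_from("d", buf, offset)[0]


def raw_store(buf, offset, value):
    # Write one 8-byte float at a byte offset.
    struct.pack_into("d", buf, offset, value)


def traced_add(b, c, a, n, stride_b=8, stride_c=8, stride_a=8):
    # The trace is a loop body, so like the trace this runs at least once
    # (n >= 1 is assumed).
    off_b = off_c = off_a = 0           # i24, i40, i59: one byte offset per array
    i = 0                               # i54: iteration counter
    while True:
        x = raw_load(b, off_b)          # f62 = raw_load(i15, i24)
        off_b += stride_b               # i63 = i24 + i23
        y = raw_load(c, off_c)          # f64 = raw_load(i31, i40)
        off_c += stride_c               # i65 = i40 + i39
        raw_store(a, off_a, x + y)      # f66 = f62 + f64; raw_store(i48, i59, f66)
        i += 1                          # i67 = i54 + 1
        off_a += stride_a               # i68 = i59 + i58
        if i >= n:                      # i69 = i67 >= i60; guard(i69 is false)
            break
    return a


b = bytearray(struct.pack("3d", 1.0, 2.0, 3.0))
c = bytearray(struct.pack("3d", 10.0, 20.0, 30.0))
a = bytearray(3 * 8)
traced_add(b, c, a, 3)
print(struct.unpack("3d", a))
```

If this reading is right, the loop keeps three independent byte offsets plus one counter, which is what question 2 below is about.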

Here are some questions that I have:

1) What does the prefix p/i mean in the label arguments? I guess it is a
box of i -> integer, p -> pointer, but why does raw_load(i31, ...)
operate on an i31 instead of a p31?

2) Why does every raw_load/raw_store have its own index calculation (e.g.
i63 = i24 + i23) instead of just using one common index?

3) Are the numpy arrays or Python arrays aligned in memory? If not, would
it be hard to change their alignment when they are allocated?
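
For question 3, one way to look at this empirically is to inspect the address of an array's buffer. The sketch below uses the standard-library `array` module rather than numpy, `is_aligned` is a hypothetical helper name, and the 16-byte boundary is my assumption about what aligned SSE loads would need. The result depends on the allocator, so it shows how to check rather than what to expect.

```python
from array import array


def is_aligned(address, boundary=16):
    # True if the address is a multiple of the boundary (a power of two).
    return address & (boundary - 1) == 0


buf = array("d", [0.0] * 1024)
address, length = buf.buffer_info()     # (memory address, number of items)
print(hex(address), is_aligned(address, 16))
```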

Best,
Richard

On 02/24/2015 11:17 PM, Maciej Fijalkowski wrote:
> Hi Richard.
> 
> 
> I will respond inline
> 
> On Tue, Feb 24, 2015 at 8:18 PM, Richard Plangger <rich at pasra.at> wrote:
>> hi,
>>
...
>>
>> (1) Is there a better way to get loops hot?
> 
> no, not really (you can lower the threshold though, see pypy --jit
> help for details, only global)
> 
>>
>> (2) I cannot really figure out what all those trace/loop parameters are.
>> Obviously I can guess some, but most of them do not really make sense to
>> me. Am I missing some settings to be able to display more information
>> for the trace?
>>     In addition to that I do not really see any chance to generate a
>> simd loop for (2). Every array access is (arguably) guarded by an array
>> index overflow and I think to skip that check would be invalid.
> 
> Not really, no, you can't get better info, but I can probably explain.
> 
> Some of the values can be easily promoted to constants (array sizes
> for multiplication for example). What you can do for array bound
> checks is to unroll this loop e.g. 4 times and do ONE array bound
> check instead of 4 (checking x + 4). Actually it's possible to have an
> optimization that removes all counters and has ONE number to track the
> iteration (it's not that hard if the arrays are similar, you can
> precompute if they match).
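
If I understand the suggestion correctly, the unrolled form would have roughly the following shape. This is a minimal sketch with assumptions of my own: `add_unrolled` is a hypothetical name, plain Python lists stand in for the arrays, and all three are assumed to have the same length and unit stride so that one counter can index all of them. Python itself still bounds-checks every subscript, so this only illustrates the control flow, not an actual speedup.

```python
def add_unrolled(b, c, a):
    # Precompute once whether the arrays match; after that a single
    # counter is enough to track the iteration for all three.
    n = len(a)
    if len(b) != n or len(c) != n:
        raise ValueError("length mismatch")
    i = 0
    # Main loop: ONE check (i + 4 <= n) covers four element accesses.
    while i + 4 <= n:
        a[i] = b[i] + c[i]
        a[i + 1] = b[i + 1] + c[i + 1]
        a[i + 2] = b[i + 2] + c[i + 2]
        a[i + 3] = b[i + 3] + c[i + 3]
        i += 4
    # Remainder loop for the last n % 4 elements.
    while i < n:
        a[i] = b[i] + c[i]
        i += 1
    return a


print(add_unrolled(list(range(10)), list(range(10)), [0] * 10))
```

The four statements in the main loop body are the part that a SIMD add could replace with one instruction per group of elements.
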
> 
>>
>> (3) There is no trace generated for this instruction? Does this
>> internally call a c function?
> 
> No, the trace follows the loop and does not compile the + (but the +
> has its own loop anyway)
> 
>>
>> (4) What in your opinion makes sense to vectorize with simd instructions?
>> Could you provide some sample loops/code (ranging from simple to complex
>> loops)?
> 
> I can rewrite this loop in a vectorizable way if you want
> 
>>
>> I would really appreciate any help and/or push into the right direction.
>>
>> Best,
>> Richard
>>
