[pypy-dev] Vectorizing numpy traces

Maciej Fijalkowski fijall at gmail.com
Tue Feb 24 23:17:33 CET 2015


Hi Richard.


I will respond inline.

On Tue, Feb 24, 2015 at 8:18 PM, Richard Plangger <rich at pasra.at> wrote:
> hi,
>
> I'm currently trying to find a way to generate simd code for python
> loops that operate on numpy arrays.
> The IRC is quite busy so I think I'd rather post it as an email...
>
> I have familiarized myself with pypy and read most of the docs and read
> the code in the metainterp to understand how the traces are built.
>
> I translated pypy with --withmod-micronumpy enabled (installed numpy
> pypy fork in a virtual env) and gave the jitviewer a try.
> Here is a sample program that does not really make sense, but I think it
> would contain opportunity to generate SIMD instructions.
>
> ```
> import numpy
> def invoke():
>     a = numpy.zeros(1024)
>     b = numpy.zeros(1024)
>     c = numpy.zeros(1024)
>     # (2)
>     for i in range(0,1024):
>         c[i] = a[i] + b[i]
>     # (3)
>     c = a + b
>
> if __name__ == '__main__':
>     i = 0
>     # (1)
>     while i < 500:
>         invoke()
>         i += 1
> ```
>
> Here is the trace of invoke visible in jitviewer (uploaded to
> https://postimg.org/image/kbntluw55/full/).
>
> Here are some questions I have that would really help me to get going:
>
> (1) Is there a better way to get loops hot?

No, not really. You can lower the threshold, though; see pypy --jit
help for details. The setting is global only.
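
As a rough sketch of lowering the threshold from inside the program
rather than on the command line: PyPy ships a pypyjit module whose
set_param function accepts JIT parameters. The helper name
lower_jit_threshold and the value 200 are only illustrative
assumptions, not recommended settings, and the change applies to the
whole process.

```python
try:
    import pypyjit  # only importable when running under PyPy
except ImportError:
    pypyjit = None  # e.g. running under CPython, which has no such module


def lower_jit_threshold(threshold=200):
    """Ask the JIT to treat loops as hot after fewer iterations.

    Hypothetical helper; 200 is an arbitrary illustrative value.
    Returns True if the parameter was applied, False when pypyjit
    is not available.
    """
    if pypyjit is None:
        return False
    # The setting is global: it affects every loop in the process.
    pypyjit.set_param(threshold=threshold)
    return True


if __name__ == "__main__":
    print("threshold lowered:", lower_jit_threshold())
```

The command-line counterpart would be along the lines of
pypy --jit threshold=200 script.py; pypy --jit help lists the
parameters that are actually available.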

>
> (2) I cannot really figure out what all those trace/loop parameters are.
> Obviously I can guess some but most of them do not really make sense to
> me. Am I missing some settings to be able to display more information
> for the trace?
>     In addition to that I do not really see any chance to generate a
> simd loop for (2). Every array access is (arguably) guarded by an array
> index overflow and I think to skip that check would be invalid.

Not really, no; you can't get better info, but I can probably explain.

Some of the values can easily be promoted to constants (array sizes
for multiplication, for example). For the array bound checks, you can
unroll this loop, e.g. 4 times, and do ONE array bound check instead
of 4 (checking x + 4). It's actually possible to have an optimization
that removes all the counters and keeps ONE number to track the
iteration (it's not that hard if the arrays are similar; you can
precompute whether they match).
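
To make the unrolling idea concrete, here is a minimal Python-level
sketch. It is not what the JIT emits; it only mirrors the shape of
the transformation. The function name add_unrolled and the constant
UNROLL are made up for illustration, and Python still performs its
own per-access checks underneath.

```python
from array import array

UNROLL = 4  # illustrative unroll factor, matching the "4 times" above


def add_unrolled(a, b, c, n):
    """Compute c[i] = a[i] + b[i] for i in range(n), 4 elements per step."""
    # Precompute once whether the arrays match the requested length.
    if len(a) < n or len(b) < n or len(c) < n:
        raise IndexError("arrays are shorter than n")

    i = 0  # the ONE number that tracks the iteration for all three arrays
    # A single check (i + UNROLL <= n) covers indices i .. i + 3.
    while i + UNROLL <= n:
        c[i] = a[i] + b[i]
        c[i + 1] = a[i + 1] + b[i + 1]
        c[i + 2] = a[i + 2] + b[i + 2]
        c[i + 3] = a[i + 3] + b[i + 3]
        i += UNROLL

    # Leftover elements when n is not a multiple of UNROLL.
    while i < n:
        c[i] = a[i] + b[i]
        i += 1


if __name__ == "__main__":
    a = array("d", [1.0] * 10)
    b = array("d", [2.0] * 10)
    c = array("d", [0.0] * 10)
    add_unrolled(a, b, c, 10)
    print(list(c))
```

The four statements in the unrolled body are independent of each
other, which is the form that could map onto a single SIMD add.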

>
> (3) There is no trace generated for this instruction? Does this
> internally call a c function?

No, the trace follows the loop and does not compile the + (but the +
has its own loop anyway).

>
> (4) What in your opinion makes sense to vectorize with simd instructions?
> Could you provide some sample loops/code (ranging from simple to complex
> loops)?

I can rewrite this loop in a vectorizable way if you want.

>
> I would really appreciate any help and/or push into the right direction.
>
> Best,
> Richard
>

