Hi Richard. I will respond inline.
On Tue, Feb 24, 2015 at 8:18 PM, Richard Plangger <rich@pasra.at> wrote:
Hi,
I'm currently trying to find a way to generate SIMD code for Python loops that operate on NumPy arrays. IRC is quite busy, so I'd rather post this as an email.
I have familiarized myself with PyPy, read most of the docs, and read the code in the metainterp to understand how traces are built.
I translated PyPy with --withmod-micronumpy enabled (and installed the PyPy fork of NumPy in a virtualenv) and gave the jitviewer a try. Here is a sample program that does not do anything useful, but I think it contains opportunities to generate SIMD instructions.
```
import numpy

def invoke():
    a = numpy.zeros(1024)
    b = numpy.zeros(1024)
    c = numpy.zeros(1024)
    # (2)
    for i in range(0, 1024):
        c[i] = a[i] + b[i]
    # (3)
    c = a + b

if __name__ == '__main__':
    i = 0
    # (1)
    while i < 500:
        invoke()
        i += 1
```
Here is the trace of invoke as shown in the jitviewer (uploaded to https://postimg.org/image/kbntluw55/full/).
Here are some questions whose answers would really help me get going:
(1) Is there a better way to get loops hot?
No, not really. You can lower the threshold, though (see pypy --jit help for details); the setting is only global.
(2) I cannot really figure out what all those trace/loop parameters are. Obviously I can guess some, but most of them do not make sense to me. Am I missing a setting that would display more information about the trace? In addition, I do not see any chance to generate a SIMD loop for (2): every array access is (arguably) guarded by an index-out-of-bounds check, and I think skipping that check would be invalid.
Not really, no, you can't get better info, but I can probably explain. Some of the values can easily be promoted to constants (array sizes used in multiplications, for example). What you can do about array bounds checks is to unroll this loop, e.g. 4 times, and do ONE bounds check instead of 4 (checking x + 4). It's actually possible to have an optimization that removes all the counters and keeps ONE number to track the iteration (that's not hard if the arrays are similar; you can precompute whether they match).
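To make the unrolling idea concrete, here is a minimal sketch written as ordinary Python over plain lists, not as JIT trace operations. It assumes the three arrays are one-dimensional with unit stride; `add_unrolled4` is a hypothetical name, and it counts its own explicit checks only so the effect is visible. The length comparison up front plays the role of "precompute whether they match": once it passes, one index `i` and one bound `n` serve all three arrays, and one guard per unrolled iteration covers four elements.

```python
def add_unrolled4(a, b, c):
    """Compute c[i] = a[i] + b[i] with the loop body unrolled 4 times.

    Hypothetical sketch: returns how many explicit bounds checks were made,
    so the saving from unrolling is visible.
    """
    n = len(c)
    # Precompute once that the arrays match; afterwards ONE index i and ONE
    # bound n stand in for three separate counters and three bounds.
    if len(a) != n or len(b) != n:
        raise ValueError("arrays must have the same length")
    checks = 0
    i = 0
    while True:
        checks += 1
        if i + 4 > n:  # ONE guard covers indices i .. i + 3
            break
        c[i] = a[i] + b[i]
        c[i + 1] = a[i + 1] + b[i + 1]
        c[i + 2] = a[i + 2] + b[i + 2]
        c[i + 3] = a[i + 3] + b[i + 3]
        i += 4
    # Scalar epilogue for the last n % 4 elements: one guard per element.
    while True:
        checks += 1
        if i >= n:
            break
        c[i] = a[i] + b[i]
        i += 1
    return checks
```

With three 1024-element arrays this makes 258 checks (256 passing checks in the unrolled loop plus one failing check at the end of each loop) instead of at least one check per element.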
(3) Why is there no trace generated for this statement? Does it internally call a C function?
No. The trace follows the loop and does not compile the +, but the + has its own loop anyway.
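Conceptually, `c = a + b` runs an element-wise loop of its own inside the library, which is why that work does not appear in the trace of `invoke`. A rough sketch of the idea in plain Python; `elementwise` is a hypothetical name, and the real micronumpy implementation is structured differently:

```python
import operator


def elementwise(op, a, b):
    """Hypothetical sketch of what `c = a + b` does conceptually.

    The loop lives inside the library call rather than in the caller's
    loop, so it is not part of the caller's trace.
    """
    if len(a) != len(b):
        raise ValueError("operand lengths differ")
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = op(a[i], b[i])
    return out
```

For example, `elementwise(operator.add, [1.0, 2.0], [3.0, 4.0])` returns `[4.0, 6.0]`.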
(4) In your opinion, what makes sense to vectorize with SIMD instructions? Could you provide some sample loops (ranging from simple to complex)?
I can rewrite this loop in a vectorizable way if you want.
I would really appreciate any help or a push in the right direction.
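As a hedged illustration of what "vectorizable" means for a loop like (2): when iteration i touches only index i, the loop can be regrouped into whole-vector steps plus a scalar tail. A running sum cannot be split this way, because each step depends on the previous one. `LANES`, `vadd`, `add_vectorized` and `prefix_sum` are hypothetical names. `vadd` only stands in for one SIMD instruction, and plain lists replace NumPy arrays.

```python
LANES = 4  # hypothetical SIMD width, e.g. four float64 lanes per register


def vadd(xs, ys):
    """Stand-in for ONE SIMD add: combines LANES elements lane by lane."""
    return [x + y for x, y in zip(xs, ys)]


def add_vectorized(a, b):
    """c[i] = a[i] + b[i], regrouped into whole-vector steps plus a tail.

    Legal because iteration i only touches index i, so the LANES iterations
    in one chunk are independent and can run as a single vector operation.
    """
    n = len(a)
    c = [0.0] * n
    main = n - n % LANES  # elements covered by full vectors
    for i in range(0, main, LANES):
        c[i:i + LANES] = vadd(a[i:i + LANES], b[i:i + LANES])
    for i in range(main, n):  # scalar tail for the leftover elements
        c[i] = a[i] + b[i]
    return c


def prefix_sum(a):
    """NOT vectorizable this way: s[i] depends on s[i - 1] (loop-carried)."""
    s = [0.0] * len(a)
    total = 0.0
    for i, x in enumerate(a):
        total += x
        s[i] = total
    return s
```

The scalar tail keeps the sketch correct for lengths that are not a multiple of `LANES`. Handling leftover elements this way is one common choice, and masked vector operations are another.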
Best, Richard
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev