[pypy-issue] [issue1701] a / b with integers could be faster in the JIT

Carl Friedrich Bolz tracker at bugs.pypy.org
Fri Mar 7 16:40:08 CET 2014


New submission from Carl Friedrich Bolz <cfbolz at gmx.de>:

a / b with integers is currently doing something like:

r = a / b
p = r * y
if y < 0: u = p - x
else:     u = x - p
return r + (u >> INT_BITS_1)

the first two operations could be done with just one IDIV instruction on X86.
For the rest it's maybe better to write it branch-free.


discussion:

16:08:06 <cfbolz> indeed, before rtyping it's just "floordiv"
16:08:22 <arigato> and then "floordiv" is turned into... not "int_floordiv"
16:09:11 <cfbolz> int_truncdiv + a call
16:09:19 <arigato> yes
16:10:33 <cfbolz> arigato: might not get around to it now, but will keep it in mind
16:10:37 <arigato> ok
16:10:50 <arigato> I *think* the current slightly strange situation gives good
results for the jit
16:11:45 <cfbolz> no clue
16:12:13 <arigato> at least I remember when int_floordiv was meant to have
Python semantics too
16:12:54 <arigato> it was more messy in the jit, and then sometimes for nothing
if you really wanted a C-like behavior in the end
16:13:14 <cfbolz> right
16:14:11 <cfbolz> arigato: still, you get thiskind of code:
16:14:17 <cfbolz> https://www.irccloud.com/pastebin/x2jo6KoG
16:15:10 <arigato> right, but one way or another, the RPython "/" operator needs
to translate to this
16:15:35 <cfbolz> arigato: I am worried by the last guard a bit
16:15:45 <arigato> (probably someone with the right mindset can come up with a
different set of instructions that is somehow better)
16:15:57 <arigato> cfbolz: ah, I see
16:15:59 <cfbolz> arigato: because it can really make bridges
16:16:08 <arigato> ok
16:16:48 <arigato> it comes out of ll_correct_int_floordiv()
16:17:00 <cfbolz> can we write this branch free
16:17:19 <arigato> first step is to remember why this works at all :-)
16:17:44 <cfbolz> :-)
16:17:59 <arigato> right, I think the branch was put there specifically because
it's the less likely to have different values at run-time
16:18:18 <arigato> "x/y" where y is sometimes positive and sometimes negative...
is probably a bit rare
16:18:28 <cfbolz> ah, no
16:18:42 <cfbolz> arigato: it's maybe even because then if you know that y is
positive you can constant fold
16:19:00 <arigato> yes, that too
16:19:14 <cfbolz> arigato: because division by a constant factor
16:20:25 <cfbolz> still confused how it works
16:21:09 <arigato> (u >> INT_BITS_1) is -1 if u < 0, or 0 otherwise
16:21:37 <cfbolz> yes
16:21:41 <cfbolz> but what is u?
16:22:22 <arigato> u is computed to be 0 if the division was exact, or < 0 if not
16:22:37 <cfbolz> right
16:23:05 <arigato> a branch-free variant would be: return r + (((x - p) ^ y) >>
INT_BITS_1)
16:23:34 → willingc joined (~willingc at cpe-75-80-19-24.san.res.rr.com)
16:23:43 <cfbolz> arigato: which means we would replace a guard by an xor
16:23:44 <arigato> the trade-off is if we already know that y > 0
16:23:49 <arigato> yes
16:24:20 <arigato> this kind of code would be all constant-folded away, right?
if jit.isconstant(y > 0):...
16:24:21 <cfbolz> no clue :-(
16:24:41 <cfbolz> why all?
16:24:49 <cfbolz> only the guard would go away, no?
16:24:54 → Wessie joined (~Wessie at ip5651e009.adsl-surfen.hetnet.nl)
16:25:05 <arigato> yes yes, I mean the "jit.isconstant()" part
16:25:24 <cfbolz> yes
16:25:28 <arigato> but it's probably not going to ever say True
16:25:38 <arigato> y is known to be positive later
16:25:42 <arigato> not during tracing
16:25:48 <cfbolz> yes, e.g. if it's loop invariant
16:26:01 <arigato> hum
16:26:16 <arigato> the alternative is to add a correct (but very special-case)
arithmetic optmization:
16:26:33 <arigato> i3 = i1 ^ i2; i4 = i3 >> INT_BITS_1
16:26:55 <arigato> if i2 is known to be positive, then we can substitute i3 for
i1 in the computation of i4
16:27:07 <cfbolz> arigato: very special case :-)
16:27:15 <cfbolz> arigato: hm, in x86 assembler you could indeed do better,
because IDIV computes both x / y and the remainder in one instruction, no?
16:27:25 <arigato> ah, yes
16:27:27 <cfbolz> so you wouldn't need the multiply
16:27:38 ⇐ zer0def quit (~null at 5.254.138.19) Ping timeout: 240 seconds
16:27:39 <arigato> indeed
16:28:13 <cfbolz> so we would need an int_divrem (which we can't have, because
it returns two things)
16:28:34 <arigato> it's probably not a problem:
16:29:10 <arigato> we can arrange things such that in the end the jit backend
sees a INT_TRUNCDIV immediately followed by a INT_REM
16:29:35 <cfbolz> I see
16:30:56 <arigato> (btw, same about 'mod' vs 'int_mod')
16:31:03 <cfbolz> yes

----------
messages: 6577
nosy: cfbolz, pypy-issue
priority: performance bug
status: unread
title: a / b with integers could be faster in the JIT

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1701>
________________________________________


More information about the pypy-issue mailing list