[Python-Dev] [Python-checkins] cpython: In-line the append operations inside deque_inplace_repeat().
Raymond Hettinger
raymond.hettinger at gmail.com
Tue Sep 15 00:37:15 CEST 2015
> On Sep 14, 2015, at 12:49 PM, Brett Cannon <bcannon at gmail.com> wrote:
>
> Would it be worth adding a comment that the block of code is an inlined copy of deque_append()?
> Or maybe even turn the append() function into a macro so you minimize code duplication?
I don't think either would be helpful. The point of the inlining was to let the code evolve independently from deque_append().
Once separated from the mother ship, the code in deque_inplace_repeat() could shed the unnecessary work. The state variable is now updated only once. The updates within a single block now happen in their own inner loop. The deque size is updated outside of that loop, etc. In other words, they are no longer the same code.
The original append-in-a-loop version was already being in-lined by the compiler but was doing way too much work. For each item written in the original, there were 7 memory reads, 5 writes, 6 predictable compare-and-branches, and 5 add/sub operations. In the current form, there are 0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub operations.
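To make the difference concrete, here is a minimal sketch in C of the two shapes of loop. It uses a hypothetical, simplified block-linked deque (`Item`, `Block`, `Deque`, `deque_append`, `repeat_naive`, and `repeat_specialized` are illustrative names, not CPython's actual structs or functions), with 64 slots per block to match the `cmpq $63` bound in the listings below, and a plain counter standing in for a reference count. The naive version repeats all of the append bookkeeping per item; the specialized version bumps `state` once, fills each block in an inner loop, and updates the size and count once at the end. Whether the inner-loop locals really stay in registers is up to the compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simplified model; not CPython's real deque structs. */
#define BLOCKLEN 64

typedef struct Item { long refcnt; } Item;   /* stand-in for a refcounted object */

typedef struct Block {
    Item *data[BLOCKLEN];
    struct Block *rightlink;
} Block;

typedef struct {
    Block *leftblock, *rightblock;
    int rightindex;        /* last used slot in rightblock (-1 if none) */
    long size;             /* number of items stored */
    unsigned long state;   /* mutation counter */
} Deque;

static Block *new_block(void) {
    Block *b = calloc(1, sizeof(Block));   /* zeroed: rightlink == NULL */
    if (b == NULL) { perror("calloc"); exit(1); }
    return b;
}

static void deque_init(Deque *d) {
    d->leftblock = d->rightblock = new_block();
    d->rightindex = -1;
    d->size = 0;
    d->state = 0;
}

static void deque_free(Deque *d) {
    Block *b = d->leftblock;
    while (b != NULL) { Block *next = b->rightlink; free(b); b = next; }
}

/* General-purpose append: full bookkeeping on every call. */
static void deque_append(Deque *d, Item *item) {
    if (d->rightindex == BLOCKLEN - 1) {       /* current block is full */
        Block *b = new_block();
        d->rightblock->rightlink = b;
        d->rightblock = b;
        d->rightindex = -1;
    }
    d->size++;
    d->rightindex++;
    item->refcnt++;
    d->rightblock->data[d->rightindex] = item;
    d->state++;
}

/* Append-in-a-loop: correct, but redoes every update for each item. */
static void repeat_naive(Deque *d, Item *item, long n) {
    for (long i = 0; i < n; i++)
        deque_append(d, item);
}

/* Specialized loop: one state bump, an inner loop per block, and the
   size and count updates hoisted out of the loops. */
static void repeat_specialized(Deque *d, Item *item, long n) {
    if (n <= 0)
        return;
    d->state++;
    long i = 0;
    while (i < n) {
        if (d->rightindex == BLOCKLEN - 1) {   /* current block is full */
            Block *b = new_block();
            d->rightblock->rightlink = b;
            d->rightblock = b;
            d->rightindex = -1;
        }
        /* Inner loop: the item store is the only memory write per item. */
        Item **data = d->rightblock->data;
        int idx = d->rightindex;
        for (; i < n && idx < BLOCKLEN - 1; i++)
            data[++idx] = item;
        d->rightindex = idx;
    }
    d->size += n;
    item->refcnt += n;
}
```

Both functions leave the same items in the same slots; with 200 items, for example, `repeat_naive` bumps `state` 200 times while `repeat_specialized` bumps it once.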
FWIW, my workflow is that periodically I expand the code with new features (the upcoming work is to add slicing support, http://bugs.python.org/issue17394); then, once it is correct and tested, I make a series of optimization passes (such as the work I just described above). After that, I come along and factor out common code, usually with clean, in-lineable functions rather than macros (such as the recent check-in replacing redundant code in deque_repeat with a call to the common code in deque_inplace_repeat).
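The post does not spell out why in-lineable functions are preferred over macros. One commonly cited reason is sketched below with hypothetical names (`SQUARE_MACRO`, `square_inline`, `next_value` are illustrative only, not CPython code): a function-like macro substitutes its argument text at each use, so an argument with side effects runs more than once, whereas a `static inline` function evaluates its argument exactly once, type-checks it, and can still be inlined by an optimizing compiler.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not CPython code. */

/* Function-like macro: the argument text appears twice in the expansion,
   so an argument with side effects is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* static inline function: the argument is evaluated once, the parameter
   is type-checked, and the compiler is still free to inline the body. */
static inline long square_inline(long x)
{
    return x * x;
}

/* Counts its calls so that double evaluation becomes visible. */
static int calls = 0;

static long next_value(void)
{
    calls++;
    return 5;
}
```

Both forms compute the same value here, but `SQUARE_MACRO(next_value())` calls `next_value` twice while `square_inline(next_value())` calls it once.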
My schedule lately hasn't given me any big blocks of time to work with, so I do the steps piecemeal as I get snippets of development time.
Raymond
P.S. For those who are interested, here is the before and after (in the annotations, < marks a memory read, > a memory write, and <> a read-modify-write):
---- before ---------------------------------
L1152:
movq __Py_NoneStruct at GOTPCREL(%rip), %rdi
cmpq $0, (%rdi) <
je L1257
L1159:
addq $1, %r13
cmpq %r14, %r13
je L1141
movq 16(%rbx), %rsi <
L1142:
movq 48(%rbx), %rdx <
addq $1, 56(%rbx) <>
cmpq $63, %rdx
je L1143
movq 32(%rbx), %rax <
addq $1, %rdx
L1144:
addq $1, 0(%rbp) <>
leaq 1(%rsi), %rcx
movq %rdx, 48(%rbx) >
movq %rcx, 16(%rbx) >
movq %rbp, 8(%rax,%rdx,8) >
movq 64(%rbx), %rax <
cmpq %rax, %rcx
jle L1152
cmpq $-1, %rax
je L1152
---- after ------------------------------------
L777:
cmpq $63, %rdx
je L816
L779:
addq $1, %rdx
movq %rbp, 16(%rsi,%rbx,8) >
addq $1, %rbx
leaq (%rdx,%r9), %rcx
subq %r8, %rcx
cmpq %r12, %rbx
jl L777
# outside the inner loop
movq %rdx, 48(%r13)
movq %rcx, 0(%rbp)
cmpq %r12, %rbx
jl L780