[Python-checkins] [Python-Dev] cpython: In-line the append operations inside deque_inplace_repeat().

Raymond Hettinger raymond.hettinger at gmail.com
Tue Sep 15 00:37:15 CEST 2015


> On Sep 14, 2015, at 12:49 PM, Brett Cannon <bcannon at gmail.com> wrote:
> 
> Would it be worth adding a comment that the block of code is an inlined copy of deque_append()?
> Or maybe even turn the append() function into a macro so you minimize code duplication?

I don't think either would be helpful.  The point of the inlining was to let the code evolve independently from deque_append().   

Once separated from the mother ship, the code in deque_inplace_repeat() could shed the unnecessary work.  The state variable is updated once.  The updates within a single block are now in their own inner loop.  The deque size is updated outside of that loop, etc.  In other words, they are no longer the same code.

The original append-in-a-loop version was already being in-lined by the compiler but was doing way too much work.  For each item written in the original, there were 7 memory reads, 5 writes, 6 predictable compare-and-branches, and 5 add/sub operations.  In the current form, there are 0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub operations.
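
To make the restructuring concrete, here is a rough sketch in C.  It is not the actual CPython code: toy_deque, toy_append, fill_by_append and fill_inlined are hypothetical names, the block length is an illustrative value, and it stores plain ints instead of reference-counted objects.  It only aims to show the shape of the change: one state update, a tight inner loop per block, and the size written once at the end.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKLEN 64                 /* slots per block (illustrative value) */

typedef struct block {
    struct block *rightlink;
    int data[BLOCKLEN];
} block;

typedef struct {
    block *leftblock;               /* first block, kept so we can free */
    block *rightblock;              /* block currently being filled */
    long rightindex;                /* last used slot in rightblock, -1 if none */
    long size;                      /* number of stored items */
    long state;                     /* mutation counter */
} toy_deque;

static int toy_init(toy_deque *d) {
    block *b = calloc(1, sizeof(block));
    if (b == NULL) return -1;
    d->leftblock = d->rightblock = b;
    d->rightindex = -1;
    d->size = 0;
    d->state = 0;
    return 0;
}

/* Link a fresh, empty block on the right. */
static int toy_grow(toy_deque *d) {
    block *b = calloc(1, sizeof(block));
    if (b == NULL) return -1;
    d->rightblock->rightlink = b;
    d->rightblock = b;
    d->rightindex = -1;
    return 0;
}

/* General-purpose append: touches size, index and state on every call. */
static int toy_append(toy_deque *d, int item) {
    if (d->rightindex == BLOCKLEN - 1 && toy_grow(d) < 0) return -1;
    d->size++;
    d->rightindex++;
    d->rightblock->data[d->rightindex] = item;
    d->state++;
    return 0;
}

/* "Before": repeat by calling the general append n times. */
static int fill_by_append(toy_deque *d, int item, long n) {
    for (long i = 0; i < n; i++)
        if (toy_append(d, item) < 0) return -1;
    return 0;
}

/* "After": the append logic specialized for a bulk fill. */
static int fill_inlined(toy_deque *d, int item, long n) {
    long i = 0;
    assert(n >= 0);
    d->state++;                              /* state updated once */
    while (i < n) {
        if (d->rightindex == BLOCKLEN - 1 && toy_grow(d) < 0) {
            d->size += i;                    /* account for items already written */
            return -1;
        }
        long room = BLOCKLEN - 1 - d->rightindex;
        long m = (n - i < room) ? n - i : room;
        int *dst = &d->rightblock->data[d->rightindex + 1];
        for (long k = 0; k < m; k++)         /* inner loop: one store per item */
            dst[k] = item;
        d->rightindex += m;
        i += m;
    }
    d->size += n;                            /* size updated outside the loops */
    return 0;
}

/* Release every block. */
static void toy_clear(toy_deque *d) {
    block *b = d->leftblock;
    while (b != NULL) {
        block *next = b->rightlink;
        free(b);
        b = next;
    }
    d->leftblock = d->rightblock = NULL;
}
```

Both fill functions leave the same items, size and right index behind.  They differ only in the bookkeeping done per item, which is where the savings described above come from.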

FWIW, my workflow is that periodically I expand the code with new features (the upcoming work is to add slicing support, http://bugs.python.org/issue17394), then once it is correct and tested, I make a series of optimization passes (such as the work I just described above).  After that, I come along and factor out common code, usually with clean, in-lineable functions rather than macros (such as the recent check-in replacing redundant code in deque_repeat with a call to the common code in deque_inplace_repeat).

My schedule lately hasn't given me any big blocks of time to work with, so I do the steps piecemeal as I get snippets of development time.


Raymond


P.S. For those who are interested, here is the before and after:

---- before ---------------------------------
L1152:
    movq    __Py_NoneStruct@GOTPCREL(%rip), %rdi
    cmpq    $0, (%rdi)                                   <
    je  L1257
L1159:
    addq    $1, %r13
    cmpq    %r14, %r13
    je  L1141
    movq    16(%rbx), %rsi                               <
L1142:
    movq    48(%rbx), %rdx                               <
    addq    $1, 56(%rbx)                                 <>
    cmpq    $63, %rdx
    je  L1143
    movq    32(%rbx), %rax                               <
    addq    $1, %rdx
L1144:
    addq    $1, 0(%rbp)                                  <>
    leaq    1(%rsi), %rcx
    movq    %rdx, 48(%rbx)                                >
    movq    %rcx, 16(%rbx)                                >
    movq    %rbp, 8(%rax,%rdx,8)                          >
    movq    64(%rbx), %rax                               <
    cmpq    %rax, %rcx
    jle L1152
    cmpq    $-1, %rax
    je  L1152


---- after ------------------------------------
L777:
    cmpq    $63, %rdx
    je  L816
L779:
    addq    $1, %rdx
    movq    %rbp, 16(%rsi,%rbx,8)                 >
    addq    $1, %rbx
    leaq    (%rdx,%r9), %rcx
    subq    %r8, %rcx
    cmpq    %r12, %rbx
    jl  L777

    # outside the inner-loop
    movq    %rdx, 48(%r13)                  
    movq    %rcx, 0(%rbp)
    cmpq    %r12, %rbx
    jl  L780

