[issue23269] Tighten-up search loops in sets

Raymond Hettinger report at bugs.python.org
Sun Jan 25 21:31:31 CET 2015


Raymond Hettinger added the comment:

Reopening this with a better patch that nicely tightens the inner-loop without an algorithmic change or any downside.

Currently, the entry computation adds a constant (i+j), masks it (&mask) to wrap on overflow, scales it to a table index (<< 4), and then does the key lookup (one memory access).   These steps are sequential (not parallizeable).

The patch moves the wrap-on-overflow decision out of the inner loop (adding a single, fast, predictable branch (i + LINEAR_PROBES > mask).   The inner-loop then simplifies to an add, fetch, and test (no masking and shifting).

The generated code on Clang is very tight:

LBB29_26:      
    cmpq    $0, (%rsi)       # entry->key == NULL
    je  LBB29_30
    incq    %rdx             # j++
    addq    $16, %rsi        # entry++  (done in parallel with the incq)
    cmpq    $9, %rdx         # j <= LINEAR_PROBES
    jbe LBB29_26

On gcc-4.9, the loop is unrolled and we get even better code:

    leaq    (%rbx,%rsi), %rdx   # entry = table[i]
    cmpq    $0, (%rdx)          # entry.key == NULL
    je  L223

    leaq    16(%rbx,%rsi), %rdx # entry = table[i+1];
    cmpq    $0, (%rdx)          # entry.key == NULL
    je  L223

    leaq    32(%rbx,%rsi), %rdx
    cmpq    $0, (%rdx)
    je  L223

----------
resolution: rejected -> 
status: closed -> open
Added file: http://bugs.python.org/file37857/tight2.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23269>
_______________________________________


More information about the Python-bugs-list mailing list