[issue23269] Tighten-up search loops in sets
Raymond Hettinger
report at bugs.python.org
Sun Jan 25 21:31:31 CET 2015
Raymond Hettinger added the comment:
Reopening this with a better patch that nicely tightens the inner-loop without an algorithmic change or any downside.
Currently, the entry computation adds a constant (i+j), masks it (&mask) to wrap on overflow, scales it to a table index (<< 4), and then does the key lookup (one memory access). These steps are sequential (not parallizeable).
The patch moves the wrap-on-overflow decision out of the inner loop (adding a single, fast, predictable branch (i + LINEAR_PROBES > mask). The inner-loop then simplifies to an add, fetch, and test (no masking and shifting).
The generated code on Clang is very tight:
LBB29_26:
cmpq $0, (%rsi) # entry->key == NULL
je LBB29_30
incq %rdx # j++
addq $16, %rsi # entry++ (done in parallel with the incq)
cmpq $9, %rdx # j <= LINEAR_PROBES
jbe LBB29_26
On gcc-4.9, the loop is unrolled and we get even better code:
leaq (%rbx,%rsi), %rdx # entry = table[i]
cmpq $0, (%rdx) # entry.key == NULL
je L223
leaq 16(%rbx,%rsi), %rdx # entry = table[i+1];
cmpq $0, (%rdx) # entry.key == NULL
je L223
leaq 32(%rbx,%rsi), %rdx
cmpq $0, (%rdx)
je L223
----------
resolution: rejected ->
status: closed -> open
Added file: http://bugs.python.org/file37857/tight2.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23269>
_______________________________________
More information about the Python-bugs-list
mailing list