[issue13136] speed-up conversion between unicode widths

Martin v. Löwis report at bugs.python.org
Sun Oct 9 05:06:10 CEST 2011


Martin v. Löwis <martin at v.loewis.de> added the comment:

Marc-Andre: gcc will normally not unroll loops, unless -funroll-loops is given on the command line. Then, it will unroll many loops, and do so with 8 iterations per outer loop. This typically causes significant code bloat, which is why unrolling is normally disabled and left to the programmer.

For those who want to experiment with this, I attach a C file with just the code in question. Compile this with your favorite compiler settings, and see what the compiler generates. clang, on an x64 system, compiles the original loop into


LBB0_2:                                 ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi), %eax
        movw    %ax, (%rdx)
        incq    %rdi
        addq    $2, %rdx
        decq    %rsi
        jne     LBB0_2

and the unrolled loop into

LBB1_2:                                 ## %.lr.ph6
                                        ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi,%rcx), %r8d
        movw    %r8w, (%rdx)
        movzbl  1(%rdi,%rcx), %r8d
        movw    %r8w, 2(%rdx)
        movzbl  2(%rdi,%rcx), %r8d
        movw    %r8w, 4(%rdx)
        movzbl  3(%rdi,%rcx), %r8d
        movw    %r8w, 6(%rdx)
        addq    $8, %rdx
        addq    $4, %rcx
        cmpq    %rax, %rcx
        jl      LBB1_2
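
For readers without the attachment at hand, here is a minimal sketch of the kind of C source that would produce the two listings above. It is not the content of unroll.c: the function names widen_simple and widen_unrolled, the parameter order, and the use of uint8_t/uint16_t are assumptions made for illustration. The only things taken from the assembly are that each step loads one byte, zero-extends it, and stores it as a 16-bit unit, and that the unrolled variant handles four elements per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction; names and signatures are NOT from unroll.c. */

/* Plain loop: widen one byte to one 16-bit unit per iteration. */
static void widen_simple(const uint8_t *src, size_t n, uint16_t *dst)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];            /* implicit zero-extension */
}

/* Manually unrolled by four, with a remainder loop for the last 0..3 items. */
static void widen_unrolled(const uint8_t *src, size_t n, uint16_t *dst)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    size_t i = 0;
    size_t limit = n - (n % 4);     /* largest multiple of 4 not above n */

    for (; i < limit; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)              /* leftover elements */
        dst[i] = src[i];
}
```

Both functions should produce identical output for any input length. Compiling a file like this with and without -funroll-loops, and comparing the generated assembly, is one way to check what a given compiler does with each variant.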

----------
nosy: +loewis
Added file: http://bugs.python.org/file23353/unroll.c

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13136>
_______________________________________

