[New-bugs-announce] [issue38015] inline function generates slightly inefficient machine code

Ma Lin report at bugs.python.org
Tue Sep 3 00:02:44 EDT 2019

New submission from Ma Lin <malincns at 163.com>:

Commit 5e63ab0 replaced a macro with this inline function:

    static inline int
    is_small_int(long long ival)
    {
        return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
    }

(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257)

However, when this function is called with an argument for which `sizeof(value) < sizeof(long long)`, an unnecessary type conversion is performed.

For example, on a 32-bit platform, if `value` is a `Py_ssize_t`, it has to be converted to an 8-byte `long long`.
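
To make the conversion concrete, here is a minimal sketch. The caller `check_narrow` and its `int32_t` parameter are hypothetical stand-ins for a function that receives a `Py_ssize_t` on a 32-bit build; only `is_small_int` and the two constants come from the report.

```c
#include <stdint.h>

#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

/* Same shape as the inline function quoted in the report. */
static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

/* Hypothetical caller holding a 32-bit value. */
int
check_narrow(int32_t v)
{
    /* v is implicitly converted to long long at the call,
       exactly as if written is_small_int((long long)v). */
    return is_small_int(v);
}
```

The result is the same as a 32-bit comparison would give; the concern is only the cost of widening the argument before comparing.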

The following assembly code is the beginning of the `PyLong_FromSsize_t(Py_ssize_t v)` function.
(32-bit x86 build generated by GCC 9.2, with the `-m32 -O2` options)

Using the macro (before commit 5e63ab0):
        mov     eax, DWORD PTR [esp+4]
        add     eax, 5
        cmp     eax, 261
        ja      .L2
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints
        add     DWORD PTR [eax], 1
        ret
.L2:    jmp     PyLong_FromSsize_t_rest(int)

Using the inline function:
        push    ebx
        mov     eax, DWORD PTR [esp+8]
        mov     edx, 261
        mov     ecx, eax
        mov     ebx, eax
        sar     ebx, 31
        add     ecx, 5
        adc     ebx, 0
        cmp     edx, ecx
        mov     edx, 0
        sbb     edx, ebx
        jc      .L7
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints+80
        add     DWORD PTR [eax], 1
        pop     ebx
        ret
.L7:    pop     ebx
        jmp     PyLong_FromSsize_t_rest(int)

On a 32-bit x86 platform, an 8-byte `long long` is implemented using two registers, so the machine code is much longer than with the macro version.
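
As a rough illustration of the two-register arithmetic, the sketch below mimics in C what the longer listing appears to do: sign-extend the 32-bit value, add 5 with a carry into the high half, then compare against 261. The `pair64` type and both function names are hypothetical, invented only for this illustration; they are not part of CPython.

```c
#include <stdint.h>

/* Hypothetical two-register view of a 64-bit value on 32-bit x86. */
typedef struct {
    uint32_t lo;   /* low 32 bits */
    int32_t  hi;   /* high 32 bits (carries the sign) */
} pair64;

static pair64
widen_and_add5(int32_t v)
{
    pair64 r;
    uint32_t lo = (uint32_t)v;
    int32_t hi = (v < 0) ? -1 : 0;      /* sign extension, like `sar ebx, 31` */

    r.lo = lo + 5u;                     /* like `add ecx, 5` */
    r.hi = hi + (int32_t)(r.lo < lo);   /* carry into high half, like `adc ebx, 0` */
    return r;
}

int
is_small_via_pair(int32_t v)
{
    /* v + 5 must lie in [0, 261]: high half zero, low half at most 261. */
    pair64 r = widen_and_add5(v);
    return r.hi == 0 && r.lo <= 261u;
}
```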

At least these hot functions are affected:
  PyObject* PyLong_FromSsize_t(Py_ssize_t v)
  PyObject* PyLong_FromLong(long v)

Replacing the inline function with a macro version will fix this:
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)
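
A small sketch of why the macro form can compile to the short sequence: the comparison happens in the argument's own type, so for a 32-bit argument the two bounds checks can collapse into one unsigned compare, which is what the `add eax, 5` / `cmp eax, 261` / `ja` sequence looks like. The helper names `small_int_macro` and `small_int_single_compare` are hypothetical, and the single-compare form is my reading of the listing, not code from CPython.

```c
#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

/* Hypothetical wrapper: the macro compares in the type of v (int here),
   so no widening to long long is needed. */
int
small_int_macro(int v)
{
    return IS_SMALL_INT(v);
}

/* Hypothetical hand-written equivalent of the single unsigned compare:
   values below -NSMALLNEGINTS wrap around to large unsigned numbers. */
int
small_int_single_compare(int v)
{
    return (unsigned)v + NSMALLNEGINTS
           < (unsigned)(NSMALLNEGINTS + NSMALLPOSINTS);
}
```

One caveat with the macro form: `ival` is evaluated twice, so an argument with side effects (for example `IS_SMALL_INT(i++)`) would misbehave.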

If you want to see the assembly code generated by the major compilers, you can paste the attached file demo.c into https://godbolt.org/
- demo.c was originally written by Greg Price.
- use `-m32 -O2` to generate a 32-bit build.

components: Interpreter Core
files: demo.c
messages: 351052
nosy: Greg Price, Ma Lin, aeros167, mark.dickinson, rhettinger, sir-sigurd
priority: normal
severity: normal
status: open
title: inline function generates slightly inefficient machine code
versions: Python 3.9
Added file: https://bugs.python.org/file48583/demo.c

Python tracker <report at bugs.python.org>