[issue38015] inline function generates slightly inefficient machine code
Ma Lin
report at bugs.python.org
Tue Sep 3 00:02:44 EDT 2019
New submission from Ma Lin <malincns at 163.com>:
Commit 5e63ab0 replaces a macro with this inline function:
static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}
(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257)
However, when this function is called with a `value` for which `sizeof(value) < sizeof(long long)`, the argument undergoes an unnecessary type conversion.
For example, on a 32-bit platform, if `value` is a `Py_ssize_t`, it has to be converted to the 8-byte `long long` type.
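To make the widening concrete, here is a minimal sketch. It assumes the default values NSMALLNEGINTS = 5 and NSMALLPOSINTS = 257; `function_operand_size`, `MACRO_OPERAND_SIZE` and `check_operand_sizes` are hypothetical helpers added only for illustration and are not part of CPython.

```c
#include <assert.h>
#include <stddef.h>

#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5

/* The inline function from commit 5e63ab0: the parameter is always long long. */
static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

/* Hypothetical helper: size in bytes of the value the function compares.
   Whatever the caller passes, it has been converted to long long. */
static inline size_t
function_operand_size(long long ival)
{
    return sizeof(ival);
}

/* Hypothetical helper: size of the operand a macro-style comparison works
   on, i.e. the argument after the usual arithmetic conversions with an
   int constant. For an int argument this stays int. */
#define MACRO_OPERAND_SIZE(ival) (sizeof(-NSMALLNEGINTS + (ival)))

/* Hypothetical self-check tying the two together for an int argument. */
static inline void
check_operand_sizes(void)
{
    int v = 100;
    assert(is_small_int(v));
    assert(function_operand_size(v) == sizeof(long long));
    assert(MACRO_OPERAND_SIZE(v) == sizeof(int));
}
```

On a typical 32-bit build, `int`, `long` and `Py_ssize_t` are all 4 bytes, so the function compares an 8-byte value where the macro would compare a 4-byte one.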
The following assembly code is the beginning of the `PyLong_FromSsize_t(Py_ssize_t v)` function.
(32-bit x86 build generated by GCC 9.2, with the `-m32 -O2` options)
With the macro (before commit 5e63ab0):
mov eax, DWORD PTR [esp+4]
add eax, 5
cmp eax, 261
ja .L2
sal eax, 4
add eax, OFFSET FLAT:small_ints
add DWORD PTR [eax], 1
ret
.L2: jmp PyLong_FromSsize_t_rest(int)
With the inline function:
push ebx
mov eax, DWORD PTR [esp+8]
mov edx, 261
mov ecx, eax
mov ebx, eax
sar ebx, 31
add ecx, 5
adc ebx, 0
cmp edx, ecx
mov edx, 0
sbb edx, ebx
jc .L7
cwde
sal eax, 4
add eax, OFFSET FLAT:small_ints+80
add DWORD PTR [eax], 1
pop ebx
ret
.L7: pop ebx
jmp PyLong_FromSsize_t_rest(int)
On a 32-bit x86 platform, an 8-byte `long long` is held in two registers, so the machine code is much longer than in the macro version.
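The two-register arithmetic in the inline-function listing can be sketched in C roughly as follows. This is an illustration of what the `sar` / `add` / `adc` / `cmp` / `sbb` sequence computes, not code from CPython; the function names are hypothetical and the default values NSMALLNEGINTS = 5 and NSMALLPOSINTS = 257 are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5

/* Hypothetical emulation of the 64-bit range check using two 32-bit words. */
static inline int
is_small_int_two_words(int32_t v)
{
    /* Sign-extend v into a (hi, lo) pair, like `sar ebx, 31`. */
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (v < 0) ? 0xFFFFFFFFu : 0u;

    /* 64-bit add of NSMALLNEGINTS: add the low words, then propagate
       the carry into the high word (add / adc). */
    uint32_t sum_lo = lo + (uint32_t)NSMALLNEGINTS;
    uint32_t carry = (sum_lo < lo) ? 1u : 0u;
    uint32_t sum_hi = hi + carry;

    /* 64-bit unsigned compare against 261 (cmp / sbb): the value is in
       range only if the high word is zero and the low word is <= 261. */
    uint32_t limit = (uint32_t)(NSMALLNEGINTS + NSMALLPOSINTS - 1);
    return sum_hi == 0u && sum_lo <= limit;
}

/* Hypothetical check: count values in [first, last] where the two-word
   emulation disagrees with the plain range test. */
static inline int
count_two_word_mismatches(int32_t first, int32_t last)
{
    int mismatches = 0;
    assert(first <= last);
    for (int64_t v = first; v <= last; v++) {
        int expected = (-NSMALLNEGINTS <= v && v < NSMALLPOSINTS);
        if (is_small_int_two_words((int32_t)v) != expected) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Every step that is a single instruction on one register in the macro version becomes a pair of instructions plus carry or borrow handling here, which is where the extra length comes from.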
At least these hot functions are affected:
PyObject* PyLong_FromSsize_t(Py_ssize_t v)
PyObject* PyLong_FromLong(long v)
Replacing the inline function with a macro version will fix this:
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)
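Because the macro compares in the argument's own type, a compiler can reduce the two-sided test to one unsigned comparison, which is what the `add eax, 5` / `cmp eax, 261` / `ja` sequence in the macro listing does. A minimal sketch of that equivalence, assuming the default values NSMALLNEGINTS = 5 and NSMALLPOSINTS = 257 (`is_small_int_one_compare` and `count_macro_mismatches` are hypothetical names used only for illustration):

```c
#include <assert.h>

#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5

#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

/* Hypothetical one-comparison form: shifting the range so it starts at 0
   lets a single unsigned compare cover both bounds. The arithmetic is
   done in unsigned int, so it cannot trigger signed overflow. */
static inline int
is_small_int_one_compare(int v)
{
    unsigned int shifted = (unsigned int)v + (unsigned int)NSMALLNEGINTS;
    return shifted <= (unsigned int)(NSMALLNEGINTS + NSMALLPOSINTS - 1);
}

/* Hypothetical check: count values in [first, last] where the macro and
   the one-comparison form disagree. */
static inline int
count_macro_mismatches(int first, int last)
{
    int mismatches = 0;
    assert(first <= last);
    for (long long v = first; v <= last; v++) {
        int as_int = (int)v;
        if (IS_SMALL_INT(as_int) != is_small_int_one_compare(as_int)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

The usual macro caveat applies: `ival` is evaluated twice, so the argument should not have side effects.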
If you want to see the assembly code generated by major compilers, you can paste the attached file demo.c into https://godbolt.org/
- demo.c was originally written by Greg Price.
- use `-m32 -O2` to generate 32-bit build.
----------
components: Interpreter Core
files: demo.c
messages: 351052
nosy: Greg Price, Ma Lin, aeros167, mark.dickinson, rhettinger, sir-sigurd
priority: normal
severity: normal
status: open
title: inline function generates slightly inefficient machine code
versions: Python 3.9
Added file: https://bugs.python.org/file48583/demo.c
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38015>
_______________________________________