[issue13136] speed-up conversion between unicode widths
meadori at gmail.com
Sun Oct 9 02:49:48 CEST 2011
On Sat, Oct 8, 2011 at 5:34 PM, Antoine Pitrou <report at bugs.python.org> wrote:
> Antoine Pitrou <pitrou at free.fr> added the comment:
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
> They are set by the configure script:
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c
>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
> The fact that glibc includes such optimizations (in a much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automatically.
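To make the kind of hand optimization under discussion concrete, here is a minimal sketch of widening a 1-byte-per-character buffer to 4 bytes per character, once as a plain loop and once manually unrolled by four. The `ucs1_t`/`ucs4_t` typedefs and both function names are hypothetical stand-ins loosely modeled on CPython's Py_UCS1/Py_UCS4. This is an illustration of the technique, not the patch attached to the issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPython's Py_UCS1 / Py_UCS4 types. */
typedef uint8_t ucs1_t;
typedef uint32_t ucs4_t;

/* Straightforward loop: one character per iteration. */
void widen_simple(const ucs1_t *src, size_t n, ucs4_t *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Same result, manually unrolled by four. The tail loop handles the
   0-3 characters left over when n is not a multiple of 4. */
void widen_unrolled(const ucs1_t *src, size_t n, ucs4_t *dst)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Whether the unrolled version is actually faster depends on the compiler and flags; at -O3 GCC may unroll or vectorize the simple loop by itself, which is precisely the point being argued, so a change like this is only worth keeping if benchmarks show a gain.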
I agree. This is more of an optimized-runtime-library problem than a
code optimization problem.
>> I tested using memchr() when writing those "naive" loops.
> memchr() is mentioned in another issue, #13134.
Yeah, this conversation is really more relevant to issue13134, but I will
reply to these points here anyway...
>> is inlined by the compiler just like the direct loop
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().
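For comparison, here is a sketch of the two approaches being debated: a "naive" byte-search loop and the same contract delegated to memchr(). The function names are hypothetical. Both return a pointer to the first matching byte or NULL; the difference is that a library memchr() such as glibc's is typically implemented with word-at-a-time or SIMD techniques rather than a byte loop like the first function.

```c
#include <stddef.h>
#include <string.h>

/* "Naive" byte search: returns a pointer to the first occurrence of
   ch in the first n bytes of s, or NULL if it is absent. */
const unsigned char *find_byte_loop(const unsigned char *s, size_t n,
                                    unsigned char ch)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == ch)
            return s + i;
    return NULL;
}

/* Same contract, delegated to the C library's memchr(), which is
   usually an optimized out-of-line routine, not an inlined loop. */
const unsigned char *find_byte_memchr(const unsigned char *s, size_t n,
                                      unsigned char ch)
{
    return memchr(s, ch, n);
}
```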
Without link-time optimization enabled, I doubt the toolchain can inline
memchr() in the traditional sense (i.e. insert the body of the routine at
the call site). Even if it could, the inlining heuristics would most likely
choose not to. I don't think we use LTO with GCC, but I think we might
with VC++.
GCC does something else, called builtin folding, that is more likely.
'memchr ("bca", 'c', 3)' gets replaced with instructions that compute a
pointer into "bca". This won't happen in this case because the 'memchr'
arguments are all variable.
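A small sketch of the folding case versus the variable case follows; the function names are hypothetical. With all-constant arguments GCC may replace the call with the constant pointer "bca" + 1, while with run-time arguments a real call to memchr() has to remain. One way to check what a given compiler does is to compile with `gcc -O2 -S` and look for a call to memchr in the generated assembly.

```c
#include <stddef.h>
#include <string.h>

/* All arguments are compile-time constants, so GCC may fold this call
   into the constant pointer "bca" + 1 without calling memchr(). */
const char *find_c_constant(void)
{
    return memchr("bca", 'c', 3);
}

/* Buffer, character and length are only known at run time, so no
   folding is possible and an actual call to memchr() remains. */
const char *find_variable(const char *s, int ch, size_t n)
{
    return memchr(s, ch, n);
}
```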
>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
> ?? Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.
I think what Marc-Andre is alluding to is that the first parameter of
'memchr' is 'const void *', which could (in theory) limit optimization
opportunities, whereas if the compiler knew that the data being searched
is a 'char *' or something similar, it could take advantage of that.
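To illustrate the type-information argument, here is a sketch of a search over 4-byte characters. The `ucs4_t` typedef and `find_ucs4` are hypothetical stand-ins (loosely modeled on CPython's Py_UCS4). Because the element type is visible, the compiler sees 32-bit loads and comparisons; memchr() only receives an untyped pointer and compares single bytes, so it cannot be applied directly to data like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPython's Py_UCS4. */
typedef uint32_t ucs4_t;

/* Typed search: returns a pointer to the first element equal to ch
   among the first n elements of s, or NULL if none matches. */
const ucs4_t *find_ucs4(const ucs4_t *s, size_t n, ucs4_t ch)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == ch)
            return s + i;
    return NULL;
}
```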