Strange behavior in string interpolation of constants
Ned Batchelder
ned at nedbatchelder.com
Mon Oct 16 20:18:22 EDT 2017
On 10/16/17 7:39 PM, מיקי מונין wrote:
> Hello, I am working on an article on python string formatting. As a part of
> the article I am researching the different forms of python string
> formatting.
>
> While researching string interpolation (i.e. the % operator), I noticed
> something weird with string lengths.
>
> Given the following two functions:
>
> def simple_interpolation_constant_short_string():
>     return "Hello %s" % "World!"
>
> def simple_interpolation_constant_long_string():
>     return "Hello %s. I am a very long string used for research" % "World!"
>
>
> Let's look at the bytecode generated for them using the dis module.
>
> The first example produces the following bytecode:
>   9           0 LOAD_CONST               3 ('Hello World!')
>               2 RETURN_VALUE
>
> This looks normal: the Python compiler appears to fold the constant
> expression, removing the need for string interpolation at run time.
>
> However the output of the second function caught my eye:
>
>  12           0 LOAD_CONST               1 ('Hello %s. I am a very long string used for research')
>               2 LOAD_CONST               2 ('World!')
>               4 BINARY_MODULO
>               6 RETURN_VALUE
>
> This was not optimized by the compiler! Normal string interpolation was
> used!
>
> Based on some more testing, it appears that no optimization is done when
> the resulting string would be longer than 20 characters, as evidenced by
> these examples:
>
> def expected_result():
>     return "abcdefghijklmnopqrs%s" % "t"
>
> Bytecode:
>  15           0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
>               2 RETURN_VALUE
>
> def abnormal_result():
>     return "abcdefghijklmnopqrst%s" % "u"
>
> Bytecode:
>
>  18           0 LOAD_CONST               1 ('abcdefghijklmnopqrst%s')
>               2 LOAD_CONST               2 ('u')
>               4 BINARY_MODULO
>               6 RETURN_VALUE
>
> I am using Python 3.6.3.
> I am curious as to why this happens. Can anyone shed further light on this
> behaviour?
Optimizers have plenty of heuristics. This one seems to skip constant
folding if the resulting string is longer than 20 characters. The code
seems to bear this out
(https://github.com/python/cpython/blob/master/Python/peephole.c#L305):
    } else if (size > 20) {
        Py_DECREF(newconst);
        return -1;
    }
As to why they chose 20? There's no clue in the code, and I don't know.
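If you want to poke at the cutoff yourself, here is a rough sketch. The
helper name folded_at_compile_time is made up for illustration, and it
assumes that a folded result shows up in the code object's co_consts.
What it prints depends on the interpreter version, since the optimizer
has changed between releases, so treat the output as an observation
about your own build rather than a guarantee.

```python
import dis
import sys


def folded_at_compile_time(left, right):
    """Return True if compiling `left % right` stored the finished string as a constant.

    Hypothetical helper: it only checks whether the interpolated result
    appears in co_consts, which is how a folded constant would show up.
    """
    source = "{!r} % {!r}".format(left, right)
    code = compile(source, "<fold-check>", "eval")
    return (left % right) in code.co_consts


print(sys.version)

# Probe result lengths on both sides of 20 characters.
for prefix_len in (18, 19, 20, 21):
    template = "a" * prefix_len + "%s"
    result_len = len(template % "z")
    print(result_len, folded_at_compile_time(template, "z"))

# Show the raw bytecode for the short example from the question.
dis.dis(compile('"Hello %s" % "World!"', "<fold-check>", "eval"))
```

On an interpreter that folds short results only, the True/False column
should flip where the result length passes the limit.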
--Ned.