[Python-ideas] string codes & substring equality

Sat Nov 30 10:08:12 CET 2013

On 11/30/2013 01:32 AM, Terry Reedy wrote:
> On 11/29/2013 1:40 AM, spir wrote:
>> On 11/29/2013 01:45 AM, Steven D'Aprano wrote:
>>> Here is my benchmark:
>>>
>>> py> from timeit import Timer
>>> py> setup = "s = 'abcdef'"
>>> py> t1 = Timer("ord('c')")  # establish a base-mark of calling ord
>>> py> t2 = Timer("ord(s[2])", setup)
>>> py> min(t1.repeat(repeat=5))
>>> 0.139258101582527
>>> py> min(t2.repeat(repeat=5))
>>> 0.2207092922180891
>>
>> You are right, Steven, the benefit is far tinier than I supposed. I
>> reproduced this on my machine: the time for char-creation + ord() is
>> about 3/2 of ord() alone (which is just indexing).
>>
>> Now, there is a mystery: how is the time for creating a single-char
>> string object about half the time of a simple indexing (in C!)?
>
> Much of the time for ord('c') is the time to make a function call from Python.
> 's[2]' only involves an internal C function call, which is much faster.

Right, that's it, certainly! Thank you very much, Terry.
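
One rough way to see the call overhead Terry describes is to time the indexing
on its own, next to the two statements from Steven's benchmark. This is only a
sketch: the helper name best_time, the "pass" baseline and the iteration count
are my own choices, and the absolute numbers will differ between machines and
Python versions.

```python
from timeit import Timer

# Same setup string as in the benchmark quoted above.
setup = "s = 'abcdef'"

# Statements to compare; "pass" gives a rough baseline for the timing loop itself.
statements = {
    "pass": "pass",
    "s[2]": "s[2]",            # indexing only, no Python-level function call
    "ord('c')": "ord('c')",    # function call only
    "ord(s[2])": "ord(s[2])",  # indexing plus function call
}

def best_time(stmt, setup="pass", repeat=5, number=200_000):
    """Return the fastest of `repeat` runs of `number` executions, in seconds."""
    return min(Timer(stmt, setup).repeat(repeat=repeat, number=number))

timings = {label: best_time(stmt, setup) for label, stmt in statements.items()}

for label, seconds in timings.items():
    print(f"{label:12s} {seconds:.4f} s")
```

If the explanation holds, the "s[2]" line should come out well below the
"ord('c')" line, since only the latter pays for a call made from Python.
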
Is there anything else I should know about Python's internal representation of
strings? For instance, is there an optimisation for (very) short strings, such
as storing the codes in place in the struct?

Also, I read somewhere (I think in a Wikipedia article about string interning
in a pool) that Python does this, which rather surprised me. Which strings, if
any, are interned in Python? I would bet, for lookup speed, on __dict__ keys,
that is identifiers: variable and attribute names.
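
A small experiment along these lines can be sketched as follows. It assumes
CPython 3; the class Point and all variable names are made up for illustration.
Only the identity of the two sys.intern() results is guaranteed by the
documentation. Whether run-time strings start out as separate objects, and
whether attribute names are interned automatically, are implementation details
that may vary.

```python
import sys

# Build two equal strings at run time, so that they are not merged as
# compile-time constants.
parts = ["string", " ", "interning"]
a = "".join(parts)
b = "".join(parts)

equal_before = (a == b)        # equal by value
same_object_before = (a is b)  # implementation detail; usually False in CPython

# sys.intern() returns the canonical object for that value from the intern table.
ia = sys.intern(a)
ib = sys.intern(b)
same_object_after = (ia is ib)

class Point:  # hypothetical class, only here to get an instance __dict__
    def __init__(self):
        self.x = 0

# Assumption: if attribute names are interned automatically, interning a
# __dict__ key again hands back the very same object.
keys_interned = all(sys.intern(k) is k for k in vars(Point()))

print("equal before interning:      ", equal_before)
print("same object before interning:", same_object_before)
print("same object after interning: ", same_object_after)
print("__dict__ keys interned:      ", keys_interned)
```

Comparing interned strings with "is" is what would make the lookup cheap: an
identity check can replace a character-by-character comparison.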

Denis