[Python-ideas] string codes & substring equality

Fri Nov 29 07:40:12 CET 2013

On 11/29/2013 01:45 AM, Steven D'Aprano wrote:
> Here is my benchmark:
>
> py> from timeit import Timer
> py> setup = "s = 'abcdef'"
> py> t1 = Timer("ord('c')")  # establish a base-mark of calling ord
> py> t2 = Timer("ord(s[2])", setup)
> py> min(t1.repeat(repeat=5))
> 0.13925810158252716
> py> min(t2.repeat(repeat=5))
> 0.2207092922180891

You are right, Steven: the benefit is far smaller than I supposed. I reproduced
this on my machine: the time for char creation + ord() is about 3/2 of the time
for ord() alone (which is just indexing).
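
To see where the extra time goes, the benchmark can be split into three
statements. This is only a sketch: the helper name measure() and its default
loop counts are made up for illustration, and the absolute numbers will differ
from machine to machine and between Python versions.

```python
from timeit import Timer


def measure(number=100_000, repeat=5):
    """Return the best time in seconds for each statement (hypothetical helper)."""
    setup = "s = 'abcdef'"
    timers = {
        "ord('c')": Timer("ord('c')"),           # ord() on a literal
        "s[2]": Timer("s[2]", setup),            # indexing alone
        "ord(s[2])": Timer("ord(s[2])", setup),  # indexing, then ord()
    }
    # Take the minimum of several repeats, as in the quoted benchmark.
    return {
        label: min(timer.repeat(repeat=repeat, number=number))
        for label, timer in timers.items()
    }


if __name__ == "__main__":
    for label, seconds in measure().items():
        print(f"{label:12s} {seconds:.4f}")
```

Comparing the "s[2]" line with the other two gives a rough idea of how much of
the difference comes from indexing and how much from the call to ord().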

Now, there is a mystery: how can creating a single-char string object take only
about half the time of a simple indexing (in C!)? This just cannot be, can it?
Even if there is no alloc [1]. Or is there a cache, maybe temporary, for such
recently created objects? (The char 'c' would then be created only once, and
just accessed afterwards.)
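
The cache hypothesis can be probed directly. The sketch below makes two checks:
whether indexing the same position twice hands back the very same object, and
whether the literal in "ord('c')" is stored as a constant of the compiled code.
Both function names are made up for illustration. The result of the identity
check is an implementation detail of the interpreter, not a language guarantee,
so it may differ between implementations and versions.

```python
def indexing_returns_same_object(text, index):
    """Index the same position twice and compare object identity."""
    first = text[index]
    second = text[index]
    return first is second


def literal_is_stored_constant(source, literal):
    """Check whether `literal` is among the constants of the compiled source."""
    code = compile(source, "<benchmark>", "eval")
    return literal in code.co_consts


if __name__ == "__main__":
    print(indexing_returns_same_object("abcdef", 2))
    print(literal_is_stored_constant("ord('c')", "c"))
```

If the first check prints True, the single-char string is being reused rather
than created anew on each indexing. The second check shows that the literal is
built once at compile time, so ord('c') never creates a string inside the timed
loop.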

Denis

[1] I don't know Python's arcana, but small strings do not always require an
alloc, for their data can be stored in place, in the object's "facade" struct.