when do two names cease to refer to the same string object?

Steven D'Aprano steve at REMOVETHIScyber.com.au
Fri Mar 3 12:19:09 CET 2006

On Thu, 02 Mar 2006 20:45:10 -0500, John Salerno wrote:

> To test this out I wrote a little script as an exercise:
> for num in range(1, 10):
> 	x = 'c' * num
> 	y = 'c' * num
> 	if x is y:
> 	   print 'x and y are the same object with', num, 'characters'
> 	else:
> 	   print 'x and y are not the same object at', num, 'characters'
> 	   break
> But a few questions arise:
> 1. As it is above, I get the 'not the same object' message at 2 
> characters. But doesn't Python only create one instance of small strings 
> and use them in multiple references? Why would a two character string 
> not pass the if test?

Watch this:

>>> "aaaaaa" is "aaaaaa"
>>> "aaaaaa" is "aaa" + "aaa"

Does this give you a hint as to what is happening?
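Here is a minimal sketch of what the hint is getting at. It is written for a current CPython 3 interpreter, not the Python 2 used in this thread. Equal string literals compiled together in one code object end up as a single shared constant. A string assembled at run time is a new object that happens to have the same value. The helper name `compare_literal_and_concat` is hypothetical, and the identity results are CPython implementation details, not language guarantees.

```python
def compare_literal_and_concat():
    """Compile two equal literals and one run-time concatenation together."""
    # Using the variable a (instead of writing 'aaa' + 'aaa') keeps the
    # compiler from folding the concatenation into a constant ahead of time.
    src = (
        "a = 'aaa'\n"
        "x = 'aaaaaa'\n"
        "y = 'aaaaaa'\n"
        "z = a + a\n"
    )
    ns = {}
    exec(src, ns)
    return ns["x"], ns["y"], ns["z"]


x, y, z = compare_literal_and_concat()
print(x is y)  # literals from one code object: True in CPython
print(x == z)  # same value: always True
print(x is z)  # run-time concatenation: normally a separate object
```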

Some more evidence:

>>> "aaaaa"[0:1] is "aaaaa"[0:1]
>>> "aaaaa"[0:2] is "aaaaa"[0:2]
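The one-character slice comes back as the same object because CPython keeps a cache of one-character strings. Longer slices are normally fresh objects. The language promises none of this, so code that cares about the value should compare with `==`. A program may really need one shared object per distinct value, for example to save memory on many repeated keys. In that case it can ask for this explicitly with `sys.intern`, which was the built-in `intern` in Python 2. Below is a minimal Python 3 sketch; `first_n_chars` is a hypothetical helper.

```python
import sys


def first_n_chars(text, n):
    """Return a slice holding the first n characters of text."""
    return text[0:n]


source = "".join(["a"] * 5)  # built at run time, so not a compile-time constant
a = first_n_chars(source, 2)
b = first_n_chars(source, 2)

print(a == b)  # True: equality compares values
# sys.intern returns one canonical object per distinct string value.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: guaranteed once both are interned
```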

> 2. If I say x = y = 'c' * num instead of the above, the if test always 
> returns true. Does that mean that when you do a compound assignment like 
> that, it's not just setting each variable to the same value, but setting 
> them to each other?

Yes. Both x and y will be bound to the same object, not just to two objects
with the same value. This is not an optimization for strings; it is a
design decision that applies to all objects:

>>> x = y = []
>>> x.append(1)
>>> y
[1]
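Here is the same behaviour side by side, as a small Python 3 sketch. Chained assignment binds both names to one object. Evaluating `[]` twice creates two separate lists. The variable names are only illustrative.

```python
# Chained assignment: one list object, two names.
x = y = []
x.append(1)
print(y)       # [1]
print(x is y)  # True

# Two separate list displays: two distinct objects that start out equal.
p, q = [], []
p.append(1)
print(q)       # []
print(p is q)  # False
```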

> Finally, I'd like to see how others might write a script to do this 
> exercise. 

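filename = "string_optimization_tester.py"
# Template for one generated test: two identical literals compared by identity.
s = "if '%s' is not '%s':\n    raise ValueError('stopped at n=%d')\n"
f = open(filename, "w")
for n in range(1000):
    f.write(s % ("c"*n, "c"*n, n))

# Compare a literal with an expression; the outcome depends on whether
# the compiler folds 'c'*3 into a constant.
f.write("""if 'ccc' is not 'c'*3:
    print 'Expected failure failed correctly'
else:
    print 'Expected failure did not happen'
""")
f.write("print 'Done!'\n")
f.close()
# Then run the generated file: python string_optimization_tester.py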
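For comparison, here is a sketch of the original exercise recast as a Python 3 function, replacing the loop with `break`. The name `first_distinct` and its `build` parameter are hypothetical. Passing the builder in makes it easy to try other ways of constructing the values. With the default builder, the answer depends on the interpreter, because whether `'c' * n` returns a shared object is an implementation detail.

```python
def first_distinct(limit, build=lambda n: "c" * n):
    """Return the first n in range(1, limit) for which calling build(n)
    twice gives two different objects; return None if that never happens."""
    for n in range(1, limit):
        x = build(n)
        y = build(n)
        if x is not y:
            return n
    return None


print(first_distinct(10))                        # interpreter-dependent
print(first_distinct(10, build=lambda n: [n]))   # 1: each call makes a new list
print(first_distinct(10, build=lambda n: None))  # None: always the same object
```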
