How can you copy (clone) a string?

Joal Heagney s713221 at student.gu.edu.au
Thu Oct 5 08:31:13 EDT 2000


Mikael Olofsson wrote:

> On 03-Oct-00 Peter.Rupp at ual.com wrote:
>  >  So, my next thought was to simply create thousands of 1-meg strings on
>  >  the fly using "exec".  I could then determine very accurately when the
>  >  process ran out of memory.  (Of course, I could probably write this
>  >  easily in C, but it would be more hassle than it's worth at this point.)
>
> Well... FWIW, on my machine I get the following.
>
> >>> a='a'
> >>> b='a'
> >>> c='a'
> >>> d='a'
> >>> id(a),id(b),id(c),id(d)
> (830696, 830696, 830696, 830696)
>
> But I also get the following.
>
> >>> a=2*'a'
> >>> b=2*'a'
> >>> c=2*'a'
> >>> d=2*'a'
> >>> id(a),id(b),id(c),id(d)
> (878200, 878280, 878320, 878360)
>
> It seems that as long as you choose long enough strings (2), they do
> wind up in different memory locations.
>
> Someone with deeper knowledge can probably explain why, and tell you
> whether you can depend on this behaviour.
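
On the "can you depend on it" part: I wouldn't. Whether two equal strings end up as one object looks like an interpreter detail. For the memory test itself, a rough sketch that doesn't rely on it is to use mutable buffers, which can't be shared. This assumes a current Python where the bytearray type exists (it is not in the interpreter used for the transcripts here), and make_blocks is just a name I made up for illustration. Sizes are kept tiny so it runs anywhere; scale them up for a real test.

```python
def make_blocks(count, size):
    """Return `count` separate buffers of `size` bytes each (hypothetical helper)."""
    # bytearray objects are mutable, so each call builds a new object
    # instead of handing back a reference to an existing one.
    return [bytearray(b"a" * size) for _ in range(count)]


blocks = make_blocks(4, 1024)

# All blocks are alive at the same time, so distinct objects have distinct ids.
distinct = len({id(block) for block in blocks})
print(distinct)
```

Keeping every block in a list also stops the interpreter from freeing them as you go.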

*remembering something in "Learning Python"* The first 100 integers (0 to 99)
and the single alphanumeric characters are "stored" in the interpreter as
predefined objects. Whenever you create a variable that refers to one of these
objects, the interpreter just creates a reference to it. Your example works
just as well with:
>>> a = 1
>>> b = 1
>>> c = 1
>>> d = 1
>>> id(a), id(b), id(c), id(d)
(134882168, 134882168, 134882168, 134882168)
>>> a = 99
>>> b = 99
>>> c = 99
>>> d = 99
>>> id(a), id(b), id(c), id(d)
(134881268, 134881268, 134881268, 134881268)
However, with
>>> a = 2*'a'
>>> b = 2*'a'
>>> c = 2*'a'
>>> d = 2*'a'
>>> id(a), id(b), id(c), id(d)
(135044976, 135045024, 135045088, 135045152)
and
>>> a = 100
>>> b = 100
>>> c = 100
>>> d = 100
>>> id(a), id(b), id(c), id(d)
(134881220, 134881208, 134881256, 134881244)
the interpreter creates a new object for each. However, I don't understand it
all, because when you create the strings as plain literals instead, you get:
>>> a = 'aa'
>>> b = 'aa'
>>> c = 'aa'
>>> d = 'aa'
>>> id(a), id(b), id(c), id(d)
(134879088, 134879088, 134879088, 134879088)
*shrugs* I'd like to understand what's going on here. As a first guess, I'd
say that in simple assignment the interpreter checks whether an identical
string has already been defined and, if so, adds a reference to it (as strings
are immutable, this allows a speed-up, since the only way to change a
variable's string value is to assign a new string to it). The different 2*'a'
result must (in a semi-beginner's guess) be due to the interpreter creating a
new object for each one and dropping two copies of 'a' into it? *shrugs*
Somebody with more knowledge, please enlighten me. Strings are immutable,
aren't they?
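
For what it's worth, here is a small sketch of the difference between "same value" and "same object" that doesn't depend on what id() happens to print. It assumes a current Python 3 interpreter, where the interning function lives at sys.intern (in the 2.x series it is the builtin intern()). The variable names are only for illustration.

```python
import sys

# Build two equal strings at run time rather than from literals.
parts = ["a", "a"]
a = "".join(parts)
b = "".join(parts)

# Equality compares contents, so it does not care how the strings were made.
same_value = (a == b)

# sys.intern() returns the one canonical object for a given string value,
# so interning two equal strings yields the same object.
same_interned = (sys.intern(a) is sys.intern(b))

# Strings are immutable: item assignment raises TypeError.
try:
    a[0] = "b"
    mutable = True
except TypeError:
    mutable = False

print(same_value, same_interned, mutable)
```

So comparing with == is always safe, while "is" (or id()) only tells you what the interpreter chose to share.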

Joal Heagney/Ancient Hart



