Why don't strings share data in Python?

Chris Gonnerman chris.gonnerman at newcenturycomputers.net
Tue Apr 16 00:21:59 EDT 2002


----- Original Message -----
From: "Mike Coleman" <mkc+dated+1021521407.f909ec at mathdogs.com>


> Does anyone know why strings (i.e., those of length >1) don't share their
> data in Python?  Since they're immutable, it seems like this would be the
> obvious thing to do.  So, for example, the space behavior of this code
> could be linear rather than quadratic/horrific:
>
> d = {}
> for i in xrange(100000000):
>     d[mybigstring[i:]] = mybigstring[i:]

You, the human, have an implicit semantic understanding that the RHS
of the assignment above is the same as the key in the LHS expression.
The compiler knows no such thing, so it computes the substring twice
on each loop pass.  You are, in effect, asking for it.
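
To make that concrete, here is a minimal sketch of the loop with the
slice taken only once per pass.  It is written for current Python (so
range rather than xrange), and the short string below is a made-up
stand-in for mybigstring, not the poster's data:

```python
# Hypothetical small stand-in for the poster's mybigstring.
mybigstring = "abcdefghij"

d = {}
for i in range(len(mybigstring)):
    suffix = mybigstring[i:]   # slice once per pass instead of twice
    d[suffix] = suffix         # key and value are now the same object
```

This only removes the duplicate slice.  Each suffix is still its own
copy of the data, so the total space remains quadratic in the length
of the string.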

If you mean, why don't substrings share space with parent strings,
hmm, that's an interesting question.  I can see that a string object
implemented something like this struct:

    struct ExampleString {
        int len;      /* number of bytes in the string */
        char *buf;    /* points at data that may live in another string */
    };

would allow that, and it sounds like an interesting optimization.  But
a string object laid out like this:

    struct AnotherExampleString {
        int len;        /* number of bytes in the string */
        char buf[1];    /* data stored inline, right after the header */
    };

(where the instances are overallocated to contain the data) would not
allow it.
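
The difference between the two layouts can be sketched in Python
itself.  The SharedString class below is a hypothetical toy model of
the pointer-plus-length layout, not how any Python implementation
stores strings; the names buf, start, and substring are assumptions
made up for illustration:

```python
class SharedString:
    """Toy model of the pointer-plus-length layout (hypothetical)."""

    def __init__(self, buf, start=0, length=None):
        self.buf = buf        # plays the role of `char *buf`
        self.start = start    # offset into buf, standing in for pointer arithmetic
        self.length = len(buf) - start if length is None else length

    def substring(self, i):
        # No bytes are copied: the new object refers to the same buffer.
        i = min(max(i, 0), self.length)
        return SharedString(self.buf, self.start + i, self.length - i)

    def to_bytes(self):
        # A copy is made only when the text is actually materialised.
        return self.buf[self.start:self.start + self.length]


parent = SharedString(b"hello world")
child = parent.substring(6)   # refers to parent's buffer, no copy
```

Here child.buf is the very same object as parent.buf.  The price of
such sharing is that even a tiny substring keeps the whole parent
buffer alive for as long as the substring exists.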

Are you certain Python doesn't do it?  Can you submit a patch to
implement it?  (PEP + Patch == Implemented Change in many cases).

Chris Gonnerman -- chris.gonnerman at newcenturycomputers.net
http://newcenturycomputers.net





