[Python-Dev] Slice as a copy... by design?

Mon May 26 14:16:10 CEST 2008

On Mon, May 26, 2008 at 4:21 AM, Hrvoje Nikšić <hrvoje.niksic at avl.com> wrote:
> On Thu, 2008-05-22 at 13:27 -0300, Facundo Batista wrote:
>> 2008/5/22 Scott Dial <scott+python-dev at scottdial.com>:
>>
>> > If we changed Python to slice-by-reference, then tomorrow someone would be
>> > asking why it isn't slice-by-copy. There are pros and cons to both that are
>>
>> Which are the cons of slice-by-reference of an immutable string?
>
> You have to consider the ramifications of such a design choice.  There
> are two cases to consider: either slices return strings, or they return
> a different type.
>
> If they return strings, then all strings must grow three additional
> fields: start, end, and the reference to the actual string.  That is 16
> more bytes for *every* string, hardly a net win.

A lot of dynamic language implementations have a complex string
representation, where individual bits of the string tell what the rest
of the representation is.  Mozilla's JavaScript implementation is like
this.  At the moment, a string in JavaScript is two pointer-sized
words, and JavaScript has O(1) slicing and, in many cases, concatenation
of s1 + s2 that costs only O(len(s2)).

There's a rather dense comment here explaining it:
http://hg.mozilla.org/mozilla-central/index.cgi/file/79924d3b5bba/js/src/jsstr.h
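To make the idea concrete, here is a rough Python sketch of a string
that is either "flat" (owns its characters) or "dependent" (a window
onto another string).  This is only an illustration: ToyString and its
fields are made-up names, and it uses three plain fields rather than
the bit-packed two-word layout described above.

```python
class ToyString:
    """Toy string: flat (wraps a str) or dependent (a window onto another ToyString)."""
    __slots__ = ("_base", "_start", "_length")

    def __init__(self, base, start=0, length=None):
        self._base = base      # a str if flat, a ToyString if dependent
        self._start = start
        self._length = len(base) - start if length is None else length

    def __len__(self):
        return self._length

    def is_dependent(self):
        return isinstance(self._base, ToyString)

    def slice(self, start, stop):
        """O(1) slice: record offsets and a reference, copy no characters."""
        start = max(0, min(start, self._length))
        stop = max(start, min(stop, self._length))
        return ToyString(self, start, stop - start)

    def chars(self):
        """Materialize the characters; note the branch on the representation."""
        end = self._start + self._length
        if self.is_dependent():
            return self._base.chars()[self._start:end]
        return self._base[self._start:end]


s = ToyString("hello world")
t = s.slice(6, 11)   # no characters copied here
u = t.slice(1, 3)    # a slice of a slice
```

Only chars() touches the character data; slice() just creates a small
object that points at its parent.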

The equivalents of PyString_AS_STRING and PyString_GET_SIZE each contain
a branch.  I don't think the implementation avoids the worst cases Guido
was talking about; tiny substrings can keep huge strings alive.
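
The same retention effect can be seen in Python itself with memoryview,
which slices by reference.  This is an analogy only, not the JavaScript
implementation; the 10 MB size is an arbitrary choice for illustration.

```python
big = bytes(10_000_000)        # a large immutable buffer
tiny = memoryview(big)[:10]    # 10-byte slice by reference, no copy

del big                        # drop our own name for the buffer

# The whole buffer is still reachable through the tiny view.
retained = len(tiny.obj)

# An explicit copy holds only its own 10 bytes.
copied = bytes(tiny)
```

As long as tiny is alive, the full buffer cannot be freed; only the
explicit copy is independent of it.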

-j