[Python-Dev] Slice as a copy... by design?

Mon May 26 22:59:27 CEST 2008

If I may, I'd like to describe the D programming language's slice and
array semantics. This may seem like a digression (and it probably is),
but this thread is already quite lengthy, and it may be interesting to
look at a language which implements the slice-by-reference semantics
we're talking about here.

If you're unfamiliar with D, here are its relevant features: It is
statically typed, it is garbage collected, it has dynamic arrays as a
builtin type, and strings are simply arrays of characters. (In fact,
strings are slightly more complicated than this, as D is fairly
Unicode-aware, but this is beside the point.) It also styles itself as
a systems programming language, meaning you can play directly with
pointers if you wish to.

A dynamic array in D is a sort of fat pointer, which looks like this:

struct array {
   size_t length;
   void* ptr;
}

That is, it consists of two words, which are passed around by value.
The language makes no distinction between an array and a slice of an
array. They are both just a pointer and a length. In D, which is
garbage collected, any pointer into a GC-allocated chunk of memory
will keep that memory alive. Here is some code (its meaning should not
be hard to divine):

int[] a = [1, 2, 3, 4, 5];
int[] b = a[2..4];

Here we have allocated a dynamic array on the heap, and assigned a
slice over the whole array to 'a'. Then we have assigned a sub-slice
of the array to 'b'. The important point is that slicing is a very
cheap operation in D. If we alter an element of b, the changes will be
reflected in a:

b[0] = 10;
assert(a[2] == 10);

(Already this is drifting off topic. Arrays in D are mutable, and
strings in Python are not. But please bear with me.) If we get rid of
the original reference to the entire array:

a = null;

Then b will still keep the entire array alive. This is probably the
most common criticism of this behavior. If the original array is quite
large, then even a single tiny slice of it will keep the entire thing
alive. However, these slice semantics can be quite useful.

Imagine an XML parser in D. One way you might implement it is to read
the entire file into a buffer, then parse out the relevant
sub-strings. These sub-strings are slices, and are very cheap to make.
The important point here is that, outside of the original buffer, no
heap allocations of any sort are made, and this can lead to a very
efficient parser. And because this is the language's regular, default
behavior, the code which does this looks very natural. This same
behavior exists for any similar parsing problem, and can be
particularly useful when the application is long-lived, such as an
HTTP server parsing HTTP requests.

While D's slice semantics are generally nice, I do not necessarily
advocate their use in Python, since I don't think there's any good way
to get this behavior in Python. (At least not without radically
changing it in other ways.) It is still a worthy example of how
slice-as-reference semantics can be useful.

On Thu, May 22, 2008 at 8:28 AM, Facundo Batista
<facundobatista at gmail.com> wrote:
> Hi!
>
> A thread in PyAr raised the question that, considering that strings
> are immutable, why a slice of a string is a copy and not a reference
> to a part of that string.
>
> I couldn't answer why, so I'm asking here...Is it because the
> reference counting will be complicated? Is it because it'd be
> inefficient in other way? It's something else? Or is something that
> could be done... but is not done yet?
>
> Thank you very much!
>
> --
> . Facundo
>
> Blog: http://www.taniquetil.com.ar/plog/
> PyAr: http://www.python.org/ar/
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/kirklin.mcdonald%40gmail.com
>