
If I may, I'd like to describe the D programming language's slice and array semantics. This may seem like a digression (and it probably is), but this thread is already quite lengthy, and it may be interesting to look at a language which implements the slice-by-reference semantics we're talking about here. If you're unfamiliar with D, here are its relevant features: It is statically typed, it is garbage collected, it has dynamic arrays as a builtin type, and strings are simply arrays of characters. (In fact, strings are slightly more complicated than this, as D is fairly Unicode-aware, but this is beside the point.) It also styles itself as a systems programming language, meaning you can play directly with pointers if you wish to. A dynamic array in D is a sort of fat pointer, which looks like this: struct array { size_t length; void* ptr; } That is, it consists of two words, which are passed around by value. The language makes no distinction between an array and a slice of an array. They are both just a pointer and a length. In D, which is garbage collected, any pointer into a GC-allocated chunk of memory will keep that memory alive. Here is some code (its meaning should not be hard to divine): int[] a = [1, 2, 3, 4, 5]; int[] b = a[2..4]; Here we have allocated a dynamic array on the heap, and assigned a slice over the whole array to 'a'. Then we have assigned a sub-slice of the array to 'b'. The important point is that slicing is a very cheap operation in D. If we alter an element of b, the changes will be reflected in a: b[0] = 10; assert(a[2] == 10); (Already this is drifting off topic. Arrays in D are mutable, and strings in Python are not. But please bear with me.) If we get rid of the original reference to the entire array: a = null; Then b will still keep the entire array alive. This is probably the most common criticism of this behavior. If the original array is quite large, then even a single tiny slice of it will keep the entire thing alive. However, these slice semantics can be quite useful. Imagine an XML parser in D. One way you might implement it is to read the entire file into a buffer, then parse out the relevant sub-strings. These sub-strings are slices, and are very cheap to make. The important point here is that, outside of the original buffer, no heap allocations of any sort are made, and this can lead to a very efficient parser. And because this is the language's regular, default behavior, the code which does this looks very natural. This same behavior exists for any similar parsing problem, and can be particularly useful when the application is long-lived, such as an HTTP server parsing HTTP requests. While D's slice semantics are generally nice, I do not necessarily advocate their use in Python, since I don't think there's any good way to get this behavior in Python. (At least not without radically changing it in other ways.) It is still a worthy example of how slice-as-reference semantics can be useful. On Thu, May 22, 2008 at 8:28 AM, Facundo Batista <facundobatista@gmail.com> wrote:
Hi!
A thread in PyAr raised the question that, considering that strings are immutable, why a slice of a string is a copy and not a reference to a part of that string.
I couldn't answer why, so I'm asking here...Is it because the reference counting will be complicated? Is it because it'd be inefficient in other way? It's something else? Or is something that could be done... but is not done yet?
Thank you very much!
-- . Facundo
Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/kirklin.mcdonald%40gmail.c...