[Python-3000] characters data type

Fredrik Lundh fredrik at pythonware.com
Wed May 3 13:52:28 CEST 2006


Michael Chermside wrote:

> No argument here with regard to strings implemented as trees, but I
> think you may be needlessly worried about physical vs logical copies
> for slices. Since strings (and slices of strings) are immutable, the
> implementation is quite simple. Read the Java "String" class to see
> just how easy. The slice returns a subclass of str that stores a
> start and stop position but redirects data access to the buffer used
> by the original str. The only tricky part is to manage garbage
> collection, solved by having the slice object contain a reference to
> the original str. That's it.
>
> Of course, you knew that, but the fact that I can describe it fully
> in 2 sentences should help show it's not overly complex.

you missed the part about slicing slices, and the bit about what heuristics
to use (if any) to apply slicing under the hood also for non-explicit slicing
operations (e.g. should s[:-1] really make a copy?  when can s[i] return
a slice?  etc.  some kind of "temporary" indicator provided by the compiler
would be excellent for under-the-hood optimizations like this...).
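
To make the "slicing slices" point concrete, here is a minimal sketch of the start/stop view described in the quoted text. It is an illustration only: StrView is a hypothetical class, not a CPython type or anything proposed in this thread, and it assumes the simplest policy, where a slice of a view is rebased onto the original str so that views never chain.

```python
class StrView:
    """Hypothetical read-only view onto part of a str (illustration only)."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        # base is always a real str; holding this reference is what keeps
        # the original character data alive for as long as the view exists.
        self._base = base
        self._start, self._stop, _ = slice(start, stop).indices(len(base))
        if self._stop < self._start:
            self._stop = self._start  # normalise empty views

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # Non-contiguous slices fall back to a real copy.
                return str(self)[index]
            # Slicing a slice: rebase the offsets onto the original str
            # instead of wrapping this view in another view.
            return StrView(self._base,
                           self._start + start,
                           self._start + max(start, stop))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StrView index out of range")
        # Single items are returned as ordinary one-character strings.
        return self._base[self._start + index]

    def __str__(self):
        # Materialise a physical copy only when one is asked for.
        return self._base[self._start:self._stop]


s = "hello world"
v = StrView(s, 6)      # logical slice s[6:]
w = v[1:-1]            # slice of a slice, still anchored on s
print(str(v), str(w))
```

Even this small sketch has to pick answers to the questions above: it returns a plain str for s[i], copies for stepped slices, and keeps the whole original string alive however short the view is.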

(the original Unicode implementation supported explicit slicing, and the
code wasn't even fully refactored when that support was removed,
so you may still find traces of this in the existing source code...  most
notably, the Unicode string type still uses a separate block for the
character data)

</F> 




