[Python-3000] Making more effective use of slice objects in Py3k

Guido van Rossum guido at python.org
Sat Aug 26 18:30:57 CEST 2006


Can you explain in a sentence or two how these changes would be
*used*? Your code examples don't speak for themselves (maybe because
It's Saturday morning :-). Short examples of something clumsy and/or
slow that we'd have to write today compared to something fast and
elegant that we could write after the change woulde be quite helpful.
The exact inheritance relationship between slice and [x]range seems a
fairly uninteresting details in comparison.

--Guido

On 8/26/06, Nick Coghlan <ncoghlan at iinet.net.au> wrote:
> This idea is inspired by the find/rfind string discussion (particularly a
> couple of comments from Jim and Ron), but I think the applicability may prove
> to be wider than just string methods (e.g. I suspect it may prove useful for
> the bytes() type as well).
>
> Copy-on-slice semantics are by far the easiest semantics to deal with in most
> cases, as they result in the fewest nasty surprises. However, they have one
> obvious drawback: performance can suffer badly when dealing with large
> datasets (copying 10 MB chunks of memory around can take a while!).
>
> There are a couple of existing workarounds for this: buffer() objects, and the
> start/stop arguments to a variety of string methods. Neither of these is
> particular convenient to work with, and buffer() is slated to go away in Py3k.
>
> I think an enriched slicing model that allows sequence views to be expressed
> easily as "this slice of this sequence" would allow this to be dealt with
> cleanly, without requiring every sequence to provide a corresponding "sequence
> view" with non-copying semantics. I think Guido's concern that people will
> reach for string views when they don't need them is also valid (as I believe
> that it is most often inexperience that leads to premature optimization that
> then leads to needless code complexity).
>
> The specific changes I suggest based on the find/rfind discussion are:
>
>    1. make range() (what used to be xrange()) a subclass of slice(), so that
> range objects can be used to index sequences. The only differences between
> range() and slice() would then be that start/stop/step will never be None for
> range instances, and range instances act like an immutable sequence while
> slice instances do not (i.e. range objects would grow an indices() method).
>
>    2. change range() and slice() to accept slice() instances as arguments so
> that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError
> if x.stop is None).
>
>    3. change API's that currently accept start/stop arguments (like string
> methods) to accept a single slice() instance instead (possibly raising
> ValueError if step != 1).
>
>    4. provide an additional string method partition_indices() that returns 3
> range() objects instead of 3 new strings
>
> The new method would have semantics like:
>
>    def partition_indices(self, sep, limits=None):
>        if limits is None:
>            limits = range(0, len(self))
>        else:
>            limits = limits.indices(len(self))
>        try:
>            idxsep = self.index(sep, limits)
>        except ValueError:
>            return limits, range(0), range(0)
>        endsep = idxsep + len(sep)
>        return (range(limits.start, idxsep),
>                range(idxsep, endsep),
>                range(endsep, limits.stop))
>
> With partition() itself being equivalent to:
>
>      def partition(self, sep, subseq=None):
>          before, sep, after = self.partition_indices(sep, subseq)
>          return self[before], self[sep], self[after]
>
> Finally, an efficient partition based implementation of the example from
> Walter that started the whole discussion about views and the problem with
> excessive copying would look like:
>
> def splitpartition_indices(s):
>       rest = range(len(s))
>       while 1:
>           prefix, lbrace, rest = s.partition_indices("{", rest)
>           first, space, rest = s.partition_indices(" ", rest)
>           second, rbrace, rest = s.partition_indices("}", rest)
>           if prefix:
>               yield (None, s[prefix])
>           if not (lbrace and space and rbrace):
>               break
>           yield (s[first], s[second])
>
> (I know the above misses a micro-optimization, in that it calls partition
> again on an empty subsequence, even if space or lbrace are False. I believe
> doing the three partition calls together makes it much easier to read, and
> searching an empty string is pretty quick).
>
> For comparison, here's the normal copying version that has problems scaling to
> large strings:
>
> def splitpartition(s):
>       rest = s
>       while 1:
>           prefix, lbrace, rest = rest.partition_indices("{")
>           first, space, rest = rest.partition_indices(" ")
>           second, rbrace, rest = rest.partition_indices("}")
>           if prefix:
>               yield (None, prefix)
>           if not (lbrace and space and rbrace):
>               break
>           yield (first, second)
>
> Should I make a Py3k PEP for this?
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-3000 mailing list