[Python-3000] Making more effective use of slice objects in Py3k

Jim Jewett jimjjewett at gmail.com
Mon Aug 28 01:38:08 CEST 2006

On 8/27/06, Guido van Rossum <guido at python.org> wrote:
> On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:

> > For example, you wanted to keep the rarely used optional arguments to
> > find because of efficiency.

> I don't believe they are rarely used. They are (currently) essential
> for code that searches a long string for a short substring repeatedly.
> If you believe that is a rare use case, why bother coming up with a
> whole new language feature to support it?

I believe that a fair amount of code already does the copying inline;
supporting it in the runtime means that copying code becomes more
efficient, and shortcutting code becomes less unusual.
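
To make the two styles concrete, here is a minimal sketch of the "shortcutting" form (using the optional start argument of str.find) next to the "copying inline" form (re-slicing the remainder after each match). The helper names find_all and find_all_copying are made up for illustration and do not come from the thread.

```python
def find_all(s, sub):
    """Return the start index of every non-overlapping occurrence of sub in s."""
    if not sub:
        raise ValueError("empty substring")
    positions = []
    start = 0
    while True:
        # The optional start argument lets find() resume after the last
        # match without building a new string for the remainder.
        i = s.find(sub, start)
        if i == -1:
            return positions
        positions.append(i)
        start = i + len(sub)


def find_all_copying(s, sub):
    """Same result, but slices off (copies) the remainder after each match."""
    if not sub:
        raise ValueError("empty substring")
    positions = []
    consumed = 0  # how many characters of s have been sliced away so far
    rest = s
    while True:
        i = rest.find(sub)
        if i == -1:
            return positions
        positions.append(consumed + i)
        consumed += i + len(sub)
        rest = rest[i + len(sub):]  # copies the tail of the string each time
```

Both functions return the same positions; the difference is only in how much data gets copied along the way.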

> > If slices were less eager at copying, this could be
> > rewritten as

> >     view=slice(start, stop, 1)
> >     view(s).find(prefix)

> Now you're postulating that calling a slice will take a slice of an
> object?


> Any object? And how is that supposed to work for arbitrary
> objects?

For non-iterables, it will raise a TypeError.

> I would think that it ought to be a method on the string
> object

Restricting it to a few types including string might make sense.

> Also you're postulating that the slice object somehow has the
> same methods as the thing it slices?

Rather, the value returned by calling the slice on a specific string.
(I tend to think of this as a "slice of" the string, but as you've
pointed out, "slice object" technically refers to the object
specifying how/where to cut.)

> How are you expecting to implement that?

I had expected to implement it as a (string) view, which is why I
don't quite understand the distinction Nick and Josiah are making.
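
A rough pure-Python sketch of what such a view could look like, assuming a hypothetical StrView class that keeps a reference to the source string plus start/stop bounds and delegates find to the source. This only illustrates the idea; it is not an existing type and not necessarily the design anyone in the thread had in mind.

```python
class StrView:
    """Hypothetical read-only view of src[start:stop]; shares src, copies nothing."""

    def __init__(self, src, start, stop):
        # Normalize the bounds the same way ordinary slicing would.
        self.start, self.stop, _ = slice(start, stop, 1).indices(len(src))
        self.stop = max(self.stop, self.start)
        self.src = src

    def __len__(self):
        return self.stop - self.start

    def find(self, sub):
        """Like str.find, but the result is relative to the start of the view."""
        i = self.src.find(sub, self.start, self.stop)
        return -1 if i == -1 else i - self.start

    def __str__(self):
        # Materializing the view is the only point where data is copied.
        return self.src[self.start:self.stop]
```

With this sketch, StrView(s, start, stop).find(prefix) plays the role of view(s).find(prefix) from the earlier example.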

> But this assumes that string views are 99.999% indiscernible from
> regular strings

Yes; instead of assuming that a string's data starts n bytes after the
object's own pointer, it will instead be located at a (possibly zero)
offset.  No visible difference to python code; the difference between
-> and . for C code.  (And this indirection is already used by unicode

> That will never fly. NumPy may get away with non-copying slices, but
> for built-in objects this would be too big of a departure from current
> practice. (If you don't stop about this I'll have to add it to PEP
> 3099. :-)

That's unfortunate, but if you're sure, maybe it should go in PEP 3099.

> > Yes, this does risk keeping all of data alive because one chunk was
> > saved.  This might be a reasonable tradeoff to avoid the copying.  If
> > not, perhaps the gc system could be augmented to shrink bloated views
> > during idle moments.

> Keep dreaming on. It really seems you have no clue about
> implementation issues; you just keep postulating random solutions
> whenever you're faced with an objection.

I had thought the problem was more about whether or not it was a good
idea; the tradeoff might be OK, or at least less bad than the
complication of fixing it.

As one implementation of fixing it, in today's garbage collection,
function collect, surviving objects are moved to the next generation
with gc_list_merge(young, old); before merging, the young list could
be traversed, and any object whose type has a __condense__ method
would get it called.  The strview type's __condense__ method would be
the C equivalent of

    if len(self.src) <= 200:
        return  # Src object too small to be worth recovering
    if (len(self) * refcounts(self.src)) >= len(self.src):
        return  # Src object used enough to be worth keeping
    self.src = str(self)  # Create a new data buffer, with no extra chars.

(Sent in python because the commented C was several times as long,
even before checking with a compiler.)  As to whether a __condense__
method is a good idea, whether it should really be tied that closely
to garbage collection, whether it should be limited to C
implementations ... that I'm not so sure of.
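
For anyone who wants to experiment with the heuristic, here is a self-contained sketch. It assumes a hypothetical StrView class with src, offset and length attributes, spells the hook as a plain condense method rather than a real __condense__ slot, and substitutes sys.getrefcount (CPython-specific, and only approximate) for the refcounts helper. None of this is hooked into the garbage collector.

```python
import sys


class StrView:
    """Hypothetical view exposing src[offset:offset + length] without copying."""

    def __init__(self, src, offset, length):
        self.src = src
        self.offset = offset
        self.length = length

    def __len__(self):
        return self.length

    def __str__(self):
        return self.src[self.offset:self.offset + self.length]

    def condense(self, min_size=200, refcount=None):
        """Swap a large, mostly unused source buffer for a tight copy.

        Returns True if the buffer was replaced, False otherwise.
        """
        if refcount is None:
            # Assumption: use CPython's reference count as the usage estimate.
            refcount = sys.getrefcount(self.src)
        if len(self.src) <= min_size:
            return False  # source too small to be worth recovering
        if len(self) * refcount >= len(self.src):
            return False  # source used enough to be worth keeping
        self.src = str(self)  # copy only the characters the view exposes
        self.offset = 0       # the new buffer starts at the view's first char
        return True
```

The refcount parameter exists only so the heuristic can be exercised deterministically; a real implementation would presumably read the count directly.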

