[Python-3000] Making more effective use of slice objects in Py3k

Mon Aug 28 01:38:08 CEST 2006

On 8/27/06, Guido van Rossum <guido at python.org> wrote:
> On 8/26/06, Jim Jewett <jimjjewett at gmail.com> wrote:

> > For example, you wanted to keep the rarely used optional arguments to
> > find because of efficiency.

> I don't believe they are rarely used. They are (currently) essential
> for code that searches a long string for a short substring repeatedly.
> If you believe that is a rare use case, why bother coming up with a
> whole new language feature to support it?

I believe that a fair amount of code already does the copying inline;
suppporting it in the runtime means that copying code becomes more
efficient, and shortcutting code becomes less unusual.

> > If slices were less eager at copying, this could be
> > rewritten as

> >     view=slice(start, stop, 1)
> >     view(s).find(prefix)

> Now you're postulating that calling a slice will take a slice of an
> object?

Yes.

> Any object? And how is that supposed to work for arbitrary
> objects?

For non-iterables, it will raise a TypeError.

> I would think that it ought to be a method on the string
> object

Restricting it to a few types including string might make sense.

> Also you're postulating that the slice object somehow has the
> same methods as the thing it slices?

Rather, the value returned by calling the slice on a specific string.
(I tend to think of this as a "slice of" the string, but as you've
pointed out, "slice object" technically refers to the object
specifying how/where to cut.)

> How are you expecting to implement that?

I had expected to implement it as a (string) view, which is why I
don't quite understand the distinction Nick and Josiah are making.

> But this assumes that string views are 99.999% indiscernible from
> regular strings

Yes; instead of assuming that a string's data starts n bytes after the
object's own pointer, it will instead be located at a (possibly zero)
offset.  No visible difference to python code; the difference between
-> and . for C code.  (And this indirection is already used by unicode
objects.)

> That will never fly. NumPy may get away with non-copying slices, but
> for built-in objects this would be too big of a departure of current
> practice. (If you don't stop about this I'll have to add it to PEP
> 3099. :-)

That's unfortunate, but if you're sure, maybe it should go in PEP 3099.

> > Yes, this does risk keeping all of data alive because one chunk was
> > saved.  This might be a reasonable tradeoff to avoid the copying.  If
> > not, perhaps the gc system could be augmented to shrink bloated views
> > during idle moments.

> Keep dreaming on. it really seems you have no clue about
> implementation issues; you just keep postulating random solutions
> whenever you're faced with an objection.

I had thought the problem was more about whether or not it was a good
idea; the tradeoff might be OK, or at least less bad than the
complication of fixing it.

As one implementation of fixing it, in today's garbage collection,
http://svn.python.org/view/python/trunk/Modules/gcmodule.c?rev=46244&view=markup
function collect, surviving objects are moved to the next generation
with gc_list_merge(young, old); before merging, the young list could
be traversed, and any object whose type has a __condense__ method
would get it called.  The strview type's __condense__ method would be
the C equivalent of

    if len(self.src) <= 200:
        return  # Src object too small to be worth recovering
    if (len(self) * refcounts(src)) >= len(self.src):
        return  # Src object used enough to be worth keeping
    self.src=str(src) # Create a new data buffer, with no extra chars.

(Sent in python because the commented C was several times as long,
even before checking with compiler.)  As to whether a __condense
method is a good idea, whether it should really be tied that closely
to garbage collection, whether it should be limited to C
implementations ... that I'm not so sure of.

-jJ