[Python-3000] Dropping find/rfind?

Sat Aug 26 10:02:15 CEST 2006

Ron Adam wrote:
> Nick Coghlan wrote:
>> Fredrik Lundh wrote:
>>> Nick Coghlan wrote:
>>>
>>>>> Nick Coghlan wrote:
>>>>>
>>>>>> With a variety of "view types", that work like the corresponding builtin type,
>>>>>> but reference the original data structure instead of creating copies
>>>>> support for string views would require some serious interpreter surgery, though,
>>>>> and probably break quite a few extensions...
>>>> Why do you say that?
>>> because I happen to know a lot about how Python's string types are
>>> implemented?
>> I believe you're thinking about something far more sophisticated than what I'm 
>> suggesting. I'm just talking about a Python data type in a standard library 
>> module that trades off slower performance with smaller strings (due to extra 
>> method call overhead) against improved scalability (due to avoidance of 
>> copying strings around).
>>
>>>> make a view of it
>>> so to make a view of a string, you make a view of it?
>> Yep - by using all those "start" and "stop" optional arguments to builtin 
>> string methods to implement the methods of a string view in pure Python. By 
>> creating the string view all you would really be doing is a partial 
>> application of start and stop arguments on all of the relevant string methods.
>>
>> I've included an example below that just supports __len__, __str__ and 
>> partition(). The source object survives for as long as the view does - the 
>> idea is that the view should only last while you manipulate the string, with 
>> only real strings released outside the function via return statements or yield 
>> expressions.
> 
> 
>    >>>  self.source = "%s" % source
> 
> I think this should be:
> 
>     self.source = source
> 
> Otherwise you are making copies of the source, which is what you
> are trying to avoid.  I'm not sure if Python would reuse the self.source 
> string, but I wouldn't count on it.

CPython 2.5 certainly doesn't reuse the existing string object. Given that 
"%s" % source is the way to ensure you have a builtin string type (str or 
unicode) without coercing unicode objects to str objects or vice versa, that 
formatting operation should probably get the same optimisation as the str() 
and unicode() constructors (i.e., simply increfing and returning the original 
builtin string).
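For anyone who wants to experiment, here is a minimal sketch of the kind of view described above, written for Python 3 (where str is the only text type). The class name StrView is hypothetical. It keeps a plain reference to the source, as Ron suggests, checks the type instead of coercing with "%s" %, and supports only __len__, __str__ and partition(). Since str.partition() takes no start/stop arguments in Python 3, the search goes through str.find(), which does.

```python
class StrView:
    """Hypothetical read-only window onto source[start:stop] (no copying)."""

    def __init__(self, source, start=None, stop=None):
        if not isinstance(source, str):
            raise TypeError("StrView needs a str source")
        # Keep a reference to the original string rather than a copy
        self.source = source
        # Resolve None and negative indices the same way slicing does
        self.start, self.stop, _ = slice(start, stop).indices(len(source))
        # Clamp start > stop to an empty view so len() is never negative
        if self.stop < self.start:
            self.stop = self.start

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # The only place a real (copied) string is created
        return self.source[self.start:self.stop]

    def partition(self, sep):
        if not sep:
            raise ValueError("empty separator")  # as str.partition does
        # str.find accepts start/end, so the search stays inside the view
        idx = self.source.find(sep, self.start, self.stop)
        if idx < 0:
            empty = StrView(self.source, self.stop, self.stop)
            return self, empty, empty
        end = idx + len(sep)
        return (StrView(self.source, self.start, idx),
                StrView(self.source, idx, end),
                StrView(self.source, end, self.stop))
```

The three pieces returned by partition() are themselves views onto the same source, so a caller only pays for a copy when it calls str() on the piece it actually wants to keep.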

> It might be nice if slice objects could be used in more ways in Python. 
> That may work in most cases where you would want a string view.

That's quite an interesting idea. With that approach, rather than having to 
duplicate 'concrete sequence with copying semantics' and 'sequence view with 
non-copying semantics' everywhere, you could just provide methods that return 
slice objects marking the locations of the relevant sections, instead of 
copies of the sections themselves.

To make that work effectively, you'd need to implement __nonzero__ on slice 
objects so that a slice is true when it selects at least one element, e.g. 
"((self.stop - self.start + self.step - 1) // self.step) > 0" for a positive 
step (either that, or implement __len__, which would make slice() look more 
and more like xrange(), as someone else noted recently).
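One wrinkle with giving slices a length or truth value: a slice such as slice(None) or slice(-3, None) has no length until it is applied to a sequence of known length, and a step other than 1 needs ceiling rather than floor division. A minimal sketch, assuming Python 3 (where __nonzero__ became __bool__ and xrange() became range()), with hypothetical helper names slice_len and slice_is_nonempty, lets slice.indices() and range() do that arithmetic:

```python
def slice_len(s, seqlen):
    """Number of elements slice s selects from a sequence of length seqlen."""
    # slice.indices() resolves None, negative and out-of-range values
    # against a concrete length; range() then counts the selected
    # positions, including for non-unit and negative steps
    return len(range(*s.indices(seqlen)))


def slice_is_nonempty(s, seqlen):
    """Truth value a slice would need for the partition_indices() approach."""
    return slice_len(s, seqlen) > 0
```

For example, slice(0, 1, 2) selects one element, which the helper counts correctly even though (1 - 0) // 2 is 0.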

Using the same signature as partition:

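    def partition_indices(self, sep, start=None, stop=None):
        if not sep:
            raise ValueError("empty separator")  # as str.partition does
        # Default to searching the whole string
        if start is None: start = 0
        if stop is None: stop = len(self)
        try:
            idxsep = self.index(sep, start, stop)
        except ValueError:
            # Separator not found: the whole range is the "before" part
            return slice(start, stop), slice(0), slice(0)
        endsep = idxsep + len(sep)
        return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)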

Then partition() itself would be equivalent to:

    def partition(self, sep, start=None, stop=None):
        before, middle, after = self.partition_indices(sep, start, stop)
        return self[before], self[middle], self[after]
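Because partition_indices() is written as a proposed string method, it can't simply be attached to the builtin str type to try it out. A minimal sketch, assuming Python 3, recasts it as plain functions that take the string as the first argument. iter_fields() is a hypothetical helper illustrating the point of the exercise: splitting a string repeatedly while passing around only slice objects, with each field copied only when the caller asks for it.

```python
def partition_indices(s, sep, start=None, stop=None):
    """Like str.partition, but return slices into s instead of substrings."""
    if not sep:
        raise ValueError("empty separator")  # mirrors str.partition
    if start is None:
        start = 0
    if stop is None:
        stop = len(s)
    # str.find returns -1 instead of raising, so no try/except is needed
    idxsep = s.find(sep, start, stop)
    if idxsep < 0:
        return slice(start, stop), slice(0), slice(0)
    endsep = idxsep + len(sep)
    return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)


def partition(s, sep, start=None, stop=None):
    """str.partition built on top of partition_indices()."""
    before, middle, after = partition_indices(s, sep, start, stop)
    return s[before], s[middle], s[after]


def iter_fields(s, sep):
    """Yield a slice for each sep-delimited field of s, without copying."""
    start = 0
    while True:
        before, middle, after = partition_indices(s, sep, start)
        yield before
        if middle == slice(0):  # separator not found: this was the last field
            return
        start = after.start
```

With a non-empty separator, `[s[f] for f in iter_fields(s, sep)]` should give the same fields as `s.split(sep)`, but the loop itself never builds an intermediate substring.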

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org