[Python-3000] Droping find/rfind?

Ron Adam rrr at ronadam.com
Fri Aug 25 20:59:46 CEST 2006


Nick Coghlan wrote:
> Fredrik Lundh wrote:
>> Nick Coghlan wrote:
>>
>>>> Nick Coghlan wrote:
>>>>
>>>>> With a variety of "view types", that work like the corresponding builtin type,
>>>>> but reference the original data structure instead of creating copies
>>>> support for string views would require some serious interpreter surgery, though,
>>>> and probably break quite a few extensions...
>>> Why do you say that?
>> because I happen to know a lot about how Python's string types are
>> implemented ?
> 
> I believe you're thinking about something far more sophisticated than what I'm 
> suggesting. I'm just talking about a Python data type in a standard library 
> module that trades off slower performance with smaller strings (due to extra 
> method call overhead) against improved scalability (due to avoidance of 
> copying strings around).
> 
>>> make a view of it
>> so to make a view of a string, you make a view of it ?
> 
> Yep - by using all those "start" and "stop" optional arguments to builtin 
> string methods to implement the methods of a string view in pure Python. By 
> creating the string view all you would really be doing is a partial 
> application of start and stop arguments on all of the relevant string methods.
> 
> I've included an example below that just supports __len__, __str__ and 
> partition(). The source object survives for as long as the view does - the 
> idea is that the view should only last while you manipulate the string, with 
> only real strings released outside the function via return statements or yield 
> expressions.


   >>>  self.source = "%s" % source

I think this should be:

    self.source = source

Otherwise you are making a copy of the source, which is what you
are trying to avoid.  I'm not sure whether Python would reuse the
existing string for self.source, but I wouldn't count on it.
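
As a rough illustration of the point above, here is a minimal sketch in which the view stores the source object directly. The class name StrViewSketch is hypothetical and only stands in for the quoted strview; the identity check shows that no second string is held by the view.

```python
class StrViewSketch(object):
    """Hypothetical stand-in for the quoted strview class."""

    def __init__(self, source, start=None, stop=None):
        # Keep a reference to the caller's string; no formatting round-trip.
        self.source = source
        self.start = 0 if start is None else start
        self.stop = len(source) if stop is None else stop

    def __str__(self):
        # Only this call creates a new (small) string.
        return self.source[self.start:self.stop]

    def __len__(self):
        return self.stop - self.start


big = "x" * 1000000
view = StrViewSketch(big, 10, 20)
print(view.source is big)  # the view shares the original object
print(len(view))
```

Plain assignment always binds the same object, so the result does not depend on whether string formatting happens to reuse its argument.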


It might be nice if slice objects could be used in more ways in Python.
Slices may cover most of the cases where you would want a string view.

An example of a slice version of partition would be (not tested):

   def slice_partition(s, sep, sub_slice=None):
       if sub_slice is None:
           sub_slice = slice(len(s))
       found_slice = find_slice(s, sep, sub_slice)
       prefix_slice = slice(sub_slice.start, found_slice.start)
       rest_slice = slice(found_slice.stop, sub_slice.stop)
       return (prefix_slice,
               found_slice,
               rest_slice)

   # implementation of find_slice left to readers.
   def find_slice(s, sub, sub_slice=None):
       ...
       return found_slice

Of course this isn't needed for short strings, but it might be worthwhile
when used with very long strings.
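
For what it's worth, one possible way to fill in find_slice is sketched below, together with a runnable variant of slice_partition. This is only a sketch under a few assumptions that are not part of the proposal above: a missing separator is reported as an empty slice at the end of the search range (mirroring what str.partition does), any step on sub_slice is ignored, and slice.indices() is used to turn None bounds into real numbers.

```python
def find_slice(s, sub, sub_slice=None):
    """Return a slice covering the first occurrence of sub inside sub_slice.

    Assumption: if sub is not found, return an empty slice positioned at
    the end of the search range.
    """
    if sub_slice is None:
        sub_slice = slice(len(s))
    # Normalize None or negative bounds; the step is ignored in this sketch.
    start, stop, _ = sub_slice.indices(len(s))
    pos = s.find(sub, start, stop)
    if pos < 0:
        return slice(stop, stop)
    return slice(pos, pos + len(sub))


def slice_partition(s, sep, sub_slice=None):
    """Like str.partition, but return three slices into s instead of copies."""
    if sub_slice is None:
        sub_slice = slice(len(s))
    start, stop, _ = sub_slice.indices(len(s))
    found = find_slice(s, sep, sub_slice)
    return slice(start, found.start), found, slice(found.stop, stop)


text = "foo{spam eggs}bar"
prefix, found, rest = slice_partition(text, "{")
print(text[prefix], text[found], text[rest])   # foo { spam eggs}bar

# Keep partitioning inside the remainder without copying it first.
first, found2, rest2 = slice_partition(text, " ", rest)
print(text[first], text[rest2])                # spam eggs}bar
```

No substring is created until a slice is actually applied to the string, so the intermediate steps only pass around small slice objects.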



> # Simple string view example
> class strview(object):
>      def __new__(cls, source, start=None, stop=None):
>          self = object.__new__(cls)
>          self.source = "%s" % source
>          self.start = start if start is not None else 0
>          self.stop = stop if stop is not None else len(source)
>          return self
>      def __str__(self):
>          return self.source[self.start:self.stop]
>      def __len__(self):
>          return self.stop - self.start
>      def partition(self, sep):
>          _src = self.source
>          try:
>              startsep = _src.index(sep, self.start, self.stop)
>          except ValueError:
>              # Separator wasn't found!
>              return self, _NULL_STR, _NULL_STR
>          # Return new views of the three string parts
>          endsep = startsep + len(sep)
>          return (strview(_src, self.start, startsep),
>                  strview(_src, startsep, endsep),
>                  strview(_src, endsep, self.stop))
> 
> _NULL_STR = strview('')
> 
> def splitview(s):
>       rest = strview(s)
>       while 1:
>           prefix, found, rest = rest.partition("{")
>           if prefix:
>               yield (None, str(prefix))
>           if not found:
>               break
>           first, found, rest = rest.partition(" ")
>           if not found:
>               break
>           second, found, rest = rest.partition("}")
>           if not found:
>               break
>           yield (str(first), str(second))
> 
>  >>> list(splitview('foo{spam eggs}bar{foo bar}'))
> [(None, 'foo'), ('spam', 'eggs'), (None, 'bar'), ('foo', 'bar')]



