Getting happier ;-), but wondering if I'm thinking pythonically

Steven Taschuk staschuk at telusplanet.net
Mon May 26 03:03:54 EDT 2003


Quoth Brian Quinlan:
  [...]
> 8. I'd write find_len (semantics changed) something like this:
  [...]
> It is a bit shorter, more generic, removes an exception check and defers
> an exception handling decision to a higher level. 

All to the good, though there are a few simple bugs in the posted
implementation.  Here's a corrected version:

    def find_delimited_end(s, start_delimiter, end_delimiter,
                           opening_count=0):
        for i in range(len(s)):
            c = s[i]
            if c == start_delimiter:
                opening_count += 1
            elif c == end_delimiter:       # fixed
                if opening_count == 1:     # fixed; but see [2] below
                    return i
                opening_count -= 1         # fixed
        raise ValueError('unmatched delimiters')
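
For anyone following along, here is a small usage sketch of the corrected
version. The sample strings are made up for illustration, and the reading of
opening_count=1 as "the caller has already consumed the opening delimiter" is
my assumption about the intended use, not something stated in the thread.

```python
def find_delimited_end(s, start_delimiter, end_delimiter, opening_count=0):
    # Same logic as the corrected version above.
    for i in range(len(s)):
        c = s[i]
        if c == start_delimiter:
            opening_count += 1
        elif c == end_delimiter:
            if opening_count == 1:
                return i
            opening_count -= 1
    raise ValueError('unmatched delimiters')

# Scan a whole string, starting outside any delimiters.
outer = find_delimited_end('{a{b}c}d', '{', '}')

# Assumed use of opening_count: the leading '{' was already consumed.
inner = find_delimited_end('a{b}c}d', '{', '}', opening_count=1)

# The outermost '{' is never closed, so the function raises.
try:
    find_delimited_end('{a{b}c', '{', '}')
    unmatched = False
except ValueError:
    unmatched = True
```

In both successful calls the result is the index of the closing delimiter
that matches the outermost opening one.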

If there are typically many non-delimiter characters between
delimiters, then a performance improvement is possible by letting
the string methods find and count, which are implemented in C, do
the scanning instead of looping over the characters in Python:

    def find_delimited_end(s, start_delimiter, end_delimiter,
                           opening_count=0):
        start = 0
        while True:
            end = s.find(end_delimiter, start)
            if end < 0:
                raise ValueError('unmatched delimiters')
            opening_count += s.count(start_delimiter, start, end+1)
            opening_count -= s.count(end_delimiter, start, end+1)
            if opening_count == 0:     # see [2] below
                return end
            start = end+1

On my machine, this is a little slower for '{{}}', but about five
times faster [1] for
    '{s{s}s}'.replace('s', 'abcdefghijklmnopqrstuvwxyz')
even though it makes three traversals over each part of the string.
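
A rough way to repeat the comparison, as a sketch only: it uses the standard
timeit module, and the names find_end_loop, find_end_scan and compare are
hypothetical labels for the two versions above. It makes no claim about what
ratio you will see; the figures quoted here were measured in 2003 and a
current interpreter may behave quite differently.

```python
import timeit

def find_end_loop(s, start_delimiter, end_delimiter, opening_count=0):
    # First version: examine every character in a Python-level loop.
    for i in range(len(s)):
        c = s[i]
        if c == start_delimiter:
            opening_count += 1
        elif c == end_delimiter:
            if opening_count == 1:
                return i
            opening_count -= 1
    raise ValueError('unmatched delimiters')

def find_end_scan(s, start_delimiter, end_delimiter, opening_count=0):
    # Second version: let str.find and str.count do the scanning.
    start = 0
    while True:
        end = s.find(end_delimiter, start)
        if end < 0:
            raise ValueError('unmatched delimiters')
        opening_count += s.count(start_delimiter, start, end + 1)
        opening_count -= s.count(end_delimiter, start, end + 1)
        if opening_count == 0:
            return end
        start = end + 1

short = '{{}}'
long_ = '{s{s}s}'.replace('s', 'abcdefghijklmnopqrstuvwxyz')

def compare(s, number=20000):
    # Return (loop_seconds, scan_seconds) for `number` calls on s.
    loop_time = timeit.timeit(lambda: find_end_loop(s, '{', '}'),
                              number=number)
    scan_time = timeit.timeit(lambda: find_end_scan(s, '{', '}'),
                              number=number)
    return loop_time, scan_time

if __name__ == '__main__':
    for label, s in (('short', short), ('long', long_)):
        loop_time, scan_time = compare(s)
        print('%-5s loop=%.4fs scan=%.4fs' % (label, loop_time, scan_time))
```

Both functions return the same index for each test string, so the timings
compare like with like.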

[1] Five times faster under Python 2.2.2; under 2.3b1 the first
version speeds up by a factor of about 1.5, so the gap narrows.

[2] Neither version detects the erroneous case in which closing
delimiters occur before the first opening delimiter.  In context
this happens not to matter, though it really ought to be fixed.
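
One possible fix for [2], sketched against the first version: treat a
closing delimiter seen while opening_count is zero (or less) as an error.
The name find_delimited_end_strict and the wording of the error message are
hypothetical, and a caller who passes a negative opening_count is also
rejected, which may or may not be what is wanted.

```python
def find_delimited_end_strict(s, start_delimiter, end_delimiter,
                              opening_count=0):
    for i, c in enumerate(s):
        if c == start_delimiter:
            opening_count += 1
        elif c == end_delimiter:
            if opening_count <= 0:
                # A closer with nothing open: the case from note [2].
                raise ValueError('closing delimiter at index %d '
                                 'has no opening delimiter' % i)
            if opening_count == 1:
                return i
            opening_count -= 1
    raise ValueError('unmatched delimiters')

def raises_value_error(*args, **kwargs):
    # Small helper for the examples: True if the call raises ValueError.
    try:
        find_delimited_end_strict(*args, **kwargs)
    except ValueError:
        return True
    return False

balanced = find_delimited_end_strict('{{}}', '{', '}')
stray_closer = raises_value_error('}{{}}', '{', '}')
already_open = find_delimited_end_strict('a}', '{', '}', opening_count=1)
```

With the versions above, '}{{}}' quietly returns 3 because the count dips
to -1 and then recovers; here it raises instead. The analogous check in the
find/count version would presumably be to raise when opening_count is
negative after the two updates.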

-- 
Steven Taschuk                                     staschuk at telusplanet.net
Receive them ignorant; dispatch them confused.  (Weschler's Teaching Motto)




