a RegEx puzzle

Charles Hartman charles.hartman at conncoll.edu
Fri Mar 11 08:38:36 EST 2005


I'm still shaky on some of sre's syntax. Here's the task: I've got 
strings (never longer than about a dozen characters) that are 
guaranteed to be made only of characters 'x' and '/'. In each string I 
want to find the longest continuous stretch of pairs whose first 
character is 'x' and the second is either mark. So if marks = 
'/xx/xxx///', the "winning" stretch begins at position 2 and is 6 
characters long ('x/xxx/'), which requires finding a second match that 
overlaps the first match (which is just 'xx' in position 1). (When 
there are multiple "winning" stretches, I might want to adjudicate 
among them, but that's a separate problem.) I hope this is clear 
enough.

Here's the best I've come up with so far:

pat = sre.compile('(x[x/])+')
(longest, startlongest) = max([(fnd.end()-fnd.start(), fnd.start()) for 
i in range(len(marks))
			for fnd in pat.finditer(marks,i)])

It seems to work, but it's not very smart. Given the string 
'//////xx//', it returns a list whose first seven tuples are all 
'(2,6)', followed by a single '(2,7)', which gets returned because of 
max's secondary attention to the second element in the tuple. In other 
words this code searches from position 0 and finds (2,6), then searches 
from position 1 and finds (2,6) . . .

Obviously it ought to adjust the second argument to pat.finditer 
according to what it finds, instead of blindly stepping. (By the way, I 
can't give 'range' a stepsize of 2 because the "winning" stretch could 
begin with an odd-numbered character.) That sounds like a job for a 
generator. But (1) I haven't been able to work out the details (I'm 
even shakier on generators than I am on regexes!) and (2) that *seems* 
likely to be less efficient, maybe by a large enough factor that the 
obnoxiousness of getting the first search-return seven times is 
something I should just swallow.

What magic am I missing?

Charles Hartman
Professor of English, Poet in Residence
http://cherry.conncoll.edu/cohar
http://villex.blogspot.com




More information about the Python-list mailing list