a RegEx puzzle

Charles Hartman charles.hartman at conncoll.edu
Fri Mar 11 15:52:54 EST 2005


Thanks -- not only for the code, which does almost exactly what I need 
to do, but for the reminder (thanks also to Jeremy Bowers for this!) to 
prefer simple solutions. I was, of course, so tied up in getting my 
nifty one-liner right that I totally lost sight of how 
straightforwardly the job could be done; and now that I've got it, I've 
also got room to tune it. For instance, your code keeps the first 
"longest" match if several are equal in length; my program will, I 
think, do slightly better if I keep the last "longest" instead, and 
changing that required changing > into >=, which even I can't screw up.
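
For anyone following along, here is a minimal sketch of that change, written for Python 3 rather than the Python 2 of the quoted code below. The keep_last parameter and the snake_case names are hypothetical, added only for illustration, and the bounded loop condition is a small defensive addition; the rest follows Kent's longestMatch.

```python
import re

def longest_match(rx, s, keep_last=False):
    """Return (start, length) of the longest match for rx in s,
    or (None, None) if nothing matches.

    keep_last is a hypothetical switch: False keeps the first of several
    equally long matches (the > comparison), True keeps the last (>=).
    """
    start = length = current = 0

    # Stop once the search position runs past the end of s.
    while current <= len(s):
        m = rx.search(s, current)
        if not m:
            break

        m_start, m_end = m.span()
        # Restart one character after this match began, so that
        # overlapping candidates are also considered.
        current = m_start + 1

        m_len = m_end - m_start
        better = (m_len >= length) if keep_last else (m_len > length)
        if better:
            start, length = m_start, m_len

    if length:
        return start, length
    return None, None


pairs_re = re.compile(r'(x[x/])+')

for s in ['/xx/xxx///', '//////xx//']:
    print(s, longest_match(pairs_re, s), longest_match(pairs_re, s, keep_last=True))
```

In '//////xx//' there are two candidates of length 2, 'xx' at index 6 and 'x/' at index 7, so the default call returns (6, 2) and the keep_last call returns (7, 2). In '/xx/xxx///' the longest match is unique, so both calls return (2, 6).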

Thanks to everyone who's helped on this. Makes me wish I were going to 
PyCon.

Charles Hartman
Professor of English, Poet in Residence
http://cherry.conncoll.edu/cohar
http://villex.blogspot.com

Kent Johnson wrote:
> It's pretty simple to put re.search() into a loop where subsequent 
> searches start from the character after where the previous one 
> matched. Here is a solution that uses a general-purpose longest match 
> function:
>
> import re
>
> # RE solution
> def longestMatch(rx, s):
>     ''' Find the longest match for rx in s.
>         Returns (start, length) for the match or (None, None) if no match found.
>     '''
>
>     start = length = current = 0
>
>     while True:
>         m = rx.search(s, current)
>         if not m:
>             break
>
>         mStart, mEnd = m.span()
>         current = mStart + 1
>
>         if (mEnd - mStart) > length:
>             start = mStart
>             length = mEnd - mStart
>
>     if length:
>         return start, length
>
>     return None, None
>
>
> pairsRe = re.compile(r'(x[x/])+')
>
> for s in [ '/xx/xxx///', '//////xx//' ]:
>     print s, longestMatch(pairsRe, s)



