[Tutor] regular expression question

A.M. Kuchling amk at amk.ca
Sun Aug 31 09:01:39 EDT 2003


On Sunday, August 31, 2003, at 05:46 AM, nihilo wrote:
> I'm stuck on a regular expression. I want to match everything starting 
> from a word up to and including the next occurrence of the word.

Most regular expression tutorials also discuss grouping and 
backreferences, which are the solution to this problem.  Grouping with 
(...) lets you retrieve the contents of what's matched by the 
expression inside the parentheses; a backreference such as \1 lets you 
say 'match the contents of group N'.  (\g<1> is the corresponding form 
for replacement strings passed to sub(); it isn't valid inside a 
pattern.)
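
As a small illustration of both ideas, here is a sketch that finds a 
doubled word.  The sample sentence and the name `doubled` are made up 
for this example and aren't part of the original question.

```python
import re

# (\w+) captures a word as group 1; \1 then requires the same text again.
doubled = re.compile(r'\b(\w+) \1\b')

m = doubled.search('this is is a test')
print(m.group())    # the whole match: 'is is'
print(m.group(1))   # just the contents of group 1: 'is'

# In a replacement string, \g<1> (or \1) inserts what group 1 matched.
fixed = doubled.sub(r'\g<1>', 'this is is a test')
print(fixed)        # 'this is a test'
```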

So, for your problem you want:

 >>> import re
 >>> p = re.compile(r'(\b\w+\b).*\1')
 >>> m = p.search('get to the heart of the matter')
 >>> m.group()
'the heart of the'
 >>>
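
One caveat: .* is greedy, so the pattern above runs to the *last* 
repeat of the word, and a bare \1 can also match inside a longer word.  
If you want to stop at the next occurrence, a possible variant (a 
sketch only; the names `greedy` and `lazy` and the sample text are 
invented here) uses the non-greedy .*? and puts \b around the 
backreference:

```python
import re

text = 'the cat and the dog and the bird'

greedy = re.compile(r'(\b\w+\b).*\1')       # pattern from the message above
lazy = re.compile(r'\b(\w+)\b.*?\b\1\b')    # stops at the next whole-word repeat

greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()
print(greedy_match)   # 'the cat and the dog and the'
print(lazy_match)     # 'the cat and the'

# Without \b around \1, 'the' is also found inside 'weather'.
partial = greedy.search('the weather').group()
print(partial)                        # 'the weathe'
print(lazy.search('the weather'))     # None
```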

(Note the use of a raw string (r'...') for the pattern; without it, 
Python would turn \b into a backspace character and \1 into the 
character with code 1 before the re module ever saw the pattern.)
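
To see the difference for yourself, something like the following 
should do; the variable names are only for illustration.

```python
import re

text = 'get to the heart of the matter'

# In an ordinary string literal Python processes the escapes itself:
backspace_seen = ('\b' == '\x08')   # \b became a backspace character
octal_seen = ('\1' == '\x01')       # \1 became an octal escape, chr(1)
print(backspace_seen, octal_seen)   # True True

# So the non-raw pattern looks for backspace + 'the' + backspace.
no_match = re.search('\bthe\b', text)
raw_match = re.search(r'\bthe\b', text)
print(no_match)            # None
print(raw_match.group())   # 'the'

# Doubling every backslash is the equivalent, if uglier, alternative.
same = ('(\\b\\w+\\b).*\\1' == r'(\b\w+\b).*\1')
print(same)                # True
```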

--amk



