[Tutor] Regular expressions question

Albert-Jan Roskam fomcl at yahoo.com
Thu Dec 6 10:53:08 CET 2012


_______________________________
>From: eryksun <eryksun at gmail.com>
>To: Ed Owens <eowens0124 at gmx.com> 
>Cc: "tutor at python.org" <tutor at python.org> 
>Sent: Thursday, December 6, 2012 3:08 AM
>Subject: Re: [Tutor] Regular expressions question
>
>On Wed, Dec 5, 2012 at 7:13 PM, Ed Owens <eowens0124 at gmx.com> wrote:
>>>>> str(string)
>> '[<div class="wx-timestamp">\n<div class="wx-subtitle wx-timestamp">Updated:
>> Dec 5, 2012, 5:08pm EST</div>\n</div>]'
>>>>> m = re.search('":\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', str(string))
>>>>> print m
>> None
>
>You need a raw string for the boundary marker \b (i.e. the boundary
>between \w and \W), else it creates a backspace control character.
>Also, I don't see why you have ": at the start of the expression. This
>works:
>
>    >>> s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
>    >>> m = re.search(r'\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
>    >>> m.group(1)
>    'Dec 5, 2012, 5:08pm EST'
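To make the backspace point concrete, here is a small check (a sketch in Python 3; the shortened pattern \bDec is only for illustration, not part of the original question). In an ordinary string literal '\b' has already become the backspace character before re sees it, so the pattern looks for a literal backspace that the text does not contain:

```python
import re

s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'

# In an ordinary string literal, '\b' is the backspace control character
# (ASCII 8), so the re module never sees a word-boundary escape.
assert '\b' == '\x08'
# A raw string keeps the backslash, so re receives the two characters \ and b.
assert r'\b' == '\\b'

# Ordinary string: the pattern demands a literal backspace before "Dec".
plain = re.search('\bDec', s)
# Raw string: \b is a word boundary, which holds between " " and "D".
raw = re.search(r'\bDec', s)

print(plain)        # None
print(raw.group())  # Dec
```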

Lately I have started using named groups (after I failed to understand some regexes I had written myself several months earlier).
The downside is that the regexes easily get quite long, but one could use the re.VERBOSE flag to make them more readable.
>>> m = re.search(r'\b(?P<date>\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
>>> m.group("date")
'Dec 5, 2012, 5:08pm EST'
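For example, the same pattern written with re.VERBOSE could look roughly like this (a sketch in Python 3; the sub-group names month, day, year, time and zone are my own, chosen only for illustration). In verbose mode unescaped whitespace in the pattern is ignored and '#' starts a comment, so each piece can be documented where it appears:

```python
import re

s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'

# Same match as the one-line pattern above, split into named parts.
# The sub-group names are illustrative, not required by anything.
pattern = re.compile(r"""
    \b
    (?P<date>
        (?P<month>\w+)      \s+    # e.g. Dec
        (?P<day>\d+)        ,\s+   # e.g. 5
        (?P<year>\d+)       ,\s+   # e.g. 2012
        (?P<time>\d+:\d+.m) \s+    # e.g. 5:08pm (the dot matches any character)
        (?P<zone>\w+)              # e.g. EST
    )
    <                              # the date is followed by a closing tag
""", re.VERBOSE)

m = pattern.search(s)
print(m.group("date"))                    # Dec 5, 2012, 5:08pm EST
print(m.group("month"), m.group("zone"))  # Dec EST
```

m.groupdict() then returns all the named pieces at once, which is handy when you want to pass them on as keyword arguments.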


