[Tutor] Regular expressions question
eryksun
eryksun at gmail.com
Thu Dec 6 03:08:51 CET 2012
On Wed, Dec 5, 2012 at 7:13 PM, Ed Owens <eowens0124 at gmx.com> wrote:
> >>> str(string)
> '[<div class="wx-timestamp">\n<div class="wx-subtitle wx-timestamp">Updated:
> Dec 5, 2012, 5:08pm EST</div>\n</div>]'
> >>> m = re.search('":\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', str(string))
> >>> print m
> None
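You need a raw string for the boundary marker \b (i.e. the boundary
between \w and \W); otherwise Python turns it into a backspace control
character before the re module ever sees it. Also, I don't see why you
have ": at the start of the expression. In your string the " is followed
by >, not :, so that prefix can never match. This works: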
>>> s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
>>> m = re.search(r'\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
>>> m.group(1)
'Dec 5, 2012, 5:08pm EST'
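To see the difference the r prefix makes, here is a small sketch. It reuses the sample string s from above; the names plain, raw and pattern_body are only for illustration. It compares what Python stores for '\b' and r'\b', then runs the same pattern both ways.

```python
import re

# In a normal string literal, '\b' is ONE character: backspace (0x08).
plain = '\b'
# In a raw string literal, r'\b' is TWO characters: a backslash and 'b',
# which the re module then reads as a word-boundary assertion.
raw = r'\b'

s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
pattern_body = r'(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<'

# Without the raw prefix the pattern starts with a literal backspace,
# which does not occur anywhere in s.
no_match = re.search('\b' + pattern_body, s)
# With the raw prefix the pattern starts at a word boundary.
match = re.search(r'\b' + pattern_body, s)

print(len(plain), len(raw))   # 1 2
print(no_match)               # None
print(match.group(1))         # Dec 5, 2012, 5:08pm EST
```

As a habit, writing every regular expression as a raw string avoids this whole class of surprises.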
But wouldn't it be simpler and more reliable to use an HTML parser?
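As a rough sketch of that idea, here is one way it might look using only the standard library's html.parser (Python 3 import path). The class name SubtitleParser and the choice to key on the wx-subtitle class are my assumptions, based on the snippet you posted. I'm also assuming the line break after "Updated:" in your output is just email wrapping and the real string has a space there.

```python
from html.parser import HTMLParser


class SubtitleParser(HTMLParser):
    """Collect text inside <div> elements whose class list has 'wx-subtitle'."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # how many divs deep we are inside a matching div
        self.chunks = []   # text fragments found inside the matching div

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        classes = (dict(attrs).get('class') or '').split()
        # Start counting at a matching div; keep counting nested divs.
        if self.depth or 'wx-subtitle' in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


html = ('<div class="wx-timestamp">\n'
        '<div class="wx-subtitle wx-timestamp">Updated: '
        'Dec 5, 2012, 5:08pm EST</div>\n</div>')

parser = SubtitleParser()
parser.feed(html)
parser.close()

text = ''.join(parser.chunks).strip()
# Drop the "Updated:" label by splitting at the first colon.
timestamp = text.partition(':')[2].strip()
print(timestamp)   # Dec 5, 2012, 5:08pm EST
```

The parser keys on the markup's structure rather than on the exact shape of the date, so it should keep working if the site changes the timestamp format, as long as the class name stays the same.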