[Tutor] Regular Expression question
Fri Apr 18 14:58:01 2003
On Friday 18 April 2003 11:46, Michael Janssen wrote:
> On Thu, 17 Apr 2003, Scott Chapman wrote:
> > Is it possible to make a regular expression that will match:
> > '<html blah>' or '<html>'
> > without having to make it into two complete expressions separated by a
> > pipe: r'<html[ \t].+?>|<html>'
> > I want it to require a space or tab and at least one character before the
> > closing bracket, after 'html', or just the closing bracket.
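> import re
>
> def test(expr):
>     # Print each sample, followed by the matched text (if any).
>     for s in ('<html blah>', '<html>', '<html:subtype>', '<html >',
>               '<html tag1 tag2>'):
>         # Left-justify the sample in an 18-character column.
>         print("%-18s" % s, end=" ")
>         mt = re.search(expr, s)
>         if mt:
>             print(mt.group())
>         else:
>             print()
>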
> test(r"<html([ \t][^ \t]+?)?>")
> <html blah>        <html blah>
> <html>             <html>
> <html:subtype>
> <html >
> <html tag1 tag2>
> r"<html([ \t][^ \t]+?)?>" has a group "([ \t][^ \t]+?)" that matches one
> following space-tag combination. The trailing "?" makes this group
> optional, so it can be given once or not at all.
> NB: re is powerful but not sufficient for real-life HTML, as Magnus has
> already stated today.
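For comparison, here is a minimal Python 3 sketch that checks the two-alternative pattern from the question against a single pattern with an optional group, and against the pattern from the reply. The non-capturing `(?:...)` rewrite, the constant names, and the helper `matches` are assumptions added for illustration and are not from the thread. Note that the reply's `[^ \t]+?` rejects a second space-separated attribute, whereas `.+?` accepts it.

```python
import re

# Pattern from the question: two complete alternatives joined by a pipe.
TWO_ALTERNATIVES = re.compile(r"<html[ \t].+?>|<html>")

# Illustrative single-pattern rewrite: the "space or tab plus at least one
# character" part sits in an optional non-capturing group.
OPTIONAL_GROUP = re.compile(r"<html(?:[ \t].+?)?>")

# Pattern from the reply: the attribute part may not contain spaces or tabs.
REPLY_PATTERN = re.compile(r"<html([ \t][^ \t]+?)?>")

SAMPLES = ("<html blah>", "<html>", "<html:subtype>", "<html >",
           "<html tag1 tag2>")


def matches(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # One row per sample: question pattern, rewrite, reply pattern.
    for s in SAMPLES:
        print("%-18s %-5s %-5s %-5s" % (
            s,
            matches(TWO_ALTERNATIVES, s),
            matches(OPTIONAL_GROUP, s),
            matches(REPLY_PATTERN, s),
        ))
```

On these five samples the first two patterns agree with each other; the reply's pattern differs only on '<html tag1 tag2>'.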
Thanks for the tip. I'm moving forward with this as a first program in
Python. I'll have a good look at HTMLParser shortly because of Magnus' post.
I'm just using this as an exercise and doubt it will ever see production.
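
As a rough idea of what the parser-based route could look like, here is a small sketch using the standard library's HTML parser. It assumes Python 3, where the class lives in `html.parser` (the Python 2 module was named `HTMLParser`). The class name `HtmlTagFinder` and the function `find_html_tags` are made up for this example.

```python
from html.parser import HTMLParser


class HtmlTagFinder(HTMLParser):
    """Record the attributes of every <html ...> start tag seen."""

    def __init__(self):
        super().__init__()
        self.found = []  # one list of (name, value) pairs per <html> tag

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this hook.
        if tag == "html":
            self.found.append(attrs)


def find_html_tags(text):
    """Return the attribute lists of all <html> start tags in text."""
    finder = HtmlTagFinder()
    finder.feed(text)
    finder.close()
    return finder.found
```

Unlike the regular expressions, this does not care how many attributes the tag carries or how they are spaced; attributes without a value are reported with `None` as the value.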