[Tutor] Regular Expression question

Michael Janssen Janssen@rz.uni-frankfurt.de
Fri Apr 18 14:47:07 2003


On Thu, 17 Apr 2003, Scott Chapman wrote:

> Is it possible to make a regular expression that will match:
> '<html blah>' or '<html>'
> without having to make it into two complete expressions separated by a pipe:
>  r'<html[ \t].+?>|<html>'
>
> I want it to require a space or tab and at least one character before the
> closing bracket, after 'html', or just the closing bracket.

import re

def test(expr):
    # Print each sample string next to the text expr matched (blank if none).
    for s in ('<html blah>', '<html>', '<html:subtype>', '<html >',
              '<html tag1 tag2>'):
        mt = re.search(expr, s)
        print("%-18s %s" % (s, mt.group() if mt else ""))


test(r"<html([ \t][^ \t]+?)?>")
<html blah>        <html blah>
<html>             <html>
<html:subtype>
<html >
<html tag1 tag2>

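r"<html([ \t][^ \t]+?)?>" has a group "([ \t][^ \t]+?)" that matches one
space or tab followed by at least one character that is neither a space nor
a tab. The "?" after the group makes it optional, so it can occur once or
not at all. That is why '<html >' and '<html tag1 tag2>' do not match: the
first has nothing after the space, the second has a second space-separated
item that the single group cannot cover.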
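
If the goal is to accept exactly what the two-part expression in the
question accepts (anything after the space or tab, up to the closing
bracket), the same optional-group idea could be sketched as below. This is
only an illustration in Python 3 syntax; the names HTML_TAG and match_tag are
made up for this example, and "(?:...)" is used so the group does not
capture.

```python
import re

# Hypothetical name. The optional non-capturing group stands in for the
# "|<html>" alternative: either "<html>" alone, or "<html", one space or
# tab, at least one more character, then ">".
HTML_TAG = re.compile(r"<html(?:[ \t].+?)?>")

def match_tag(s):
    """Return the matched text, or None if s contains no matching tag."""
    mt = HTML_TAG.search(s)
    return mt.group() if mt else None

for s in ('<html blah>', '<html>', '<html:subtype>', '<html >',
          '<html tag1 tag2>'):
    print("%-18s %s" % (s, match_tag(s)))
```

With this pattern '<html tag1 tag2>' matches as a whole, while
'<html:subtype>' and '<html >' are still rejected.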

NB: re is powerful but not sufficient for real-life HTML, as Magnus has
already stated today.
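
For comparison, here is a rough sketch of finding the tag with a real parser
instead of re. It assumes Python 3 and the standard library's html.parser
module; the class name HtmlTagFinder and the sample input are invented for
this example. The input uses an uppercase tag name and a line break before
the attribute, neither of which the expression above accepts.

```python
from html.parser import HTMLParser

class HtmlTagFinder(HTMLParser):
    """Hypothetical helper: collects the attributes of every <html> start tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already lowercased.
        if tag == "html":
            self.found.append(attrs)

finder = HtmlTagFinder()
# Uppercase tag name and a newline before the attribute.
finder.feed('<HTML\n  lang="en">\n<body></body>\n</html>')
print(finder.found)
```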

Michael
