[Tutor] Regular Expression question

Jay Dorsey python@jaydorsey.com
Fri Apr 18 14:48:02 2003

Scott Chapman wrote:

> Is it possible to make a regular expression that will match:
> '<html blah>' or '<html>'
> without having to make it into two complete expressions separated by a pipe:
>  r'<html[ \t].+?>|<html>'
> I want it to require either a space or tab followed by at least one more
> character between 'html' and the closing bracket, or just the closing bracket.
> Scott
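For reference, here is a quick Python 3 check of the two-branch pattern from the question. The helper `first_match` is hypothetical and exists only for illustration. The check shows one quirk: `.` also matches `>`, so the lazy `.+?` is not guaranteed to stop at the first closing bracket.

```python
import re

# The two-branch pattern from the question, compiled unchanged.
two_branch = re.compile(r'<html[ \t].+?>|<html>')

def first_match(pattern, text):
    """Hypothetical helper: return the first substring pattern matches, or None."""
    m = pattern.search(text)
    return m.group() if m else None

for text in ['<html>', '<html blah>', '<html >', '<html ><body>']:
    # Show each input next to what the pattern actually matched.
    print(repr(text), '->', repr(first_match(two_branch, text)))
```

The last case is the one to watch. `.+?` must take at least one character, so it swallows the first `>` and then keeps extending until a later `>` lets the match succeed, giving `'<html ><body>'`, which spans two tags.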

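How about making the whitespace-plus-attributes part an optional group:

'<html([ \t][^>]+)?>'

Because [^>]+ can never consume a '>', the match always ends at the first
closing bracket after '<html'.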
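To make each piece of that pattern explicit, here is a sketch of the same regex written with `re.VERBOSE` so it can carry comments. The names `PLAIN` and `TAG` are illustrative. In verbose mode, whitespace outside a character class is ignored, but the space inside `[ \t]` is kept, so both spellings should behave identically.

```python
import re

# The one-line pattern from the reply, here as a raw string.
PLAIN = re.compile(r'<html([ \t][^>]+)?>')

# The same pattern spelled out with re.VERBOSE so each part can be commented.
TAG = re.compile(r"""
    <html        # literal start of the tag
    (            # optional group: separator plus attributes
      [ \t]      # one space or tab right after 'html'
      [^>]+      # one or more characters, none of them '>'
    )?           # the group may be skipped entirely, leaving '<html>'
    >            # the closing bracket
""", re.VERBOSE)

for s in ['<html>', '<html blah>', '<html >', '<htmlx>']:
    # Both spellings should agree on whether s is exactly one tag.
    print(repr(s), bool(PLAIN.fullmatch(s)), bool(TAG.fullmatch(s)))
```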

 >>> import re
 >>> x = re.compile('<html([ \t][^>]+)?>')
 >>> print x
<_sre.SRE_Pattern object at 0x008B63C0>
 >>> y = '<html>'
 >>> print x.search(y).group()
<html>
 >>> z = '<html blah>'
 >>> print x.search(z).group()
<html blah>
 >>> a = '<html blah><test>'
 >>> print x.search(a).group()
<html blah>
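The session above is Python 2. On Python 3, where `print` is a function, a similar check might look like the sketch below. It uses `re.fullmatch` (Python 3.4+) so each string has to be exactly one tag. The accept/reject lists are illustrative inputs, not taken from the thread. A raw string is used out of habit. In the original non-raw string, `'\t'` becomes a literal tab character, which `re` also accepts, so either form works.

```python
import re

# The pattern from the reply, as a raw string so \t stays a regex escape.
pattern = re.compile(r'<html([ \t][^>]+)?>')

# Illustrative inputs that should match as one whole tag.
accept = ['<html>', '<html blah>', '<html\tblah>']
# Illustrative inputs that should not: nothing after the space, no separator,
# and no closing bracket.
reject = ['<html >', '<htmlx>', '<html blah']

for s in accept:
    assert pattern.fullmatch(s), s
for s in reject:
    assert pattern.fullmatch(s) is None, s

# search() finds the first tag inside longer text, as in the session above.
print(pattern.search('<html blah><test>').group())   # <html blah>
```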