[Tutor] about Regular Expression
Karl Pflästerer
sigurd@12move.de
Sat Jun 21 15:20:02 2003
On 21 Jun 2003, Abdirizak abdi <- a_abdi406@yahoo.com wrote:
> Hi, I was trying to set up a regular expression that covers the following
> numbers:
> 6868 8901
> (02) 6868 8901
> this is the regular expression set up to cover:
> tee = re.compile(r'\b\(?\d?\d?\)?\s?\d+\d+\d+\d+\s\d+\d+\d+\d+\b')
> tuu = "(02) 8750 9529"
> tii = tee.findall(tuu)
> print tii
> MY PROBLEM is it is displaying this output:
> [ ' 02) 8750 9529']
> it misses the first bracket but don't know why:
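Bob wrote why that happens: a word boundary exists wherever you have a
word constituent char on one side and a non-word constituent char on
the other (e.g. in `a(' a word boundary exists between `a' and `(').
At the start of your string `(' has no word constituent char next to
it, so `\b' cannot match there; it first matches between `(' and `0'.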
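
A minimal sketch of the boundary behaviour described above, written for Python 3 with the standard `re` module. The variable names and the variant without the leading `\b` are illustrative assumptions, not something taken from the original exchange:

```python
import re

text = "(02) 8750 9529"

# The pattern from the question, unchanged.
original = re.compile(r'\b\(?\d?\d?\)?\s?\d+\d+\d+\d+\s\d+\d+\d+\d+\b')

# The first word boundary sits between "(" and "0", not before "(".
first_boundary = re.search(r'\b', text).start()

# So the match can only begin at the digit, and the bracket is lost.
found = original.findall(text)

# Assumption for illustration: dropping the leading \b lets \(? consume "(".
no_leading_b = re.compile(r'\(?\d?\d?\)?\s?\d+\d+\d+\d+\s\d+\d+\d+\d+\b')
found_fixed = no_leading_b.findall(text)

print(first_boundary)  # 1
print(found)           # ['02) 8750 9529']
print(found_fixed)     # ['(02) 8750 9529']
```

Dropping the leading `\b` is only one possible way out; it also allows a match to start in the middle of a longer run of text, which may or may not be what you want.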
But there is something else. You could write your regexp a little bit
shorter and IMO clearer. With
    re.compile(r'(?:\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')
you achieve the same and it's easier for the reader to see what you
meant.
`(?:..)' is a non-capturing group (sometimes called a shy group)
`\d{4,}' means 4 or more digits
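
To see both constructs at work, here is a small sketch (Python 3, standard `re` module; the sample list and variable names are assumptions made for the demonstration). It runs the shorter pattern on the two numbers from the question and then shows what `findall` returns if the group is made capturing instead:

```python
import re

samples = ["6868 8901", "(02) 6868 8901"]

# The shorter pattern with the non-capturing group (?:...).
short = re.compile(r'(?:\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')
short_results = [short.findall(s) for s in samples]

# Same pattern with a capturing group: findall now returns only the group.
capturing = re.compile(r'(\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')
capturing_results = [capturing.findall(s) for s in samples]

# \d{4,} accepts four or more digits, but not three.
three = re.fullmatch(r'\d{4,}', '123')
five = re.fullmatch(r'\d{4,}', '12345')

print(short_results)      # [['6868 8901'], ['(02) 6868 8901']]
print(capturing_results)  # [[''], ['(02) ']]
print(three is None, five is not None)  # True True
```

With `findall` the difference matters: as soon as the pattern contains a capturing group, you get the group contents back rather than the whole match.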
You could even write:
    re.compile(r"""
        (?:\(\d{0,2}\)\s)?    # (02)
        \d{4,}\s\d{4,}\b""",  # 1234 5678
        re.X)                 # extended
or
    re.compile(r"""
        (?:\(\d{0,2}\)\s)?    # (12) | (1) | () |
        \d{4,}\s\d{4,}\b""",  # 1234 5678 | 12345678 123456789
        re.VERBOSE)           # extended
if you want it even clearer. With this simple example it makes no big
difference, but with more complex regexps it helps a lot.
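
A short sketch of the verbose form, assuming Python 3 and the standard `re` module (the test sentence and variable names are made up for illustration). It checks that the commented pattern finds the same matches as the one-line version, and shows why the space between the digit groups is written as `\s`: in verbose mode a literal space in the pattern is ignored.

```python
import re

compact = re.compile(r'(?:\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')

verbose = re.compile(r"""
    (?:\(\d{0,2}\)\s)?   # optional area code such as (02) plus whitespace
    \d{4,}\s\d{4,}\b     # two groups of four or more digits
    """, re.VERBOSE)

text = "call (02) 6868 8901 or 6868 8901"
compact_found = compact.findall(text)
verbose_found = verbose.findall(text)

# In verbose mode the literal space below is dropped from the pattern,
# so this behaves like \d{4}\d{4} and needs eight consecutive digits.
spaced = re.compile(r"\d{4} \d{4}", re.VERBOSE)
with_space = spaced.search("6868 8901")
without_space = spaced.search("68688901")

print(compact_found)  # ['(02) 6868 8901', '6868 8901']
print(verbose_found)  # ['(02) 6868 8901', '6868 8901']
print(with_space is None, without_space is not None)  # True True
```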
Karl
--
`Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!' "Lewis Carroll" "Jabberwocky"