[Tutor] about Regular Expression

Karl Pflästerer sigurd@12move.de
Sat Jun 21 15:20:02 2003


On 21 Jun 2003, Abdirizak abdi <- a_abdi406@yahoo.com wrote:

> Hi, I was trying to set up a regular expression that covers the following
> numbers:

> 6868 8901
> (02) 6868 8901

> this is the regular expression set up to cover:

> tee = re.compile(r'\b\(?\d?\d?\)?\s?\d+\d+\d+\d+\s\d+\d+\d+\d+\b')
> tuu = "(02) 8750 9529"
> tii = tee.findall(tuu)
> print tii

> MY PROBLEM is it is displaying this output:

> [ ' 02) 8750 9529']

> it misses the first bracket but don't know why:

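Bob explained why that happens: a word boundary exists wherever a word
constituent char stands on one side and a non-word constituent char on
the other (e.g. in `a(' there is a word boundary between `a' and `(').
At the start of your string there is no word char on either side of
that position, so `\b' cannot match before the `('.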
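
To see the word-boundary effect for yourself, here is a small sketch.
It assumes Python 3 (print as a function) and the variable names are
only for illustration; the second pattern is simply yours with the
leading `\b' removed.

```python
import re

text = "(02) 8750 9529"

# Original pattern: the leading \b cannot match at the start of the
# string, because neither side of that position is a word char.
with_boundary = re.compile(r'\b\(?\d?\d?\)?\s?\d+\d+\d+\d+\s\d+\d+\d+\d+\b')

# The same pattern without the leading \b.
without_boundary = re.compile(r'\(?\d?\d?\)?\s?\d+\d+\d+\d+\s\d+\d+\d+\d+\b')

matches_with = with_boundary.findall(text)
matches_without = without_boundary.findall(text)

print(matches_with)     # ['02) 8750 9529']
print(matches_without)  # ['(02) 8750 9529']
```

The first word boundary in the string lies between `(' and `0', so the
match with the leading `\b' can only start there.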

But there is something else.  You could write your regexp a little
shorter and IMO clearer.  With
re.compile(r'(?:\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')
you achieve the same and it's easier for the reader to see what you
meant.
`(?:..)' is a non-capturing group (sometimes called a shy group)
`\d{4,}' means 4 or more digits
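
A quick way to check the shorter pattern against your two number
formats is sketched below (Python 3 assumed; the names `phone',
`samples' and `capturing' are made up for the example).  It also shows
why the group should be non-capturing when you use findall: with a
capturing group, findall returns only the group's text.

```python
import re

# Shorter pattern: optional "(dd) " prefix, then two blocks of 4+ digits.
phone = re.compile(r'(?:\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')

samples = ["6868 8901", "(02) 6868 8901", "(02) 8750 9529"]
results = [phone.findall(s) for s in samples]
print(results)
# [['6868 8901'], ['(02) 6868 8901'], ['(02) 8750 9529']]

# Same pattern with a capturing group instead of (?:...):
# findall now returns the captured group, not the whole match.
capturing = re.compile(r'(\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')
captured = capturing.findall("(02) 6868 8901")
print(captured)  # ['(02) ']
```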

You could even write:
re.compile(r"""
(?:\(\d{0,2}\)\s)?   # (02)
\d{4,}\s\d{4,}\b""", # 1234 5678
re.X)                # extended

or
re.compile(r"""
(?:\(\d{0,2}\)\s)?   # (12)  | (1) | () |
\d{4,}\s\d{4,}\b""", # 1234 5678 | 12345678 123456789
re.VERBOSE)          # extended (long name for re.X)

if you prefer it even clearer.  With a simple example like this one it
makes no big difference, but with more complex regexps it helps a lot.
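
As a rough sketch of where the verbose form can lead (Python 3
assumed, comments and layout are my own choice), each part of the
pattern can get its own line and comment.  In verbose mode whitespace
in the pattern is ignored, which is why the separator is written as
`\s' and not as a literal space.

```python
import re

compact = re.compile(r'(?:\(\d{0,2}\)\s)?\d{4,}\s\d{4,}\b')

verbose = re.compile(r"""
    (?:\(\d{0,2}\)\s)?   # optional area code in parentheses, then whitespace
    \d{4,}               # first block: four or more digits
    \s                   # separator
    \d{4,}\b             # second block, ending at a word boundary
    """, re.VERBOSE)

samples = ["6868 8901", "(02) 6868 8901", "call (02) 8750 9529 now"]
for s in samples:
    # Both forms should find exactly the same matches.
    assert compact.findall(s) == verbose.findall(s)
    print(verbose.findall(s))
```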


   Karl
-- 
 `Beware the Jabberwock, my son!
     The jaws that bite, the claws that catch!
  Beware the Jubjub bird, and shun
    The frumious Bandersnatch!'   "Lewis Carroll" "Jabberwocky"