[Tutor] Regular expression on python

Peter Otten __peter__ at web.de
Wed Apr 15 10:24:59 CEST 2015


Alan Gauld wrote:

> On 15/04/15 02:02, Steven D'Aprano wrote:
>>> New one on me. Where does one find out about verbose mode?
>>> I don't see it in the re docs?
>>>
> 
>> or embed the flag in the pattern. The flags that I know of are:
>>
>> (?x) re.X re.VERBOSE
>>
>> The flag can appear anywhere in the pattern and applies to the whole
>> pattern, but it is good practice to put them at the front, and in the
>> future it may be an error to put the flags elsewhere.
> 
> I've always applied flags as separate params at the end of the
> function call. I've never seen (or noticed?) the embedded form,
> and don't see it described in the docs anywhere (although it
> probably is). 

Quoting <https://docs.python.org/dev/library/re.html>:

"""
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The 
group matches the empty string; the letters set the corresponding flags: 
re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), 
re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the 
entire regular expression. (The flags are described in Module Contents.) 
This is useful if you wish to include the flags as part of the regular 
expression, instead of passing a flag argument to the re.compile() function.

Note that the (?x) flag changes how the expression is parsed. It should be 
used first in the expression string, or after one or more whitespace 
characters. If there are non-whitespace characters before the flag, the 
results are undefined.
"""

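Here is a minimal sketch to check this. The sample string and variable names are only for illustration. An inline flag at the very start of the pattern behaves like the equivalent flag argument to re.compile():

```python
import re

text = "Hello hello HELLO"

# Flag passed as a separate argument to re.compile()
with_arg = re.compile(r"hello", re.I)

# The same flag embedded at the front of the pattern itself
embedded = re.compile(r"(?i)hello")

print(with_arg.findall(text))
print(embedded.findall(text))
# Both compiled patterns end up with the same flag set
print(with_arg.flags == embedded.flags)
```

Putting the flag first is also the safe choice in practice. Python 3.11 and later raise re.error for global inline flags that are not at the start of the expression.
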
> But the re module descriptions of the flags only give the
> re.X/re.VERBOSE options, no mention of the embedded form.
> Maybe you are just supposed to infer the (?x) form from the re.X...
> 
> However, that still doesn't explain the difference in your comment
> syntax.
> 
> The docs say the verbose syntax looks like:
> 
> a = re.compile(r"""\d +  # the integral part
>                     \.    # the decimal point
>                     \d *  # some fractional digits""", re.X)
> 
> Whereas your syntax is like:
> 
> a = re.compile(r"""(?x)  (?# turn on verbose mode)
>                     \d +  (?# the integral part)
>                     \.    (?# the decimal point)
>                     \d *  (?# some fractional digits)""")
> 
> Again, where is that described?

"""
(?#...)
A comment; the contents of the parentheses are simply ignored.
"""

Let's try it out:

>>> import re
>>> re.compile(r"\d+(?# sequence of digits)").findall("alpha 123 beta 456")
['123', '456']
>>> re.compile(r"\d+# sequence of digits").findall("alpha 123 beta 456")
[]
>>> re.compile(r"\d+# sequence of digits", re.VERBOSE).findall("alpha 123 beta 456")
['123', '456']

So (?#...)-style comments work in non-verbose mode, too, and Steven is 
wearing belt and braces (almost: the verbose flag is still necessary to 
ignore the extra whitespace).
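The sketch below puts the pieces together. It compiles the two spellings of the decimal-number pattern quoted above, plus a copy of Steven's version with the (?x) removed. The sample text is made up for illustration:

```python
import re

text = "pi is 3.14159, e is 2.71828"

# Docs style: verbose mode via the flag argument, '#' end-of-line comments
docs_style = re.compile(r"""\d +  # the integral part
                            \.    # the decimal point
                            \d *  # some fractional digits""", re.X)

# Steven's style: verbose mode via an embedded (?x), (?#...) comments
steven_style = re.compile(r"""(?x)  (?# turn on verbose mode)
                              \d +  (?# the integral part)
                              \.    (?# the decimal point)
                              \d *  (?# some fractional digits)""")

# Without (?x) the (?#...) comments are still ignored, but the spaces
# and newlines in the pattern must now match literally, so nothing matches
no_verbose = re.compile(r"""\d +  (?# the integral part)
                            \.    (?# the decimal point)
                            \d *  (?# some fractional digits)""")

print(docs_style.findall(text))
print(steven_style.findall(text))
print(no_verbose.findall(text))
```

The first two patterns both reduce to `\d+\.\d*` and find the same two numbers. The third finds nothing, because in non-verbose mode the whitespace is part of what must match.
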


