[Tutor] Regular expression on python
Peter Otten
__peter__ at web.de
Wed Apr 15 10:24:59 CEST 2015
Alan Gauld wrote:
> On 15/04/15 02:02, Steven D'Aprano wrote:
>>> New one on me. Where does one find out about verbose mode?
>>> I don't see it in the re docs?
>>>
>
>> or embed the flag in the pattern. The flags that I know of are:
>>
>> (?x) re.X re.VERBOSE
>>
>> The flag can appear anywhere in the pattern and applies to the whole
>> pattern, but it is good practice to put them at the front, and in the
>> future it may be an error to put the flags elsewhere.
>
> I've always applied flags as separate params at the end of the
> function call. I've never seen (or noticed?) the embedded form,
> and don't see it described in the docs anywhere (although it
> probably is).
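The two spellings can be compared side by side. This is only a minimal sketch: the pattern "spam" and the sample text are made up for illustration, and re.IGNORECASE stands in for any of the flags.

```python
import re

# Same pattern compiled two ways: flag passed as an argument
# versus flag embedded at the front of the pattern.
by_argument = re.compile(r"spam", re.IGNORECASE)
by_inline = re.compile(r"(?i)spam")

text = "Spam, SPAM, spam"  # made-up sample text
argument_matches = by_argument.findall(text)
inline_matches = by_inline.findall(text)

print(argument_matches)  # ['Spam', 'SPAM', 'spam']
print(inline_matches)    # ['Spam', 'SPAM', 'spam']

# The embedded flag also shows up in the compiled pattern's flags.
print(bool(by_inline.flags & re.IGNORECASE))  # True
```

The embedded form is handy when the pattern is passed around as a plain string and the caller has no way to supply a flags argument.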
Quoting <https://docs.python.org/dev/library/re.html>:
"""
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The
group matches the empty string; the letters set the corresponding flags:
re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent),
re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the
entire regular expression. (The flags are described in Module Contents.)
This is useful if you wish to include the flags as part of the regular
expression, instead of passing a flag argument to the re.compile() function.
Note that the (?x) flag changes how the expression is parsed. It should be
used first in the expression string, or after one or more whitespace
characters. If there are non-whitespace characters before the flag, the
results are undefined.
"""
> But the re module descriptions of the flags only give the
> re.X/re.VERBOSE options, no mention of the embedded form.
> Maybe you are just supposed to infer the (?x) form from the re.X...
>
> However, that still doesn't explain the difference in your comment
> syntax.
>
> The docs say the verbose syntax looks like:
>
> a = re.compile(r"""\d + # the integral part
> \. # the decimal point
> \d * # some fractional digits""", re.X)
>
> Whereas your syntax is like:
>
> a = re.compile(r"""(?x) (?# turn on verbose mode)
> \d + (?# the integral part)
> \. (?# the decimal point)
> \d * (?# some fractional digits)""")
>
> Again, where is that described?
"""
(?#...)
A comment; the contents of the parentheses are simply ignored.
"""
Let's try it out:
>>> import re
>>> re.compile(r"\d+(?# sequence of digits)").findall("alpha 123 beta 456")
['123', '456']
>>> re.compile(r"\d+# sequence of digits").findall("alpha 123 beta 456")
[]
>>> re.compile(r"\d+# sequence of digits", re.VERBOSE).findall("alpha 123 beta 456")
['123', '456']
So (?#...)-style comments work in non-verbose mode too, and Steven is
wearing belt and braces. Well, almost: the verbose flag is still necessary
to ignore the extra whitespace in his pattern.
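To see why the flag is still needed, here is a small sketch that compiles the same commented pattern with and without (?x). The names and the sample text are made up for illustration. Without verbose mode the (?#...) comments are still dropped, but the spaces and newlines between the pieces are taken literally.

```python
import re

# Pattern laid out over several lines with (?#...) comments,
# modelled on the example quoted above.
commented = r"""\d + (?# the integral part)
                \. (?# the decimal point)
                \d * (?# some fractional digits)"""

verbose = re.compile("(?x)" + commented)  # whitespace is ignored
plain = re.compile(commented)             # whitespace must match literally

sample = "pi is about 3.14"  # made-up sample text
verbose_match = verbose.search(sample)
plain_match = plain.search(sample)

print(verbose_match.group())  # 3.14
print(plain_match)            # None
```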