Help for a complex RE
Peter Otten
__peter__ at web.de
Sun May 8 12:15:25 EDT 2016
Sergio Spina wrote:
> In the following ipython session:
>
>> Python 3.5.1+ (default, Feb 24 2016, 11:28:57)
>> Type "copyright", "credits" or "license" for more information.
>>
>> IPython 2.3.0 -- An enhanced Interactive Python.
>>
>> In [1]: import re
>>
>> In [2]: patt = r""" # the match pattern is:
>> ...: .+ # one or more characters
>> ...: [ ] # followed by a space
>> ...: (?=[@#D]:) # that is followed by one of the
>> ...: # chars "@#D" and a colon ":"
>> ...: """
>>
>> In [3]: pattern = re.compile(patt, re.VERBOSE)
>>
>> In [4]: m = pattern.match("Jun@i Bun#i @:Janji")
>>
>> In [5]: m.group()
>> Out[5]: 'Jun@i Bun#i '
>>
>> In [6]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji")
>>
>> In [7]: m.group()
>> Out[7]: 'Jun@i Bun#i @:Janji '
>>
>> In [8]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
>>
>> In [9]: m.group()
>> Out[9]: 'Jun@i Bun#i @:Janji D:Banji '
>
> Why does the regex engine stop the search at the last part of the string?
> Why not at the first occurrence of the lookahead group, "@:"?
> What regex pattern would give the following result?
>
>> In [1]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
>>
>> In [2]: m.group()
>> Out[2]: 'Jun@i Bun#i '
Compare:
>>> re.compile("a+").match("aaaa").group()
'aaaa'
>>> re.compile("a+?").match("aaaa").group()
'a'
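By default pattern matching is "greedy": the ".+" part of your regex
matches as many characters as possible, so the match extends to the last
space that is followed by one of your markers. Adding a ? as in ".+?"
makes that part non-greedy, so it matches as few characters as possible
and the match stops at the first such space.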
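
Applied to the pattern from the question, the change might look like the
sketch below. The names text, greedy and lazy are made up for this
illustration, and the sample string spells its first word "Jun@i" on the
assumption that the archive rewrote the "@"; the behaviour is the same
either way.

```python
import re

# Sample string from the question (first word assumed to be "Jun@i").
text = "Jun@i Bun#i @:Janji D:Banji #:Junji"

# Original pattern: ".+" is greedy and grabs as much as it can.
greedy = re.compile(r"""
    .+          # one or more characters, as many as possible
    [ ]         # followed by a space
    (?=[@#D]:)  # that precedes one of the chars "@#D" and a colon
""", re.VERBOSE)

# Same pattern with ".+?": non-greedy, grabs as little as it can.
lazy = re.compile(r"""
    .+?         # one or more characters, as few as possible
    [ ]         # followed by a space
    (?=[@#D]:)  # that precedes one of the chars "@#D" and a colon
""", re.VERBOSE)

greedy_result = greedy.match(text).group()
lazy_result = lazy.match(text).group()

print(repr(greedy_result))  # 'Jun@i Bun#i @:Janji D:Banji '
print(repr(lazy_result))    # 'Jun@i Bun#i '
```

The lookahead is unchanged in both versions; only the quantifier decides
whether the engine settles on the last or the first space that satisfies it.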