regexp matching end of line or comma

MRAB python at mrabarnett.plus.com
Thu Nov 25 12:07:53 EST 2010


On 25/11/2010 16:26, Jean-Michel Pichavant wrote:
> MRAB wrote:
>> On 25/11/2010 14:40, Jean-Michel Pichavant wrote:
>>> Hi guys,
>>>
>>> I'm struggling to match patterns ending with a comma ',' or an end of
>>> line '$'.
>>>
>>> import re
>>>
>>> ex1 = 'sumthin,'
>>> ex2 = 'sumthin'
>>> m1 = re.match('(?P<something>\S+),', ex1)
>>> m2 = re.match('(?P<something>\S+)$', ex2)
>>> m3 = re.match('(?P<something>\S+)[,$]', ex1)
>>> m4 = re.match('(?P<something>\S+)[,$]', ex2)
>>>
>>> print m1, m2
>>> print m3
>>> print m4
>>>
>>> <_sre.SRE_Match object at 0x8834de0> <_sre.SRE_Match object at 0x8834e20>
>>> <_sre.SRE_Match object at 0x8834e60>
>>> None
>>>
>>> My problem is that m4 is None while I'd like it to match ex2.
>>>
>>> Any clue?
>>>
>> Within a character set '$' is a literal '$' and not end-of-string, just
>> as '\b' is '\x08' and not word-boundary.
>>
>> Use a lookahead instead:
>>
>> >>> re.match('(?P<something>\S+)(?=,|$)', ex1)
>> <_sre.SRE_Match object at 0x01719FA0>
>> >>> re.match('(?P<something>\S+)(?=,|$)', ex2)
>> <_sre.SRE_Match object at 0x016937E0>
> thanks, it works that way.
> By the way, I don't get the difference between non-capturing parentheses
> (?:) and lookahead parentheses (?=):
>
> re.match('(?P<something>\S+)(?:,|$)', ex2).groups()
> ('sumthin',)
>
> re.match('(?P<something>\S+)(?=,|$)', ex2).groups()
> ('sumthin',)
>
A non-capturing group 'consumes' the characters it matches; a lookahead
doesn't, so a later part of the regex can match them again.

In this instance it doesn't matter, because nothing follows the group in
the pattern and .groups() shows only the captured text.
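
To make the consume/don't-consume distinction visible, here is a minimal
sketch, not part of the original exchange. The sample string 'sumthin,else'
and the variable names are made up for illustration, and [^\s,]+ is swapped
in for the thread's \S+ as an assumption: a greedy \S+ can swallow a trailing
comma itself, which would blur the comparison.

```python
import re

# Hypothetical sample input: a word, a comma, then more text.
text = 'sumthin,else'

# NOTE: [^\s,]+ replaces the thread's \S+ (an assumption for this sketch),
# so the captured word cannot itself contain the comma.
consuming = re.match(r'(?P<something>[^\s,]+)(?:,|$)', text)
lookahead = re.match(r'(?P<something>[^\s,]+)(?=,|$)', text)

# The named group is the same either way.
print(consuming.group('something'), lookahead.group('something'))

# The overall match differs: (?:...) consumed the comma, (?=...) only peeked.
print(repr(consuming.group(0)), consuming.end())
print(repr(lookahead.group(0)), lookahead.end())

# It matters once a later part of the pattern must match the comma itself.
after_consuming = re.match(r'[^\s,]+(?:,|$),else', text)
after_lookahead = re.match(r'[^\s,]+(?=,|$),else', text)
print(after_consuming is None, after_lookahead is not None)
```

Comparing group(0) and end() rather than groups() is what exposes the
difference, since groups() reports only the capturing groups.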


