how to get all repeated group with regular expression
MRAB
google at mrabarnett.plus.com
Fri Nov 21 12:00:56 EST 2008
Steve Holden wrote:
> Please keep this on the list.
>
> scsoce wrote:
>> Steve Holden wrote:
>>> scsoce wrote:
>>>
>>>> Say, when I try to search and match every char from a variable-length
>>>> string, such as the string '123456', I tried re.findall( r'(\d)*, '12346' )
>>>>
>>> I think you will find you missed a quote out there. Always better to
>>> copy and paste ...
>>>
>>>
>>>> , but only got '6', and the Python docs indeed say: "If a group is contained
>>>> in a part of the pattern that matched multiple times, the last match is
>>>> returned."
>>>>
>>> So use
>>>
>>> r'(\d*)'
>>>
>>> instead and then the group includes all the digits you match.
>>>
>>>
>>>> Is that because the regex engine cannot remember all the past history?
>>>> Is this inherent to all regex engines or only to Python's?
>>>>
>>> Different regex engines have different capabilities, so I can't speak to
>>> them all. If you wanted *all* the matches of *all* groups, how would you
>>> have them returned? As a list? That would make the case where there was
>>> only one match much trickier to handle. And what would you do with
>>>
>>> r'((\w)*\d)*'
>>>
>>> Also, what about named groups? I can see enough potential implementation
>>> issues that I can perfectly understand why Python works the way it does,
>>> so I'd be interested to know why it doesn't make sense to you, and what
>>> you would prefer it to do.
>>>
>>> regards
>>> Steve
>>>
>> Maybe my explanation was not clear. I want to capture every matched part
>> of a repeated pattern, not only the last. Say, for the string '123456', I
>> want to back-reference any one char, not only the '6'. I know the
>> example is very simple, so we can get the whole string using a regex and
>> get every char using other Python statements, but what if the pattern in
>> the group is complex?
>> I tested in Vim, and it can do the 'back reference':
>> == your text in vim:
>> 123456
>> == pattern:
>> :%s/\(\d\)*/$2
>> text will turn to be:
>> 2
>>
> 'Fraid the Python re implementers just decided not to do it that way.
>
Nor Perl.
Probably what you want is re.findall(r"(\d)", "123456"), which returns a
list of what it captured.
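
A minimal sketch of the difference, using the '123456' string from the
thread. The "key=value;" string and the names below (m, digits, pairs) are
only illustrative assumptions, standing in for the "more complex group" case
that was asked about:

```python
import re

text = "123456"

# A repeated group only remembers its last iteration.
m = re.match(r"(\d)*", text)
whole = m.group(0)   # the entire match
last = m.group(1)    # only the final digit the group captured

# Moving the repetition out of the pattern and into findall
# yields one capture per match instead.
digits = re.findall(r"(\d)", text)

# Hypothetical more complex case: each repeated unit has two groups.
# findall then returns one tuple of captures per repetition.
pairs = re.findall(r"(\w+)=(\d+);", "a=1;b=22;c=333;")

print(whole, last)
print(digits)
print(pairs)
```

If the match positions are needed as well, re.finditer gives a match object
per repetition rather than just the captured strings.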