[Tutor] Regular expression oddity

Sun Nov 23 13:23:22 CET 2008

bob gailer wrote:
> Emmanuel Ruellan wrote:
>> Hi tutors!
>>
>> While trying to write a regular expression that would split a string
>> the way I want, I noticed a behaviour I didn't expect.
>>
>> >>> re.findall('.?', 'some text')
>> ['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '']
>>
>> Where does the last string, the empty one, come from?
>> I find this behaviour rather annoying: I'm getting one group too many.
>>   
> The ? means 0 or 1 occurrence. I think re is matching the null string at 
> the end.
> 
> Drop the ? and you'll get what you want.
> 
> Of course you can get the same thing using list('some text') at lower cost.
> 
I find this fully consistent, since your regex means matching
* either any single char (except a newline)
* or no char at all
Logically, you first get n one-char matches, then one 'nothing' at the end of
the string. Only after that does the search stop, because there is nothing
left to scan. Maybe clearer:
import re
print(re.findall('.?', ''))   # ==> ['']
print(re.findall('.', ''))    # ==> []
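
To see where that last empty string comes from, here is a small sketch that
uses re.finditer to print the position of each match. The variable names
(text, spans, chars) are only for illustration and are not from the posts
above.

```python
import re

text = 'some text'

# Record where each match of '.?' starts and ends, plus what it matched.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer('.?', text)]
for start, end, matched in spans:
    print(start, end, repr(matched))

# The last entry is the empty match: it starts and ends at len(text),
# i.e. at the very end of the string, where no char is left to consume.
last_start, last_end, last_text = spans[-1]

# Without the '?', an empty match is impossible. For a string with no
# newline in it (which '.' does not match by default), the result is
# the same as list(text).
chars = re.findall('.', text)
```

The first nine matches each cover one char; the tenth covers nothing, which
is the extra '' in the original output.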
denis