[Tutor] Regular expression oddity
spir
denis.spir at free.fr
Sun Nov 23 13:23:22 CET 2008
bob gailer wrote:
> Emmanuel Ruellan wrote:
>> Hi tutors!
>>
>> While trying to write a regular expression that would split a string
>> the way I want, I noticed a behaviour I didn't expect.
>>
>>
>> >>> re.findall('.?', 'some text')
>> ['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '']
>>
>> Where does the last string, the empty one, come from?
>> I find this behaviour rather annoying: I'm getting one group too many.
>>
> The ? means 0 or 1 occurrence. I think re is matching the null string at
> the end.
>
> Drop the ? and you'll get what you want.
>
> Of course you can get the same thing using list('some text') at lower cost.
>
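To compare Bob's two suggestions side by side, here is a small sketch in Python 3 syntax (the sample strings are only examples). One caveat: '.' does not match a newline unless re.DOTALL is given, so re.findall('.', s) equals list(s) only when s contains no newline.

```python
import re

text = 'some text'

# '.?' may also match zero characters, so findall ends with an empty match.
with_optional = re.findall('.?', text)

# '.' must consume exactly one character, so no empty match is possible.
exactly_one = re.findall('.', text)

print(with_optional)   # ['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '']
print(exactly_one)     # ['s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't']
print(exactly_one == list(text))  # True

# Without re.DOTALL, '.' skips newlines, so it is not always the same as list(s).
multiline = 'a\nb'
print(re.findall('.', multiline))             # ['a', 'b']
print(re.findall('.', multiline, re.DOTALL))  # ['a', '\n', 'b']
```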
I find this fully consistent, because your regex matches
* either any single char
* or no char at all
Logically, you first get n one-char matches, then one empty match at the end of
the string, where '.?' can still match 'nothing'. Only after that does the search
stop, because the end of the string has been reached. Maybe this is clearer:
print(re.findall('.?', ''))
==> ['']
print(re.findall('.', ''))
==> []
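To make the position of that extra 'nothing' visible, here is a minimal sketch that lists where each match starts and ends (the helper name match_spans is hypothetical, just for illustration). For a pattern without groups, re.finditer yields the same matches that re.findall reports. The last part shows one possible way to drop the empty match if '.?' has to stay in a larger pattern.

```python
import re

def match_spans(pattern, text):
    """Hypothetical helper: (start, end, matched_text) for every match."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

for start, end, s in match_spans('.?', 'abc'):
    print(start, end, repr(s))
# 0 1 'a'
# 1 2 'b'
# 2 3 'c'
# 3 3 ''   <- zero-width match at position 3, the end of the string

# On an empty string the only position is 0, and '.?' matches nothing there.
print(match_spans('.?', ''))   # [(0, 0, '')]

# If the empty matches are unwanted, one option is to filter them out afterwards.
non_empty = [s for s in re.findall('.?', 'some text') if s]
print(non_empty)
```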
denis