No explanation for weird behavior in re module!
Tim Peters
tim.one at home.com
Sun Feb 10 19:21:22 EST 2002
[synthespian]
> >>> import re
> >>> p = re.compile('^(der|die|das(\s\w+))')
> >>> m = p.match('die Tür, Türen')
> >>> n = p.match('das Auto, Autos')
> >>> m.group(0)
> 'die'
> >>> m.group(1)
> 'die'
> >>> m.group(2)
> [nothing!!!!]
> >>> n.group(0)
> 'das Auto'
> >>> n.group(1)
> 'das Auto'
> >>> n.group(2)
> 'Auto'
>
> I'm using Python2.0 on a Debian potato system.
> Why didn't m.group(2) produce 'Tür' as the output???
Because 'die' is all the entire expression matched. I think you're probably
misunderstanding how regexps group: a|b|cd acts like (a)|(b)|(cd), not like
(a|b|c)d. Try this instead:
p = re.compile(r'^((?:der|die|das)(\s\w+))')
You won't get None anymore, but whether or not \w+ matches more than T may
depend on your locale setting.
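A minimal sketch of the difference, assuming a modern Python 3 interpreter (where every str is Unicode, so the locale question doesn't come up). The names `original` and `fixed` are just for illustration:

```python
import re

# Original pattern: alternation binds loosest, so the branches are
# 'der', 'die', and 'das(\s\w+)'. Group 2 exists only in the last branch.
original = re.compile(r'^(der|die|das(\s\w+))')

# Suggested pattern: (?:...) groups the articles without capturing,
# so (\s\w+) applies to whichever article matched.
fixed = re.compile(r'^((?:der|die|das)(\s\w+))')

m_original = original.match('die Tür, Türen')
m_fixed = fixed.match('die Tür, Türen')

print(repr(m_original.group(0)))  # 'die'
print(m_original.group(2))        # None: group 2 did not participate
print(repr(m_fixed.group(1)))     # 'die Tür'
print(repr(m_fixed.group(2)))     # ' Tür' (leading whitespace included)
```

Note that group 2 still carries the leading whitespace, since \s sits inside the inner parentheses.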
> Python2.0 is supposed to have Unicode support built into the
> re module, right?
Yes, but there are no Unicode strings in your program.
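A small sketch of why the string type matters, assuming Python 3, where every str is already Unicode. The re.ASCII flag is used here only to mimic ASCII-only \w matching; it is not what Python 2.0 did internally:

```python
import re

text = 'Tür'  # a Unicode str in Python 3

# Default for str patterns: \w follows Unicode character classes.
unicode_match = re.match(r'\w+', text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so matching stops before 'ü'.
ascii_match = re.match(r'\w+', text, re.ASCII)

print(unicode_match.group())
print(ascii_match.group())
```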
> Other than the fact that 'Tür' has the 'ü' unicode
> character, I fail to see any difference!
Heh. Leaving this joy to someone else <wink>.