No explanation for weird behavior in re module!

Sun Feb 10 19:21:22 EST 2002

[synthespian]
> >>> import re
> >>> p = re.compile('^(der|die|das(\s\w+))')
> >>> m = p.match('die Tür, Türen')
> >>> n = p.match('das Auto, Autos')
> >>> m.group(0)
> 'die'
> >>> m.group(1)
> 'die'
> >>> m.group(2)
> [nothing!!!!]
> >>> n.group(0)
> 'das Auto'
> >>> n.group(1)
> 'das Auto'
> >>> n.group(2)
> 'Auto'
>
> 	I'm using Python2.0 on a Debian potato system.
> 	Why didn't m.group(2) produce 'Tür' as the output???

Because 'die' is all that the entire expression matched.  I think you're
probably misunderstanding how regexps group:  alternation binds more loosely
than concatenation, so a|b|cd acts like (a)|(b)|(cd), not like (a|b|c)d.  Try
this instead:

p = re.compile(r'^((?:der|die|das)(\s\w+))')

You won't get None anymore, but whether or not \w+ matches more than the 'T'
in 'Tür' may depend on your locale setting.
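
To make the grouping difference concrete, here is a minimal sketch comparing
the two patterns.  It assumes a modern Python 3 interpreter rather than the
Python 2.0 from the original post, so the strings are Unicode by default and
\w+ matches all of 'Tür'; the variable names are only illustrative.

```python
import re

text = 'die T\u00fcr, T\u00fcren'  # 'die Tür, Türen'

# Original pattern: the alternatives are 'der', 'die', and 'das(\s\w+)',
# so group 2 only takes part in a match that starts with 'das'.
original = re.compile(r'^(der|die|das(\s\w+))')

# Suggested pattern: (?:...) limits the alternation to the article,
# so the (\s\w+) group applies whichever article matched.
fixed = re.compile(r'^((?:der|die|das)(\s\w+))')

m = original.match(text)
whole_original = m.group(1)   # 'die'
noun_original = m.group(2)    # None, because group 2 did not participate

f = fixed.match(text)
whole_fixed = f.group(1)      # 'die Tür'
noun_fixed = f.group(2)       # ' Tür', leading space included since \s is inside the group
```

Note that group 2 keeps the leading whitespace; moving \s outside the inner
parentheses would capture the bare noun instead.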

> 	Python2.0 is supposed to have Unicode support built into the
> re module, right?

Yes, but there are no Unicode strings in your program:  the pattern and the
strings you match against are all plain 8-bit strings.
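
As a rough illustration of why the string type matters, the sketch below
again assumes Python 3 (not Python 2.0).  There, str patterns are
Unicode-aware by default, and the re.ASCII flag restricts \w to ASCII
characters, which approximates what an 8-bit pattern without locale support
would see.  The names are hypothetical.

```python
import re

word = 'T\u00fcr'  # 'Tür'

# \w restricted to [a-zA-Z0-9_]: the match stops before the 'ü'.
ascii_only = re.compile(r'\w+', re.ASCII)

# Default for str patterns in Python 3: \w follows Unicode word characters.
unicode_aware = re.compile(r'\w+')

ascii_match = ascii_only.match(word).group()       # 'T'
unicode_match = unicode_aware.match(word).group()  # 'Tür'
```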

> 	Other than the fact that 'Tür' has the 'ü' unicode
> character, I fail to see any difference!

Heh.  Leaving this joy to someone else <wink>.