No explanation for weird behavior in re module!

synthespian synthespian at uol.com.br
Sun Feb 10 19:42:32 EST 2002


Hi-

	I'm really intrigued by this behavior:

>>> import re
>>> p = re.compile('^(der|die|das(\s\w+))')
>>> m = p.match('die Tür, Türen')
>>> n = p.match('das Auto, Autos')
>>> m.group(0)
'die'
>>> m.group(1)
'die'
>>> m.group(2)
[nothing!!!!]
>>> n.group(0)
'das Auto'
>>> n.group(1)
'das Auto'
>>> n.group(2)
' Auto'
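The sketch below replays the session with repr() so that the blank result becomes visible. It assumes Python 3 rather than the Python 2.0 used in the post. It shows that m.group(2) is None, not an empty string, because the interactive prompt prints nothing at all for None. The cause is precedence: "|" binds more loosely than any other regex operator, so the pattern reads as der OR die OR das(\s\w+). Group 2 therefore takes part only in a match that starts with "das". The sketch also shows that group 2 includes the leading space matched by \s.

```python
import re

# Same pattern as in the session above, written as a raw string.
# '|' has the lowest precedence, so this is parsed as
#   der  OR  die  OR  das(\s\w+)
# and group 2 exists only inside the 'das' alternative.
p = re.compile(r'^(der|die|das(\s\w+))')

m = p.match('die Tür, Türen')
n = p.match('das Auto, Autos')

# repr() makes the "missing" value visible: group 2 did not
# participate in the 'die' match, so it is None.
print(repr(m.group(1)), repr(m.group(2)))
print(repr(n.group(1)), repr(n.group(2)))
```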

	I'm using Python 2.0 on a Debian potato system.
	Why didn't m.group(2) produce 'Tür' as the output???
	Python 2.0 is supposed to have Unicode support built into the re module, right?
	Other than the fact that 'Tür' contains the non-ASCII character 'ü', I fail to see any difference!
	I've even tried "import sre", but that didn't help either... It's too bad this isn't working, because it's a better way to work with regexes than Perl...
	What's going on here? Am I the problem, not knowing how to make Python understand the umlaut
	character (the 'ü')? Or is it a * bug *??!!!!!
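One possible regrouping follows. It is a sketch that assumes Python 3 and assumes the intent is to capture the word after any of the three articles. The alternation gets a group of its own, and the noun is captured without the preceding whitespace. The third sample string, 'der Bär, Bären', is a hypothetical addition that covers the 'der' branch. In Python 3, \w in a str pattern matches Unicode letters such as 'ü' by default. In Python 2, \w matched only ASCII word characters unless the re.UNICODE (or re.LOCALE) flag was passed. The umlaut could therefore matter there, but it played no part in the 'die' match shown in the session.

```python
import re

# Hypothetical regrouping: the alternation is confined to group 1,
# and group 2 holds the following word without the leading space.
p = re.compile(r'^(der|die|das)\s(\w+)')

for text in ('die Tür, Türen', 'das Auto, Autos', 'der Bär, Bären'):
    m = p.match(text)
    # In Python 3, \w in a str pattern matches letters such as 'ü',
    # so the whole word is captured; \w+ stops at the comma.
    print(m.group(1), m.group(2))
```

With this layout, m.group(1) is the article and m.group(2) is the noun for every branch. Wrapping the whole expression in an outer group would shift the noun to group 3.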

	Please help!
	TIA,
	best regards,

	H






More information about the Python-list mailing list