[Tutor] German Umlaut
Nicole Seitz
nicole.seitz@urz.uni-hd.de
Thu, 4 Apr 2002 19:56:16 +0200
On Monday, 1 April 2002 22:29, you wrote:
> Hmmm... Is it possible to relax the regular expression a little? Instead
> of forcing a "capitalized" word, would this be feasible:
>
> ###
> reg = re.compile(r"\b\w+-? und \w+\b",re.UNICODE)
> ###
Since German nouns are always capitalized, I can't use this regex.
>
>
> Otherwise, we can stuff in 'string.uppercase' in the regular expression.
> Here's one way to do it with some string formatting:
>
> ###
> reg = re.compile(r"\b[%s]\w+-? und [%s]\w+\b"
> % (string.uppercase, string.uppercase), re.UNICODE)
> ###
That's better! Here's my latest regex:
reg = re.compile(r"\b[%s]\w+-?(?:,\s\b[%s]\w+)*\sund\s\b[%s]\w+"
                 % (string.uppercase, string.uppercase, string.uppercase),
                 re.UNICODE)
It works (it also matches "Berlin, London, Wien und Paris"),
but I don't like the repeated
%(string.uppercase,string.uppercase,string.uppercase).
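One way to avoid writing string.uppercase three times is to build the "capitalized word" piece once and drop it into the pattern through a named %-placeholder, which can be repeated as often as needed. Below is a minimal sketch in Python 3, which no longer has string.uppercase. The names UPPER, CAP and PATTERN are my own, and building UPPER from the isupper() characters below U+0250 (which covers Ä, Ö and Ü) is an assumption, not a standard constant.

```python
import re

# Assumption: Python 3 has no string.uppercase, so collect the uppercase
# characters from Basic Latin through Latin Extended-B ourselves.
# This includes A-Z plus Ä, Ö, Ü and other accented capitals.
UPPER = "".join(ch for ch in map(chr, range(0x250)) if ch.isupper())

# One "capitalized word": word boundary, uppercase letter, more word chars.
CAP = r"\b[%s]\w+" % UPPER

# Every %(cap)s is filled from the same dict entry, so the word pattern
# is written only once. In Python 3, \w is Unicode-aware for str patterns,
# so re.UNICODE is not needed.
PATTERN = re.compile(r"%(cap)s-?(?:,\s%(cap)s)*\sund\s%(cap)s" % {"cap": CAP})

for text in ["Berlin, London, Wien und Paris",
             "Ein- und Ausgang",
             "Äpfel und Birnen",
             "schnell und gut"]:
    m = PATTERN.search(text)
    print(repr(text), "->", m.group(0) if m else None)
```

Because each further list item is introduced by ",\s", a list written without a space after a comma (such as "London,Wien") is only matched from its last item on.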
The former version looks better:
reg3 = re.compile(r"(?:[%s\w-]+(?:,|\sund)?\s)+"
                  % string.uppercase,
                  re.UNICODE)
Unfortunately, it finds many more matches than I want.
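A likely reason reg3 matches so much: inside a character class, [%s\w-] is simply the union of the uppercase letters, \w and the hyphen. Every uppercase letter is already a word character, so the class behaves like [\w-] and nothing forces a word to start with a capital. The sketch below (Python 3, with an assumed UPPER string standing in for string.uppercase) checks that and shows an all-lowercase run being matched.

```python
import re

# Assumed stand-in for Python 2's string.uppercase: all uppercase
# characters from U+0000 to U+024F (A-Z, Ä, Ö, Ü, ...).
UPPER = "".join(ch for ch in map(chr, range(0x250)) if ch.isupper())

# reg3 from above, with UPPER in place of string.uppercase.
reg3 = re.compile(r"(?:[%s\w-]+(?:,|\sund)?\s)+" % UPPER)

# Every uppercase letter is already matched by \w, so adding UPPER
# to the class changes nothing: it behaves like [\w-].
print(all(re.fullmatch(r"\w", ch) for ch in UPPER))

# Consequently any run of words followed by whitespace matches,
# capitalized or not.
m = reg3.search("ich esse gern Obst und Gemüse")
print(repr(m.group(0)))
```

To enforce capitalization, each word needs its own [%s] for the first letter, which is exactly what the longer regex does.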
Many greetings,
Nicole