[Tutor] German Umlaut

Nicole Seitz nicole.seitz@urz.uni-hd.de
Thu, 4 Apr 2002 19:56:16 +0200


On Monday, 1 April 2002 22:29, you wrote:

> Hmmm... Is it possible to relax the regular expression a little?  Instead
> of forcing a "capitalized" word, would this be feasible:
>
> ###
> reg = re.compile(r"\b\w+-? und \w+\b",re.UNICODE)
> ###

Since German nouns are always capitalized, I can't use this regex.
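A quick illustration of why the relaxed pattern is too loose for nouns: \w matches lowercase letters too, so the pattern also picks up pairs like adjectives joined by "und". This is a minimal Python 3 sketch; the sample sentence is made up for illustration, and re.UNICODE is already the default for str patterns in Python 3 (it is kept only to mirror the quoted code).

```python
import re

# The relaxed pattern quoted above: any word, optional hyphen, " und ", any word.
relaxed = re.compile(r"\b\w+-? und \w+\b", re.UNICODE)

# Made-up sample: one adjective pair and one noun pair.
text = "Das ist schnell und einfach, aber Äpfel und Birnen sind Nomen."

# Both pairs match, because nothing forces the first letter to be a capital.
print(relaxed.findall(text))  # ['schnell und einfach', 'Äpfel und Birnen']
```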

>
>
> Otherwise, we can stuff in 'string.uppercase' in the regular expression.
> Here's one way to do it with some string formatting:
>
> ###
> reg = re.compile(r"\b[%s]\w+-? und [%s]\w+\b"
>                   % (string.uppercase, string.uppercase), re.UNICODE)
> ###

That's better! Here's my latest regex:

reg = re.compile(r"\b[%s]\w+-?(?:,\s\b[%s]\w+)*\sund\s\b[%s]\w+"
                 % (string.uppercase, string.uppercase, string.uppercase),
                 re.UNICODE)


It works (it also matches "Berlin, London, Wien und Paris"),
but I don't like the repeated
% (string.uppercase, string.uppercase, string.uppercase).
My earlier version looked better:

reg3 = re.compile(r"(?:[%s\w-]+(?:,|\sund)?\s)+"
                  %(string.uppercase),
                  re.UNICODE)

Unfortunately, it finds a lot more than I want it to.
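One possible way to avoid writing string.uppercase three times, sketched in Python 3: build the capitalized-word piece once and splice it in with a named %(n)s placeholder, which the % operator supports with a dict. The overmatching of reg3 has a simple cause: inside [%s\w-], \w already includes every uppercase letter, so the class behaves like [\w-] and accepts any word at all. UPPER = "A-ZÄÖÜ" is an assumption standing in for Python 2's locale-dependent string.uppercase, which no longer exists in Python 3; NOUN, UPPER and the sample sentence are hypothetical names and data chosen for illustration.

```python
import re

# Assumption: an explicit class of German capitals replaces Python 2's
# locale-dependent string.uppercase (removed in Python 3).
UPPER = "A-ZÄÖÜ"

# Hypothetical helper: one capitalized word, written once.
NOUN = r"\b[%s]\w+" % UPPER

# Same pattern as the latest regex above, but NOUN is reused via %(n)s.
reg = re.compile(r"%(n)s-?(?:,\s%(n)s)*\sund\s%(n)s" % {"n": NOUN})

# reg3 from the post: \w inside the class already covers the capitals,
# so [A-ZÄÖÜ\w-] accepts exactly the same characters as [\w-].
reg3 = re.compile(r"(?:[%s\w-]+(?:,|\sund)?\s)+" % UPPER)

text = "Wir fahren nach Berlin, London, Wien und Paris."
print(reg.findall(text))   # ['Berlin, London, Wien und Paris']
print(reg3.findall(text))  # ['Wir fahren nach Berlin, London, Wien und ']
```

The named placeholder keeps the pattern readable and works with the same % operator used in the post; str.format would also work here since the pattern contains no literal braces.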


Best regards,

Nicole