No explanation for weird behavior in re module!
Jason Orendorff
jason at jorendorff.com
Mon Feb 11 00:16:08 EST 2002
synthespian writes:
> The problem is that I can't make Python read anything with
> non-ASCII character set.
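import re
import codecs

# re.UNICODE makes \w and \s follow Unicode rules, so \w+ matches umlauts too.
pattern = re.compile(ur'^(der|die|das)\s+(\w+)', re.UNICODE)

# codecs.open decodes the ISO-8859-1 bytes into unicode strings on read.
f = codecs.open('article.txt', 'r', 'iso-8859-1')
lines = f.readlines()
f.close()

# Writing through codecs.open encodes the unicode strings back to ISO-8859-1.
f = codecs.open('article.out.txt', 'w', 'iso-8859-1')
for line in lines:
    match = pattern.match(line)
    if match is None:
        continue  # skip lines that don't start with der/die/das
    article = match.group(1)
    noun = match.group(2)
    f.write(u"article: %s ... noun: %s\n" % (article, noun))
f.close()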
I've got Python 2.2, but I think it should work for you too.
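
For anyone reading this on a later Python, here is a rough sketch of the same idea in Python 3 syntax. It is not part of the script above: the helper name split_article and the sample bytes are made up for illustration, and it decodes the bytes by hand instead of reading article.txt so that it can be run on its own.

```python
import re

# Same pattern as in the script. In Python 3, str patterns are Unicode
# already, so re.UNICODE is the default and is spelled out only for clarity.
pattern = re.compile(r'^(der|die|das)\s+(\w+)', re.UNICODE)


def split_article(raw_bytes, encoding='iso-8859-1'):
    """Decode one line; return (article, noun), or None if it does not match.

    Hypothetical helper, named here for illustration only.
    """
    line = raw_bytes.decode(encoding)
    match = pattern.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up sample line: "das Mädchen" in ISO-8859-1, where ä is the byte 0xE4.
sample = b'das M\xe4dchen\n'
result = split_article(sample)
print(result)
```

The point is the same as in the script: decode to Unicode first, then match, so that \w+ sees letters rather than raw bytes.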
## Jason Orendorff http://www.jorendorff.com/