regular expression, unicode

Simon Strobl Simon.Strobl at
Wed Apr 29 16:01:01 CEST 2009


Why can't I use this statement in Python 3?

good = re.compile("^[A-ZÄÖÜ].*")

According to the documentation, patterns can be Unicode strings.
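As a quick check of that claim, here is a minimal sketch that compiles the same pattern on its own, without reading any input. The helper name starts_with_capital is made up for illustration. It assumes the script file is saved as UTF-8, which is the default source encoding in Python 3.

```python
import re

# Same pattern as in the statement above. In Python 3 a plain string
# literal is already a Unicode string, so no special prefix is needed.
good = re.compile("^[A-ZÄÖÜ].*")


def starts_with_capital(line):
    """Return True if line begins with A-Z or one of Ä, Ö, Ü.

    Hypothetical helper, used only to exercise the pattern.
    """
    return good.match(line) is not None
```

If this runs without an exception, the re.compile call itself is probably not what fails.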

I get this error message:

Traceback (most recent call last):
  File "./", line 8, in <module>
    for line in sys.stdin:
  File "/usr/lib64/python3.0/", line 1734, in __next__
    line = self.readline()
  File "/usr/lib64/python3.0/", line 1808, in readline
    while self._read_chunk():
  File "/usr/lib64/python3.0/", line 1557, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
  File "/usr/lib64/python3.0/", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-3: invalid data
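The traceback is raised while iterating over sys.stdin, in the UTF-8 decoder, before any line reaches the regular expression. One possible explanation is that the input is not UTF-8. The sketch below decodes the raw bytes with an explicitly chosen encoding instead. Both the Latin-1 default and the function name matching_lines are assumptions for illustration; nothing in the traceback says what the real encoding of the input is.

```python
import io
import re

good = re.compile("^[A-ZÄÖÜ].*")


def matching_lines(binary_stream, encoding="latin-1"):
    """Decode a binary stream and return the lines accepted by `good`.

    The default encoding is an assumption; pass whatever encoding the
    input really uses.
    """
    # Wrap the raw bytes so that decoding uses the chosen encoding
    # rather than the encoding derived from the locale.
    text = io.TextIOWrapper(binary_stream, encoding=encoding)
    return [line.rstrip("\n") for line in text if good.match(line)]
```

In a script this could be called as matching_lines(sys.stdin.buffer) after importing sys, so that the bytes on standard input are decoded with the chosen encoding.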

