regular expression, unicode
MRAB
google at mrabarnett.plus.com
Wed Apr 29 19:39:12 EDT 2009
Simon Strobl wrote:
> Hello,
>
> why can't I use this pattern
>
> good = re.compile("^[A-ZÄÖÜ].*")
>
> in Python 3? According to the documentation, patterns may be Unicode
> strings.
>
> I get this error message:
>
> Traceback (most recent call last):
>   File "./get.py", line 8, in <module>
>     for line in sys.stdin:
>   File "/usr/lib64/python3.0/io.py", line 1734, in __next__
>     line = self.readline()
>   File "/usr/lib64/python3.0/io.py", line 1808, in readline
>     while self._read_chunk():
>   File "/usr/lib64/python3.0/io.py", line 1557, in _read_chunk
>     self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
>   File "/usr/lib64/python3.0/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
> invalid data
>
In Python 3, .py files are assumed to be encoded in UTF-8 unless declared
otherwise by a line such as the following on the first or second line of
the file:
# -*- coding: cp1252 -*-
You need to check what encoding your editor is using (if possible, use
UTF-8).
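
As a rough illustration of why the encoding matters, here is a minimal
sketch. It assumes the editor saved the umlauts as cp1252, which is only a
guess, and the helper name decodes_as is made up for this example. The
umlauts are written as \u escapes so the snippet itself does not depend on
the encoding of the file it is saved in.

```python
import re


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding (hypothetical helper)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A-umlaut, O-umlaut, U-umlaut, written as escapes.
umlauts = "\u00c4\u00d6\u00dc"

# The same text as an editor would store it under two different encodings.
cp1252_bytes = umlauts.encode("cp1252")  # one byte per character
utf8_bytes = umlauts.encode("utf-8")     # two bytes per character

print(decodes_as(cp1252_bytes, "utf-8"))  # False
print(decodes_as(utf8_bytes, "utf-8"))    # True

# Once the text is decoded correctly, the pattern from the question works.
good = re.compile("^[A-Z" + umlauts + "].*")
print(bool(good.match("\u00dcber")))  # True: starts with capital U-umlaut
print(bool(good.match("\u00fcber")))  # False: lowercase u-umlaut is not in the class
```

The same check can be run on the raw bytes of a script or of an input file
(read with open(path, "rb")) to see whether they are valid UTF-8.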