regular expression, unicode

Wed Apr 29 19:30:59 EDT 2009

On Wed, 29 Apr 2009 12:44:12 +0100, Simon Strobl <Simon.Strobl at gmail.com>
wrote:

> why can't I use this pattern
>
> good = re.compile("^[A-ZÄÖÜ].*")
>
> in Python 3? According to the documentation, patterns may be Unicode
> strings.
>
> I get this error message:
>
> Traceback (most recent call last):
>   File "./get.py", line 8, in <module>
>     for line in sys.stdin:
>   File "/usr/lib64/python3.0/io.py", line 1734, in __next__
>     line = self.readline()
>   File "/usr/lib64/python3.0/io.py", line 1808, in readline
>     while self._read_chunk():
>   File "/usr/lib64/python3.0/io.py", line 1557, in _read_chunk
>     self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
>   File "/usr/lib64/python3.0/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
> invalid data

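What on earth makes you think this is anything to do with the
regular expression?  The traceback starts at "for line in sys.stdin"
and ends in the UTF-8 decoder, so the failure happens while the input
is being read, before the pattern is ever applied.  It looks more
like it's complaining about what you've typed in at the console:
those bytes aren't valid UTF-8.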
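
A minimal sketch of the distinction, assuming (and it is only an
assumption, the original post doesn't say) that the input is Latin-1
encoded.  The sample bytes and the helper names decodes_as_utf8 and
matching_lines are made up for illustration.  It shows the decoding
step failing on its own, and the pattern working fine once the bytes
are decoded with an encoding that fits them:

```python
import io
import re

# The pattern from the original question; it is a perfectly valid str pattern.
good = re.compile("^[A-ZÄÖÜ].*")

# Assumption: Latin-1 input, where "Ä" is the single byte 0xC4.
# A 0xC4 byte followed by an ASCII letter is not valid UTF-8.
raw = "Ärger\nähnlich\nZebra\n".encode("latin-1")


def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def matching_lines(byte_stream, encoding):
    """Decode a binary stream with the given encoding and keep matching lines."""
    text = io.TextIOWrapper(byte_stream, encoding=encoding)
    return [line.rstrip("\n") for line in text if good.match(line)]


utf8_ok = decodes_as_utf8(raw)                        # the decoder fails, not re
matches = matching_lines(io.BytesIO(raw), "latin-1")  # the regex works as expected

print(utf8_ok)
print(matches)
```

In a real script the same wrapper could probably go around
sys.stdin.buffer instead of the BytesIO object, again assuming the
data really is Latin-1; if the terminal or file uses some other
encoding, that name would need to change accordingly.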

-- 
Rhodri James *-* Wildebeeste Herder to the Masses