Python under PowerShell adds characters
Marko Rauhamaa
marko at pacujo.net
Thu Mar 30 01:43:46 EDT 2017
Steven D'Aprano <steve at pearwood.info>:
> On Thu, 30 Mar 2017 07:29:48 +0300, Marko Rauhamaa wrote:
>> I'd expect not having to deal with Unicode decoding exceptions with
>> arbitrary input.
>
> That's just silly. If you have *arbitrary* bytes, not all
> byte-sequences are valid Unicode, so you have to expect decoding
> exceptions, if you're processing text.
The input is not in my control, and bailing out may not be an option:
    $ echo $'aa\n\xdd\naa' | grep aa
    aa
    aa
    $ echo $'\xdd' | python2 -c 'import sys; sys.stdin.read(1)'
    $ echo $'\xdd' | python3 -c 'import sys; sys.stdin.read(1)'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib64/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 0: invalid continuation byte
Note that "grep" is also locale-aware, yet it passes the invalid byte
through without complaint.
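
For illustration, here is a minimal sketch of how a Python 3 filter could
get the grep-like behavior shown above by staying in bytes and never
decoding. The helper name grep_bytes is hypothetical, and the in-memory
stream stands in for sys.stdin.buffer, which is what a real filter would
presumably read from:

```python
import io


def grep_bytes(stream, needle):
    """Yield every line of a binary stream that contains needle (bytes)."""
    for line in stream:
        if needle in line:
            yield line


# Same input as the echo example: a lone 0xdd byte between two "aa" lines.
# In a real filter, pass sys.stdin.buffer instead of this BytesIO object.
data = io.BytesIO(b"aa\n\xdd\naa\n")
matches = list(grep_bytes(data, b"aa"))

for line in matches:
    print(line)
```

Because nothing is decoded, the invalid byte cannot raise
UnicodeDecodeError; the cost is that matching is byte-wise rather than
character-wise.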
>> There recently was a related debate on the Guile mailing list. Like
>> Python3, Guile2 is sensitive to illegal UTF-8 on the command line and
>> in the standard streams. An emacs developer was urging Guile
>> developers to follow emacs's example and support a superset of UTF-8
>> and Unicode where all byte strings can be bijectively mapped into
>> text.
>
> I'd like to read that. Got a link?
<URL: http://lists.gnu.org/archive/html/guile-user/2017-02/msg00054.html>
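
As a point of comparison, Python 3 already offers something in this
direction: the "surrogateescape" error handler (PEP 383) maps each
undecodable byte to a lone surrogate code point, so any byte string can be
decoded and later encoded back to the identical bytes. A small sketch,
using the same sample bytes as above; it is not a claim about how emacs or
Guile implement their mapping:

```python
raw = b"aa\n\xdd\naa\n"

# Decoding never fails: the stray 0xdd byte becomes the lone surrogate U+DCDD.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the byte.
back = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))
print(back == raw)
```

The resulting str is not valid Unicode text in the strict sense, so a plain
text.encode("utf-8") without the handler would raise UnicodeEncodeError.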
Marko