Python under PowerShell adds characters
Marko Rauhamaa
marko at pacujo.net
Thu Mar 30 01:57:00 EDT 2017
Chris Angelico <rosuav at gmail.com>:
> On Thu, Mar 30, 2017 at 4:43 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>> The input is not in my control, and bailing out may not be an option:
>>
>> $ echo $'aa\n\xdd\naa' | grep aa
>> aa
>> aa
>> $ echo $'\xdd' | python2 -c 'import sys; sys.stdin.read(1)'
>> $ echo $'\xdd' | python3 -c 'import sys; sys.stdin.read(1)'
>> Traceback (most recent call last):
>>   File "<string>", line 1, in <module>
>>   File "/usr/lib64/python3.5/codecs.py", line 321, in decode
>>     (result, consumed) = self._buffer_decode(data, self.errors, final)
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 0:
>> invalid continuation byte
>>
>> Note that "grep" is also locale-aware.
>
> So what exactly does byte value 0xDD mean in your stream?
>
> And if you say "it doesn't matter", then why are you assigning meaning
> to byte value 0x0A in your first example? Truly binary data doesn't
> give any meaning to 0x0A.

What I'm saying is that every program must behave in a minimally
controlled manner regardless of its inputs (which are not in its
control). With UTF-8, it is dangerously easy to write programs that
explode surprisingly. What's more, resyncing after such exceptions is
not at all easy. I would venture to guess that few Python programs even
try to do that.
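
A minimal sketch of one way a Python 3 reader can keep going past stray
bytes instead of raising, assuming the standard "surrogateescape" error
handler is acceptable for the data at hand. The helper name
read_lines_tolerant is hypothetical, and the sample bytes mirror the
grep example quoted above:

```python
import io


def read_lines_tolerant(binary_stream, encoding="utf-8"):
    """Yield text lines from a byte stream without raising on bad bytes.

    Undecodable bytes are mapped to lone surrogates (U+DC80..U+DCFF)
    by the "surrogateescape" error handler, so decoding never stops.
    """
    text = io.TextIOWrapper(
        binary_stream, encoding=encoding, errors="surrogateescape"
    )
    for line in text:
        yield line


# Same input as the grep example: a stray 0xDD byte between two valid lines.
raw = io.BytesIO(b"aa\n\xdd\naa\n")
lines = list(read_lines_tolerant(raw))

# Lines containing "aa" are still found, as with grep.
matches = [line for line in lines if "aa" in line]

# The original bytes can be recovered by encoding with the same handler.
restored = "".join(lines).encode("utf-8", "surrogateescape")
```

For standard input, the same wrapper could be applied to sys.stdin.buffer,
or the interpreter could be started with
PYTHONIOENCODING=utf-8:surrogateescape. Either way the program still has
to decide what a lone surrogate means before it prints or stores the
text, so this only moves the problem rather than solving it.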
Marko