Python under PowerShell adds characters

Marko Rauhamaa marko at
Thu Mar 30 01:57:00 EDT 2017

Chris Angelico <rosuav at>:

> On Thu, Mar 30, 2017 at 4:43 PM, Marko Rauhamaa <marko at> wrote:
>> The input is not in my control, and bailing out may not be an option:
>>    $ echo $'aa\n\xdd\naa' | grep aa
>>    aa
>>    aa
>>    $ echo $'\xdd' | python2 -c 'import sys;'
>>    $ echo $'\xdd' | python3 -c 'import sys;'
>>    Traceback (most recent call last):
>>      File "<string>", line 1, in <module>
>>      File "/usr/lib64/python3.5/", line 321, in decode
>>        (result, consumed) = self._buffer_decode(data, self.errors, final)
>>    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 0:
>>     invalid continuation byte
>> Note that "grep" is also locale-aware.
> So what exactly does byte value 0xDD mean in your stream?
> And if you say "it doesn't matter", then why are you assigning meaning
> to byte value 0x0A in your first example? Truly binary data doesn't
> give any meaning to 0x0A.

What I'm saying is that every program must behave in a minimally
controlled manner regardless of its inputs (which are not in its
control). With UTF-8, it is dangerously easy to write programs that
fail unexpectedly on malformed input. What's more, resyncing after such
exceptions is not at all easy. I would venture to guess that few Python
programs even try to do that.
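
To make the failure mode concrete, here is a minimal sketch of one way a
Python 3 filter could survive a stray 0xDD byte. It assumes the program
reads the binary stream itself and decodes with the "surrogateescape"
error handler instead of relying on the default strict text layer. The
names tolerant_lines and grep are hypothetical helpers written for this
illustration, not anything from the standard library.

```python
import io


def tolerant_lines(raw, encoding="utf-8"):
    """Yield text lines from a binary stream without raising on bad bytes.

    Undecodable bytes become lone surrogates (the PEP 383 scheme), so they
    can be turned back into the original bytes on output.
    """
    for chunk in raw:  # iterating a binary stream splits on b"\n"
        yield chunk.decode(encoding, errors="surrogateescape")


def grep(raw, needle):
    """Return the matching lines, re-encoded to their original bytes."""
    return [
        line.encode("utf-8", errors="surrogateescape")
        for line in tolerant_lines(raw)
        if needle in line
    ]


if __name__ == "__main__":
    # Same input as the shell example: two good lines around a bad byte.
    sample = io.BytesIO(b"aa\n\xdd\naa\n")
    print(grep(sample, "aa"))
```

In a real filter the io.BytesIO object would be replaced by
sys.stdin.buffer. Decoding with errors="replace" is a simpler
alternative when the original bytes do not need to be preserved.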

