Newbie question about text encoding
Chris Angelico
rosuav at gmail.com
Sat Mar 7 11:53:09 EST 2015
On Sun, Mar 8, 2015 at 3:40 AM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>> Here's an example:
>>
>> b = b'\x80'
>>
>> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
>> from str objects to bytes objects.
>>
>
> Python 2 might, Python 3 doesn't.
He was talking about this line of code:
    b.decode('utf-8').encode('utf-8') == b
With the above assignment, that does indeed raise an exception
(UnicodeDecodeError, since 0x80 on its own cannot start a UTF-8
sequence) - which is correct behaviour.
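For anyone who wants to see it happen, here is a minimal sketch, assuming
Python 3 (the name "error" is only there for illustration). The exception
comes from the decode step, before the comparison is ever evaluated:

```python
b = b'\x80'

error = None
try:
    # decode() raises first; encode() and == are never reached
    b.decode('utf-8').encode('utf-8') == b
except UnicodeDecodeError as exc:
    error = exc
    print(type(exc).__name__, 'at byte offset', exc.start)
```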
Challenge: Figure out a byte-string input that will make this function
return True.
def is_utf8_broken(b):
    return b.decode('utf-8').encode('utf-8') != b
Correct responses for this function are either False or raising an exception.
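A rough way to start poking at it, assuming Python 3. The helper check()
and the sample inputs are my own picks for illustration, and a handful of
samples obviously proves nothing about all possible inputs:

```python
def is_utf8_broken(b):
    return b.decode('utf-8').encode('utf-8') != b

def check(b):
    # Hypothetical helper: report the result, or 'raised' on a decode error.
    try:
        return is_utf8_broken(b)
    except UnicodeDecodeError:
        return 'raised'

samples = [
    b'',                         # empty input
    b'hello',                    # plain ASCII
    'caf\u00e9'.encode('utf-8'), # valid two-byte sequence
    b'\xc0\x80',                 # overlong encoding of U+0000
    b'\xed\xa0\x80',             # encoded surrogate
    b'\x80',                     # lone continuation byte
]
results = {s: check(s) for s in samples}
for s, r in results.items():
    print(repr(s), '->', r)
```

Every sample should come back as either False or 'raised'; a True would
be an answer to the challenge.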
ChrisA