On 26 May 2015 at 19:30, anatoly techtonik
In real world you have to deal with broken and invalid output and UnicodeDecode crashes is not an option. The unicode() constructor proposes two options to deal with invalid output:
1. ignore - meaning skip and corrupt the data 2. replace - just corrupt the data
There are other error handlers, specifically surrogateescape is designed for this use. Only in Python 3.x admittedly, but this list is about future versions of Python, so that's what matters here.
The solution is to have filter preprocess the binary string to escape all non-unicode symbols so that the following lossless transformation becomes possible:
binary -> escaped utf-8 string -> unicode -> binary
How to accomplish that with Python 2.x?
That question is for python-list. Language changes will only be made to 3.x - python-ideas isn't appropriate for questions about how to achieve something in 2.x. Paul