On Tuesday 30 September 2008 15:53:09, Guido van Rossum wrote:
On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Changing the default file system encoding to store bytes in Unicode is like introducing a new Python type: <fake Unicode for filename hacks>.
Exactly. Seems like the best solution to me, despite your polemics.
Martin, I don't understand why you are in favor of storing raw bytes encoded as Latin-1 in Unicode string objects, which clearly gives rise to mojibake. In the past you have always been staunchly opposed to API changes or practices that could lead to mojibake (and you had me quite convinced).
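To make the mojibake concrete, here is a minimal sketch in Python 3 syntax. The filename is a made-up example and I am assuming the file system stores names as UTF-8; this is not code from any proposed patch.

```python
# Hypothetical filename, assumed to be stored on disk as UTF-8 bytes.
raw_name = "café.txt".encode("utf-8")

# Decoding arbitrary bytes as Latin-1 never fails, so the bytes "fit"
# in a str, but the text no longer matches what the user typed.
fake_name = raw_name.decode("latin-1")

print(repr(fake_name))          # 'cafÃ©.txt'
print(fake_name == "café.txt")  # False

# The original bytes can still be recovered, but only by a caller who
# knows that this particular str holds bytes in disguise.
print(fake_name.encode("latin-1") == raw_name)  # True
```

Nothing in the type of fake_name says that it has to be encoded back with Latin-1 before use.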
If I understood correctly, the goal of Python 3 is the clear *separation* of bytes and characters. Storing bytes in Unicode is practical because existing code doesn't need to change, but it doesn't fix the problem: it just moves the errors, which will be raised later.

I didn't get an answer to my question: what is the result of <bytes (fake characters) stored in unicode> + <real unicode>? I guess that the result is <mixed "bytes" and characters in unicode> instead of an error (invalid types).

So again: why introduce a new type instead of reusing the existing Python types?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
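PS: a small sketch of the concatenation question, again in Python 3 syntax and assuming UTF-8 bytes smuggled into a str through Latin-1. The variable names are only for illustration.

```python
raw = "é".encode("utf-8")    # two bytes: b'\xc3\xa9'
fake = raw.decode("latin-1")  # bytes disguised as two characters
real = "é"                    # one real character

# Both operands are str, so the concatenation succeeds silently.
mixed = fake + real
print(type(mixed).__name__, len(mixed))  # str 3

# Converting the mixed string back to bytes no longer gives valid UTF-8.
try:
    mixed.encode("latin-1").decode("utf-8")
    roundtrip_ok = True
except UnicodeDecodeError:
    roundtrip_ok = False
print(roundtrip_ok)  # False

# With the existing separate types, the mistake is caught at once.
try:
    raw + real
    caught = None
except TypeError as exc:
    caught = exc
print(type(caught).__name__)  # TypeError
```

So with the existing bytes type the error shows up where the mix happens, while the fake Unicode string only fails later, when someone tries to use the result.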