[Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

Tue Sep 30 16:11:02 CEST 2008

On Tuesday 30 September 2008 15:53:09, Guido van Rossum wrote:
> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> Changing the default file system encoding to store bytes in Unicode is
> >> like introducing a new Python type: <fake Unicode for filename hacks>.
> >
> > Exactly. Seems like the best solution to me, despite your polemics.
>
> Martin, I don't understand why you are in favor of storing raw bytes
> encoded as Latin-1 in Unicode string objects, which clearly gives rise
> to mojibake. In the past you have always been staunchly opposed to API
> changes or practices that could lead to mojibake (and you had me quite
> convinced).

If I understood correctly, the goal of Python3 is the clear *separation* of 
bytes and characters. Storing bytes in Unicode is practical because it doesn't 
require changing existing code, but it doesn't fix the problem: it just 
postpones errors that will be raised later.

I didn't get an answer to my question: what is the result of <bytes (fake 
characters) stored in unicode> + <real unicode>? I guess that the result is 
<mixed "bytes" and characters in unicode> instead of an error being raised 
(invalid types). So again: why introduce a new type instead of reusing 
existing Python types?
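
To make the question concrete, here is a minimal sketch in Python 3. The file 
name and the variable names are hypothetical, and plain Latin-1 decoding is 
only my approximation of the scheme being discussed, not an actual 
implementation of it:

```python
# Hypothetical raw file name as returned by the OS: "café.txt" in UTF-8.
raw_name = b"caf\xc3\xa9.txt"

# Approximation of the proposal: keep the bytes inside a str by decoding
# them as Latin-1, so every byte becomes one code point.
fake_unicode = raw_name.decode("latin-1")

# Real text, coming from a properly decoded source.
real_unicode = "r\u00e9sum\u00e9-"

# Concatenation succeeds silently: the result mixes real characters with
# bytes that only look like characters.
mixed = real_unicode + fake_unicode

# With the existing, distinct types, the same mistake is caught at once.
try:
    real_unicode + raw_name
    mixing_detected = False
except TypeError:
    mixing_detected = True
```

In this sketch nothing in `mixed` records which part is real text and which 
part is disguised bytes, whereas str + bytes fails immediately.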

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/