[Python-3000] New proposition for Python3 bytes filename issue

Tue Sep 30 16:04:09 CEST 2008

On Tue, Sep 30, 2008 at 2:28 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> The only way to display that file would be to transform it into some
>> other valid unicode string.  However, as that string is already valid,
>> you've just made any files named after it impossible to open.
>
> Not if those valid sequences are also properly escaped to avoid collisions.
> That's what utf-8b claims to do.
>
> My view of utf-8b is that it is not really a new codec, but an escaping phase
> added in front of utf-8, such that illegal byte sequences get converted to legal
> byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.). The
> only difficulty is choosing sufficiently rare escaping sequences, so that
> readability is not impacted.
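
One concrete form of this escaping was later standardized as Python's
"surrogateescape" error handler (PEP 383, Python 3.1). It maps each
undecodable byte 0xXX to the lone code point U+DCXX and reverses the mapping
on encode. The sketch below uses it as a stand-in for utf-8b. The file name
is a hypothetical example, not taken from the thread.

```python
# Sketch: the "surrogateescape" handler (PEP 383) as a stand-in for utf-8b.
raw_name = b"caf\xe9.txt"  # hypothetical Latin-1 file name; invalid as UTF-8

# Decoding: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9,
# while the valid ASCII bytes decode normally.
text_name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(text_name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
round_trip = text_name.encode("utf-8", "surrogateescape")
print(round_trip == raw_name)  # True
```

The round trip only works because the same handler is used on both sides.
The decoded string alone does not record that an escape step took place.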

The problem is that there's no way (at least nobody has proposed one
AFAICT) to tell whether the escaping has been applied. When reading
XML, you *know* that you are expected to unescape exactly one level of
& escaping. You would never find XML with the unescaping already done
for you. But the output of utf-8b is indistinguishable from regular
utf-8 so you don't know whether you need to unescape things.
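
To make the XML comparison concrete, here is a minimal sketch using the
standard library's xml.sax.saxutils helpers. Unescaping is correct exactly
once, on text known to be escaped. Applied to text that was never escaped,
it silently changes it. Both strings are hypothetical examples.

```python
from xml.sax.saxutils import escape, unescape

# Known-escaped text: undoing exactly one level recovers the original.
original = "Tom & Jerry"
escaped = escape(original)          # 'Tom &amp; Jerry'
recovered = unescape(escaped)       # 'Tom & Jerry'

# Text that was never escaped but happens to contain "&amp;" literally:
# unescaping it anyway corrupts the value.
literal = "R&amp;D notes"
wrongly_unescaped = unescape(literal)  # 'R&D notes'
```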

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)