[Python-ideas] PEP 540: Add a new UTF-8 mode

Steven D'Aprano steve at pearwood.info
Thu Jan 5 23:49:22 EST 2017


On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote:

> Let's say that you have the filename b'nonascii\xff': it's decoded as
> 'nonascii\xdcff' by the UTF-8 mode. How do GUIs handle such filename?
> (I don't know the answer, it's a real question ;-))

I ran this in Python 2.7 to create the file:

open(b'/tmp/nonascii\xff-', 'w')

and then confirmed the filename:

[steve@ando tmp]$ ls -b nonascii*
nonascii\377-
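
For reference, here is a minimal Python 3 sketch of what happens to that name on the Python side. It assumes a POSIX system where file names are decoded as UTF-8 with the surrogateescape error handler (PEP 383). The variable names are only illustrative. Note that the escaped byte is written '\udcff' in Python string notation: a single lone surrogate, U+DCFF.

```python
# The same file name as above, without the directory part.
raw = b'nonascii\xff-'

# "ls -b" prints the undecodable byte in octal: \377 is the same byte as 0xff.
print(0o377 == 0xff)

# surrogateescape maps the undecodable byte 0xff to the lone surrogate U+DCFF.
name = raw.decode('utf-8', 'surrogateescape')
print(ascii(name))      # 'nonascii\udcff-'

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode('utf-8', 'surrogateescape')
print(restored == raw)  # True
```

The round trip is lossless, which is the point of the handler: the str is not valid Unicode text, but the original byte can always be recovered from it.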

Konqueror in KDE 3 displays it with *two* "missing character" glyphs 
(small hollow boxes) before the hyphen. The KDE "Open File" dialog box 
shows the file with two blank spaces before the hyphen.

My interpretation of this is that the difference is due to using 
different fonts: the file name is shown the same way, but in one font 
the missing character is a small box and in the other it is a blank 
space.

I cannot tell what KDE is using for the invalid character: if I copy it 
as text and paste it into a file, I just get the original \xFF.

The Geany text editor, which I think uses the same GUI toolkit as Gnome, 
shows the file with a single "missing glyph" character, this time a 
black diamond with a question mark in it.

It looks like Geany (Gnome?) is displaying the invalid byte as U+FFFD, 
the Unicode "REPLACEMENT CHARACTER".
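
I don't know what Geany actually does internally, but the visible result matches what Python's 'replace' error handler produces. This is a rough sketch of that behaviour, not a description of Geany's code:

```python
import unicodedata

raw = b'nonascii\xff-'

# The 'replace' handler substitutes U+FFFD for each undecodable byte.
shown = raw.decode('utf-8', 'replace')
print(ascii(shown))                   # 'nonascii\ufffd-'
print(unicodedata.name(shown[-2]))    # REPLACEMENT CHARACTER

# Unlike surrogateescape, this is lossy: the original byte is gone.
print(shown.encode('utf-8') == raw)   # False
```

That is fine for display purposes, but a string decoded this way cannot be used to open the file again.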

So at least two Linux GUI environments are capable of dealing with 
filenames that are invalid UTF-8, in two different ways.

Does this answer your question about GUIs?


-- 
Steve

