[Python-ideas] PEP 540: Add a new UTF-8 mode
Steven D'Aprano
steve at pearwood.info
Thu Jan 5 23:49:22 EST 2017
On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote:
> Let's say that you have the filename b'nonascii\xff': it's decoded as
> 'nonascii\xdcff' by the UTF-8 mode. How do GUIs handle such filename?
> (I don't know the answer, it's a real question ;-))
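For reference, here is a minimal sketch of that decoding in Python 3, assuming the UTF-8 mode treats undecodable bytes the way the existing surrogateescape error handler does. The escaped byte becomes a lone surrogate, so its repr is spelled \udcff rather than \xdcff:

```python
# Sketch only: assumes the UTF-8 mode behaves like errors='surrogateescape'.
raw = b'nonascii\xff'

# The undecodable byte 0xFF is mapped to the lone surrogate U+DCFF.
name = raw.decode('utf-8', errors='surrogateescape')
print(ascii(name))

# Encoding with the same error handler restores the original bytes.
restored = name.encode('utf-8', errors='surrogateescape')
print(restored == raw)
```

The string cannot be encoded as strict UTF-8, which is why a GUI has to decide for itself how to display it.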
I ran this in Python 2.7 to create the file:
    open(b'/tmp/nonascii\xff-', 'w')
and then confirmed the filename:
    [steve@ando tmp]$ ls -b nonascii*
    nonascii\377-
Konqueror in KDE 3 displays it with *two* "missing character" glyphs
(small hollow boxes) before the hyphen. The KDE "Open File" dialog box
shows the file with two blank spaces before the hyphen.
My interpretation of this is that the difference is due to using
different fonts: the file name is shown the same way, but in one font
the missing character is a small box and in the other it is a blank
space.
I cannot tell what KDE is using for the invalid character: if I copy it
as text and paste it into a file, I just get the original \xFF.
The Geany text editor, which I think uses the same GUI toolkit as Gnome,
shows the file with a single "missing glyph" character, this time a
black diamond with a question mark in it.
It looks like Geany (Gnome?) is displaying the invalid byte as U+FFFD,
the Unicode "REPLACEMENT CHARACTER".
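To illustrate, a small Python 3 sketch, assuming Geany (or its toolkit) does something equivalent to decoding with errors='replace'. That is a guess based on what is displayed, not something I have checked in its source:

```python
import unicodedata

# Sketch only: errors='replace' is an assumption about what the GUI does.
raw = b'/tmp/nonascii\xff-'

# Each undecodable byte is replaced by U+FFFD.
shown = raw.decode('utf-8', errors='replace')
print(ascii(shown))
print(unicodedata.name(shown[-2]))

# The substitution is lossy: re-encoding does not give back the 0xFF byte.
print(shown.encode('utf-8') == raw)
```

If this is what happens, the replaced string is only suitable for display; the application must keep the original bytes around to actually open the file.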
So at least two Linux GUI environments are capable of dealing with
filenames that are invalid UTF-8, in two different ways.
Does this answer your question about GUIs?
--
Steve