[Python-Dev] a suggestion ... Re: PEP 383 (again)
glyph at divmod.com
Thu Apr 30 18:26:25 CEST 2009
On 03:35 pm, martin at v.loewis.de wrote:
>>So, why do you prefer half surrogate coding to U+0000 quoting?
>
>If I pass a string with an embedded U+0000 to gtk, gtk will truncate
>the string, and stop rendering it at this character. This is worse than
>what it does for invalid UTF-8 sequences. Chances are fairly high that
>other C libraries will fail in the same way, in particular if they
>expect char* (which is very common in C).
Hmm. I believe the intended failure mode here, for PyGTK at least, is
actually this:
TypeError: GtkLabel.set_text() argument 1 must be string without null
bytes, not unicode
APIs in PyGTK which accept NUL characters and silently truncate are
probably broken. Although perhaps I've just made your point even more
strongly: first, because the behavior is inconsistent, and second,
because it sometimes raises an exception if a NUL is present, and
apparently the goal here is to prevent exceptions from being raised
anywhere in the process.
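A rough sketch of the two failure modes, for anyone who wants to see them side by side. It uses ctypes rather than GTK itself, so it only approximates what a char*-based library does; reject_nul is a hypothetical wrapper, not a PyGTK function, and the file name is made up.

```python
import ctypes


def seen_by_c(text):
    """Return the bytes a char*-based C function would read from text."""
    raw = text.encode("utf-8")
    # c_char_p.value stops at the first NUL byte, just as strlen() would.
    return ctypes.c_char_p(raw).value


def reject_nul(text):
    """Hypothetical strict wrapper: refuse embedded NULs instead of truncating."""
    if "\x00" in text:
        raise TypeError("argument must be string without null bytes")
    return text


label = "report\x00.txt"
print(seen_by_c(label))  # everything after the NUL is silently lost

try:
    reject_nul(label)
except TypeError as exc:
    print("rejected:", exc)
```

The first call loses data quietly and the second raises, which is exactly the inconsistency described above.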
For this idiom to be of any use to GTK programs,
gtk.FileChooser.get_filename() will probably need to be changed, since
(in py2) it currently returns a str, not unicode.
The PEP should say something about how GUI libraries should handle file
choosers, so that they'll be consistent and compatible with the standard
library. Perhaps only that file choosers need to take this PEP into
account, and the rest is obvious. Or maybe the right thing for GTK to
do would be to continue to use bytes on POSIX and convert to text on
Windows, since open(), listdir() et al. will continue to accept bytes
for filenames?
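For reference, here is a minimal sketch of the round trip a file chooser would have to preserve if it returned text. It assumes the error handler name "surrogateescape", which is what the PEP's handler is called in Python 3.1 and later; the file name is invented for illustration.

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded name; not valid UTF-8

# Undecodable bytes become lone low surrogates (0xE9 -> U+DCE9).
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", "surrogateescape")


def is_strictly_encodable(text):
    """Report whether text survives a plain UTF-8 encode."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(ascii(name))
print(round_tripped == raw)
print(is_strictly_encodable(name))
```

The last check is the catch for GUI libraries: a string carrying these surrogates cannot be encoded as strict UTF-8, so a toolkit has to either use the same handler or stay with bytes.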
>So I prefer the half surrogate because its failure mode is better th
Heh heh heh.