[Python-Dev] a suggestion ... Re: PEP 383 (again)

glyph at divmod.com
Thu Apr 30 18:26:25 CEST 2009


On 03:35 pm, martin at v.loewis.de wrote:
>>So, why do you prefer half surrogate coding to U+0000 quoting?
>
>If I pass a string with an embedded U+0000 to gtk, gtk will truncate
>the string, and stop rendering it at this character. This is worse than
>what it does for invalid UTF-8 sequences. Chances are fairly high that
>other C libraries will fail in the same way, in particular if they
>expect char* (which is very common in C).
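
A rough sketch of the two failure modes being compared, for readers who want to try them. This is not code from either poster. It assumes Python 3, where the PEP's error handler is available as "surrogateescape", and it uses ctypes.c_char_p only as a stand-in for a C library that takes char*; the sample names are made up.

```python
import ctypes

# A C-style char* ends at the first NUL byte, so code that treats the
# buffer as a C string never sees anything after an embedded U+0000.
quoted = "caf\x00e9".encode("utf-8")   # hypothetical U+0000-quoted name
seen_by_c = ctypes.c_char_p(quoted).value

# The half-surrogate approach maps each undecodable byte to a lone
# surrogate (U+DC80..U+DCFF) and restores the byte when encoding again.
raw = b"caf\xe9"                       # Latin-1 bytes, invalid as UTF-8
decoded = raw.decode("utf-8", "surrogateescape")
round_tripped = decoded.encode("utf-8", "surrogateescape")

print(seen_by_c)             # the part of the name a char* consumer sees
print(ascii(decoded))        # ascii() avoids printing a lone surrogate
print(round_tripped == raw)  # whether the original bytes survive
```

The truncated value silently loses the tail of the name, while the surrogate-escaped string still carries enough information to recover the original bytes.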

Hmm.  I believe the intended failure mode here, for PyGTK at least, is 
actually this:

    TypeError: GtkLabel.set_text() argument 1 must be string without null bytes, not unicode

APIs in PyGTK which accept NULLs and silently truncate are probably 
broken.  Although perhaps I've just made your point even more strongly: 
one, because the behavior is inconsistent, and two, because it sometimes 
raises an exception if a NULL is present, and apparently the goal here 
is to prevent exceptions from being raised anywhere in the process.
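
The "raise instead of truncate" behaviour is not specific to PyGTK. As an analogy only (this uses the standard library, not PyGTK, and the helper name try_stat is made up for illustration), Python 3's own filesystem calls reject an embedded NUL rather than passing a truncated name to C:

```python
import os

def try_stat(name):
    """Return the name of the exception os.stat() raises, or "ok"."""
    try:
        os.stat(name)
    except (ValueError, TypeError) as exc:
        # Rejected before reaching the C library because of the NUL.
        return type(exc).__name__
    except OSError:
        # The name was accepted, but no such file exists.
        return "OSError"
    return "ok"

result = try_stat("caf\x00e9")
print(result)
```

Which exception type is raised has varied between Python versions, so the sketch catches both ValueError and TypeError.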

For this idiom to be of any use to GTK programs, 
gtk.FileChooser.get_filename() will probably need to be changed, since 
(in py2) it currently returns a str, not unicode.

The PEP should say something about how GUI libraries should handle file 
choosers, so that they'll be consistent and compatible with the standard 
library.  Perhaps only that file choosers need to take this PEP into 
account, and the rest is obvious.  Or maybe the right thing for GTK to 
do would be to continue to use bytes on POSIX and convert to text on 
Windows, since open(), listdir() et al. will continue to accept bytes 
for filenames?
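
For reference, a small sketch of the bytes-or-text behaviour mentioned above, assuming Python 3 (the temporary directory and the name example.txt are only for illustration): os.listdir() returns names of the same type as its argument, and open() accepts either type.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file with a plain ASCII name.
    open(os.path.join(tmp, "example.txt"), "w").close()

    as_text = os.listdir(tmp)                # str in, str names out
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes in, bytes names out

    # open() takes a bytes path just as readily as a str path.
    bytes_path = os.path.join(os.fsencode(tmp), as_bytes[0])
    with open(bytes_path, "rb") as f:
        data = f.read()

    # os.fsdecode() converts a bytes name to the str form.
    same_name = os.fsdecode(as_bytes[0]) == as_text[0]

print(as_text, as_bytes, same_name)
```

A file chooser that keeps returning bytes on POSIX could therefore hand its result straight to open() or listdir() without any decoding step.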
>So I prefer the half surrogate because its failure mode is better th

Heh heh heh.

