[Python-ideas] Processing surrogates in

Serhiy Storchaka storchaka at gmail.com
Fri May 8 13:54:37 CEST 2015


On 05.05.15 11:23, Stephen J. Turnbull wrote:
> Serhiy Storchaka writes:
>
>   > Use cases include programs that use tkinter (common build of Tcl/Tk
>   > don't accept non-BMP characters), email or wsgiref.
>
> So, consider Tcl/Tk.  If you use it for input, no problem, it *can't*
> produce non-BMP characters.  So you're using it for output.  If
> knowing that your design involves tkinter, you deduce you must not
> accept non-BMP characters on input, where's your problem?

With Tcl/Tk it is not so easy. The main issue is with translating from 
Tcl to Python. Tcl uses at least two representations for strings (UCS-2 
and modified UTF-8, and Latin1 in some cases); both can contain invalid 
codes, and implicit conversion from one to the other is lossy. Currently 
there is a way to crash IDLE (and maybe other Tkinter applications) by 
just pasting malformed data from the clipboard. I don't think that my 
proposal will help Tkinter a lot, but there are requests for such 
features, and perhaps these functions could help to solve or work around 
at least some of the Tkinter issues.
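
For illustration only, here is a minimal sketch of the kind of filtering 
an application might do before handing text to a Tcl/Tk build that does 
not accept non-BMP characters. The helper name replace_non_bmp and the 
choice of U+FFFD as the substitute are assumptions made for this example; 
they are not part of tkinter or of the proposal.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every character above U+FFFF replaced.

    Hypothetical helper: a BMP-only Tcl/Tk build cannot represent code
    points beyond U+FFFF, so they are swapped for a placeholder here.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


# BMP characters pass through unchanged; the emoji is replaced.
safe = replace_non_bmp("a\U0001F600b")
```

This only covers the Python-to-Tcl direction. It does nothing for 
malformed data coming back from Tcl, which is the harder case described 
above.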

> And ... you looked twice at your proposal?  You have basically
> reproduced the codec error handling API for .decode and .encode in a
> bunch to str2str "rehandle" functions.

Yes, this is the main advantage of the proposed functions. They reuse 
existing error handlers and are extensible by writing new error handlers.
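
A minimal sketch of what reusing error handlers means in practice, 
assuming a str-to-str function can be approximated today by an 
encode/decode round trip through UTF-8. The names rehandle_surrogates 
and "qmark" are hypothetical and are not the proposed API; only 
codecs.register_error and the standard handler names are real.

```python
import codecs


def qmark_handler(exc):
    # Hypothetical custom handler: put "?" in place of each character
    # that the encoder rejected (here, lone surrogates).
    if isinstance(exc, UnicodeEncodeError):
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


# Register the handler under an illustrative name.
codecs.register_error("qmark", qmark_handler)


def rehandle_surrogates(s, errors="strict"):
    # Assumption: emulate a str-to-str "rehandle" step by encoding with
    # the chosen error handler and decoding the result again.
    return s.encode("utf-8", errors).decode("utf-8")


text = "abc\udc80"                                   # contains a lone surrogate
dropped = rehandle_surrogates(text, "ignore")        # built-in handler
escaped = rehandle_surrogates(text, "backslashreplace")
marked = rehandle_surrogates(text, "qmark")          # custom handler
```

The same call accepts any registered handler name, built-in or custom, 
so new behaviour is added by registering a handler, with no change to 
the function itself.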

>  In other words, you need to
> know as much to use "rehandle_*" properly as you do to use .decode and
> .encode.  I do not see a win for the programmer who is mostly innocent
> of encoding knowledge.

Is that a problem? These functions are for experienced users, perhaps 
mostly for authors of libraries and frameworks.

> If we apply these rehandle_* thumbs to the holes in the I18N dike,
> it's just going to spring more leaks elsewhere.

There are a lot of batteries included in Python. They can explode if you 
use them incorrectly.

Sorry, I don't understand your frustration.



