[Python-ideas] Processing surrogates in
Serhiy Storchaka
storchaka at gmail.com
Fri May 8 13:54:37 CEST 2015
On 05.05.15 11:23, Stephen J. Turnbull wrote:
> Serhiy Storchaka writes:
>
> > Use cases include programs that use tkinter (common builds of Tcl/Tk
> > don't accept non-BMP characters), email or wsgiref.
>
> So, consider Tcl/Tk. If you use it for input, no problem, it *can't*
> produce non-BMP characters. So you're using it for output. If
> knowing that your design involves tkinter, you deduce you must not
> accept non-BMP characters on input, where's your problem?
With Tcl/Tk things are not so simple. The main issue is with translating from
Tcl to Python. Tcl uses at least two representations for strings (UCS-2
and modified UTF-8, and Latin1 in some cases); each can contain invalid
codes, and implicit conversion from one to the other is lossy. Currently
there is a way to crash IDLE (and maybe other Tkinter applications) by
just pasting malformed data from the clipboard. I don't think that my
proposal will help Tkinter a lot, but there are requests for such
features, and perhaps these functions could help to solve or work around
at least some of the Tkinter issues.
> And ... you looked twice at your proposal? You have basically
> reproduced the codec error handling API for .decode and .encode in a
> bunch of str2str "rehandle" functions.
Yes, this is the main advantage of the proposed functions. They reuse
existing error handlers and can be extended by writing new error handlers.
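
As a rough illustration only (this is not the proposed API), a str-to-str
function of this kind could be sketched by round-tripping through UTF-8, so
that a standard codec error handler decides what happens to lone surrogates.
The name rehandle_surrogates and its signature are hypothetical.

```python
def rehandle_surrogates(text, errors="replace"):
    """Hypothetical helper: return text with lone surrogates processed by
    the codec error handler named in `errors`."""
    # Lone surrogates cannot be encoded to UTF-8, so the encoder passes each
    # one to the error handler; all other characters round-trip unchanged.
    return text.encode("utf-8", errors).decode("utf-8")


s = "a\udc80b"  # contains one lone low surrogate
print(ascii(rehandle_surrogates(s, "replace")))           # 'a?b'
print(ascii(rehandle_surrogates(s, "ignore")))            # 'ab'
print(ascii(rehandle_surrogates(s, "backslashreplace")))  # 'a\\udc80b'
```

This sketch only suits handlers whose output is valid UTF-8. A handler such
as "surrogateescape" emits raw bytes on encoding and would need different
treatment.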
> In other words, you need to
> know as much to use "rehandle_*" properly as you do to use .decode and
> .encode. I do not see a win for the programmer who is mostly innocent
> of encoding knowledge.
Is that a problem? These functions are for experienced users, perhaps
mostly for authors of libraries and frameworks.
> If we apply these rehandle_* thumbs to the holes in the I18N dike,
> it's just going to spring more leaks elsewhere.
There are a lot of batteries included in Python. They can explode if you
use them incorrectly.
Sorry, I don't understand your frustration.