[Python-Dev] Multilingual programming article on the Red Hat Developer blog
ja.py at farowl.co.uk
Fri Sep 12 08:54:56 CEST 2014
On 12/09/2014 04:28, Stephen J. Turnbull wrote:
> Jeff Allen writes:
> > A welcome article. One correction should be made, I believe: the area of
> > code point space used for the smuggling of bytes under PEP-383 is not a
> > "Unicode Private Use Area", but a portion of the trailing surrogate
> > range.
> Nice catch. Note that the surrogate range was originally part of the
> Private Use Area, but it was carved out with the adoption of UTF-16 in
> about 1993. In practice, I doubt that there are any current
> implementations claiming compatibility with Unicode 1.0 (IIRC, UTF-16
> was made mandatory in Unicode 1.1).
That's a helpful bit of history that explains the uncharacteristic
inaccuracy. It is the most I can do to keep the current position clear in my head.
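
For anyone who wants to check the point about PEP 383 for themselves, here is a minimal sketch using the standard "surrogateescape" error handler. The sample bytes are only an illustration I made up; the ranges in the comments are the trailing surrogate range and the BMP Private Use Area.

```python
# Round-trip an undecodable byte through str using the PEP 383 error handler.
raw = b"caf\xe9"  # Latin-1 encoded bytes, not valid UTF-8

text = raw.decode("utf-8", errors="surrogateescape")
smuggled = text[-1]

# The undecodable byte 0xE9 is mapped to 0xDC00 + 0xE9 = U+DCE9.
assert ord(smuggled) == 0xDC00 + 0xE9

# That code point lies in the trailing surrogate range U+DC00..U+DFFF,
# not in the BMP Private Use Area U+E000..U+F8FF.
assert 0xDC00 <= ord(smuggled) <= 0xDFFF
assert not (0xE000 <= ord(smuggled) <= 0xF8FF)

# Encoding with the same handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```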
> I've always thought that the "right" way to handle the private use
> area for "platforms" like Python and Emacs, which may need to use it
> for their own purposes (such as "undecodable bytes") but want to
> respect its use by applications, is to create an auxiliary table
> mapping the private use area to objects describing the characters
> represented by the private use code points. These objects would have
> attributes such as external representation for text I/O, glyph (for
> GUI display), repr (for TTY display), various Unicode properties, etc.
Simply having a block "for private use" seems to create an unmanaged
space for conflict, reminiscent of the "other 128 characters" in
bilingual programming. I wondered, idly, whether the way to respect use by
applications might be to make it private to a particular subclass of
str.
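
Purely as a sketch of the auxiliary table idea quoted above, and not anything Python or Emacs actually does: the class name PrivateUseInfo, the table PUA_TABLE, the function describe and the single table entry are all hypothetical, invented here for illustration. Only unicodedata.category and its "Co" (private use) category are real.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrivateUseInfo:
    """Hypothetical description of one private use code point."""
    external: bytes   # external representation for text I/O
    repr_text: str    # representation for TTY display


# Hypothetical table; the entry is invented for illustration only.
PUA_TABLE = {
    0xE000: PrivateUseInfo(external=b"\x80", repr_text="<byte 80>"),
}


def describe(ch: str) -> Optional[PrivateUseInfo]:
    """Look up the platform's description of a private use character.

    Returns None when the code point is private use but unclaimed by
    the platform, so an application remains free to use it.
    """
    if unicodedata.category(ch) != "Co":
        raise ValueError("not a private use code point")
    return PUA_TABLE.get(ord(ch))
```

An application could then consult describe() before giving its own meaning to a private use character, which is one way the two uses might avoid colliding.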