Python's handling of unicode surrogates

Ross Ridge rridge at caffeine.csclub.uwaterloo.ca
Mon Apr 23 13:25:55 EDT 2007


Ross Ridge writes:
> The Unicode standard doesn't require that you support surrogates, or
> any other kind of character, so no you wouldn't be lying.

<martin at v.loewis.de> wrote:
> There is the notion of Unicode implementation levels, and each of them
> does include a set of characters to support. 

There are different levels of implementation for ISO 10646, but not
for Unicode.

> It is probably an interpretation issue what "supported" means.

The strongest claim to support Unicode that you can meaningfully make
is that of conformance to the Unicode standard.  The Unicode standard's
conformance requirements make it explicit that you don't need to support
any particular character:

	C8 A process shall not assume that it is required to interpret
	   any particular coded character representation.

	  . Processes that interpret only a subset of Unicode characters
	    are allowed; there is no blanket requirement to interpret
	    all Unicode characters.
	  [...]
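
As a rough illustration of what that clause permits, here is a minimal
Python sketch of a process that interprets only a subset of characters.
The function name and the sample string are made up for the example and
come from neither the standard nor Python itself. It assumes Python 3,
where a str is a sequence of code points:

```python
def upcase_ascii_only(text):
    """Uppercase a-z and pass every other code point through unchanged.

    Hypothetical helper: it "interprets" only 26 characters and makes
    no attempt to understand anything else it is handed.
    """
    out = []
    for ch in text:
        if 'a' <= ch <= 'z':
            # The only characters this process claims to interpret.
            out.append(chr(ord(ch) - 32))
        else:
            # Everything else, including characters outside the BMP
            # (which UTF-16 represents as surrogate pairs), is left
            # exactly as it was.
            out.append(ch)
    return ''.join(out)


# U+00E9 (e with acute) and U+1D11E (musical G clef) are not interpreted.
sample = 'caf\u00e9 \U0001d11e'
result = upcase_ascii_only(sample)
```

The point is only that the uninterpreted characters survive intact; the
sketch is not a claim about how Python's own string methods behave.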

> Python clearly supports Unicode level 1 (if we leave alone the issue
> that it can't render all these characters out of the box, as it doesn't
> ship any fonts);

It's not at all clear to me that Python does support ISO 10646's
implementation level 1, if only because I don't, and I assume you don't,
have a copy of ISO 10646 available to verify what the requirements
actually are.

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge at csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  
