PEP 393 vs UTF-8 Everywhere
marko at pacujo.net
Sat Jan 21 14:52:42 EST 2017
Pete Forman <petef4+usenet at gmail.com>:
> Surrogates only exist in UTF-16. They are expressly forbidden in UTF-8
> and UTF-32.
Also, they don't exist as Unicode code points. Python shouldn't allow
surrogate characters in strings.
Thus the range of code points that are available for use as
characters is U+0000–U+D7FF and U+E000–U+10FFFF (1,112,064 code
points).
The Unicode Character Database is basically a table of characters
indexed using integers called ’code points’. Valid code points are in
the ranges 0 to #xD7FF inclusive or #xE000 to #x10FFFF inclusive,
which is about 1.1 million code points.
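As a rough illustration of the two ranges quoted above, here is a small Python sketch. The helper name is_scalar_value is made up for this example and is not a standard library function; the arithmetic simply adds up the sizes of the two ranges.

```python
# Hypothetical helper (not part of any library): True if cp lies in
# 0..0xD7FF or 0xE000..0x10FFFF, i.e. outside the surrogate block.
def is_scalar_value(cp: int) -> bool:
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

# Size of the two valid ranges added together.
valid_count = (0xD7FF - 0x0000 + 1) + (0x10FFFF - 0xE000 + 1)
print(valid_count)  # 1112064
```

Under this check 0xD7FF and 0xE000 pass, while anything in between, such as 0xD812, does not.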
Guile does the right thing:
$1 = #\153777
$2 = #\160000
While reading expression:
ERROR: In procedure scm_lreadr: #<unknown port>:5:8: out-of-range hex character escape: xd812
> py> low = '\uDC37'
That should raise a SyntaxError exception.
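For comparison, a minimal sketch of what CPython 3 does with that literal today, assuming Python 3.3 or later: the literal is accepted as a one-element string, and the problem only surfaces when the string is encoded with the strict UTF-8 codec. The variable encode_failed is introduced here purely for illustration.

```python
# The literal from the quoted message: a lone low surrogate.
low = '\uDC37'
print(len(low))  # the string holds a single code point

# Strict UTF-8 encoding rejects the surrogate.
try:
    low.encode('utf-8')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
print(encode_failed)
```

So the error is deferred to the encoding step rather than raised when the literal is parsed.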