[Tutor] why is unichr(sys.maxunicode) blank?

Dave Angel davea at davea.name
Sat May 18 05:06:23 CEST 2013


On 05/17/2013 08:46 PM, Steven D'Aprano wrote:
> On 18/05/13 05:23, Albert-Jan Roskam wrote:
>> Hi,
>>
>> I was curious what the "high" four-byte UTF-8 Unicode characters look like.
>
> What typeface are you using to print them? Most type faces ("fonts")
> only support a tiny portion of the Unicode range. For that matter, most
> of the Unicode range is currently unused.
>
>
>> Why does the snippet below not print anything (well, it will
>> eventually, I think, but at that point I have lost my patience
>> already). Puh-lease tell me there are no such things as Mongolian,
>> Chinese backspaces and other nonprintable characters. ;-)
>
> Chinese backspaces? Probably not. But there may well be unprintable
> characters for all sorts of reasons:
>
> - the font you are using simply doesn't have a glyph for the code-point;
>
> - the code-point is not yet assigned, so there's nothing to show;
>
> - the code-point represents an invisible character, like IDEOGRAPHIC
> SPACE, or a zero-width character, like MONGOLIAN VOWEL SEPARATOR;
>
> - or it represents a non-printing control character;
>
> - or a formatting mark like LEFT-TO-RIGHT EMBEDDING;
>
> - or it is one of the sixty-six guaranteed "non-characters", such as
> U+FFFE (the byte-swapped form of the byte-order mark U+FEFF) and U+FFFF.
>
>
>
>

One tool that can help is the name() function in the unicodedata module:


  >>> import unicodedata
  >>> unicodedata.name(u'\xb0')
'DEGREE SIGN'

If you try that on code points near sys.maxunicode, which have no
assigned names, you get an exception (ValueError: no such name).
For example:

  >>> import sys
  >>> unicodedata.name(unichr(sys.maxunicode - 1))
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: no such name
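
If you would rather not catch the exception, name() accepts a default as
its second argument, and category() reports the code point's general
category, which often explains why nothing is displayed. Here is a rough
sketch, assuming Python 3 (where chr() replaces unichr()); describe is
just a helper name I made up for illustration:

```python
import sys
import unicodedata


def describe(code_point):
    """Return (name, general category) for a code point, without raising."""
    ch = chr(code_point)  # Python 3; on Python 2 this would be unichr()
    # The second argument is returned instead of raising ValueError
    # when the code point has no name.
    name = unicodedata.name(ch, "<no name>")
    return name, unicodedata.category(ch)


# A named symbol, an invisible space, a formatting mark,
# a non-character, and the highest code point.
for cp in (0xB0, 0x3000, 0x202A, 0xFFFE, sys.maxunicode):
    name, category = describe(cp)
    print("U+%04X %-25s %s" % (cp, name, category))
```

In the category codes, Cn means unassigned (this includes the
non-characters), Co is private use, Cc is a control character, Cf is a
formatting mark, and Zs is a space separator.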


-- 
DaveA

