[Tutor] why is unichr(sys.maxunicode) blank?

Sun May 19 03:34:44 CEST 2013

On 19/05/13 02:45, Albert-Jan Roskam wrote about locales:

> It is pretty sick that all these things can be adjusted separately (what is the use of having: danish collation, russian case conversion, english decimal sign, japanese codepage ;-)

Well, obviously there is no point to such a mess, but the ability to make a mess is the price of having the flexibility to choose less silly combinations.

By the way, I'm not sure what you mean by "pretty sick", since in Australian slang "sick" can mean "fantastic, excellent", as in "Mate, that's a pretty sick sub-woofer!".

See http://www.youtube.com/watch?v=iRv7IE6T4gQ

(warning: ethnic stereotypes, low-brow humour)

[...]
>>>   Isn't UCS-2 the internal unicode encoding for CPython (narrow builds)?
>>
>> Narrow builds create UTF-16 surrogate pairs from \U literals, but
>> these aren't treated as an atomic unit for slicing, iteration, or
>> string length.
>
> That is a nice way of putting it. So if you slice a multibyte char "mb", mb[0] will return the first byte? That is annoying.

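Close, but it's not a byte: in a narrow build, mb[0] returns the first surrogate of the pair, which is a 16-bit code unit. So yes, you can easily break apart surrogate pairs in Python narrow builds, which leads to invalid strings. The solution is either to use a wide build, or to upgrade to Python 3.3, which does away with the narrow/wide distinction and no longer has this problem: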

# Python 3.2, narrow build:
py> len(chr(0x101001))
2

# Python 3.3
py> len(chr(0x101001))
1
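
If you only have Python 3.3 or later to hand, you can still see the two code units that a narrow build would have stored by encoding to UTF-16 yourself. This is only a rough sketch: the helper name utf16_code_units is made up for illustration and is not part of any library.

```python
import sys

def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of ints.

    Hypothetical helper, for illustration only.
    """
    data = s.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

ch = chr(0x101001)
units = utf16_code_units(ch)

# One code point, but two UTF-16 code units (a surrogate pair).
print(len(ch), [hex(u) for u in units])

# On 3.3 and later this is always 0x10FFFF; narrow builds reported 0xFFFF.
print(hex(sys.maxunicode))
```

The first unit falls in the high-surrogate range (0xD800 to 0xDBFF) and the second in the low-surrogate range (0xDC00 to 0xDFFF). On a narrow build, indexing or slicing could hand you just one of them.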

-- 
Steven