[Ironpython-users] Unicode

Markus Schaber m.schaber at codesys.com
Wed Sep 17 09:10:12 CEST 2014


Hi,

From: Pawel Jasinski
> On Tue, Sep 16, 2014 at 9:52 PM, Vernon D. Cole <vernondcole at gmail.com> wrote:
> > Just out of curiosity, how does it work in CPython 3.4?

> As expected I guess:
> Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"\U0001D4AE"
> '𝒮'
> I even get correct character on my terminal. Wow!

On my Windows machine, I just get the \U escape sequence back, but at least the character is not lost.

> >>> import sys
> >>> sys.maxunicode
> 1114111
> This means the Python I have is compiled with UCS4.
> Looks like .NET has surrogates and no direct UCS4 support. I guess we don't have many alternatives.

While Python 2 existed in 16- and 32-bit Unicode flavours (narrow and wide builds, which Python 3.0 to 3.2 still had as well), every Python 3 since 3.3 exposes the full code point range. All strings are full Unicode.

(Python 3.3 also introduced an optimization that uses 8 or 16 bits per character for strings which do not actually use higher code points, but this is purely an internal storage detail: https://docs.python.org/3.4/whatsnew/3.3.html#pep-393)

As .NET is inherently UTF-16, we won't be fully compatible with Python strings with respect to direct indexing of code points (emulating it is possible, but will be incredibly slow, because every index has to be found by scanning for surrogate pairs). But at least, we should try not to lose characters.
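
To illustrate why the emulation is slow, here is a rough sketch in plain Python 3. It is not IronPython code; `utf16_units` and `code_point_at` are hypothetical helper names I made up, and the list of integers only stands in for the 16-bit units a .NET string holds. Finding the n-th code point means walking the units from the start:

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def code_point_at(units, index):
    """Return the code point at code-point position index (hypothetical helper).

    Walks from the start, because a surrogate pair occupies two units
    and there is no way to jump directly to the right offset.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    i = 0      # position in code units
    pos = 0    # position in code points
    while i < len(units):
        u = units[i]
        is_pair = (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if pos == index:
            if is_pair:
                return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return u
        i += 2 if is_pair else 1
        pos += 1
    raise IndexError("code point index out of range")

units = utf16_units(u"a\U0001D4AEb")
print(len(units))                    # 4 code units for 3 code points
print(hex(code_point_at(units, 1)))  # 0x1d4ae
```

Each lookup is linear in the string length, so a loop that indexes every position becomes quadratic unless some index cache is added.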
 
> > (That's where we should be headed. All of this fiddling with obscure str bugs in 2.7 is a bit of a waste, IMHO.)
> It saves effort when working with generic python packages. Any fix of "obscure" bytes conversion pays back.

I think that fixing this by using two surrogates is the closest behaviour we can get to both Python 2 and Python 3.
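
For reference, the mapping between a code point above U+FFFF and its two surrogates is the standard UTF-16 arithmetic. The sketch below is only an illustration in Python; the function names `to_surrogates` and `from_surrogates` are my own and do not refer to anything in IronPython:

```python
def to_surrogates(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    v = cp - 0x10000                    # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1D4AE)
print(hex(high), hex(low))               # 0xd835 0xdcae
print(hex(from_surrogates(high, low)))   # 0x1d4ae
```

As long as both halves are kept together, the round trip is lossless, which is the "do not lose characters" property we care about here.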


Best regards

Markus Schaber


