[Python-Dev] UCS2/UCS4 default
Jeroen Ruigrok van der Werven
asmodai at in-nomine.org
Thu Jul 3 18:51:40 CEST 2008
On [20080703 17:03], Guido van Rossum (guido at python.org) wrote:
>I don't see an answer there to the question of whether the length()
>method of a Java String object containing a single surrogate pair
>returns 1 or 2; I suspect it returns 2.
As
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/CharSequence.html#length()
states:
int length()
Returns the length of this character sequence. The length is the number of
16-bit chars in the sequence.
But since Java switched to full UTF-16 support in 1.5.0, they extended their
API, since the existing methods have probably become too ingrained.
E.g. codePointCount()
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#codePointCount(char[],%20int,%20int)
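To make the difference between the two counts concrete, here is a minimal
sketch in Python. It assumes Python 3.3 or later, where len() on a str counts
code points regardless of build. The helper names utf16_length and
code_point_count are hypothetical, chosen only to mirror Java's length() and
codePointCount(); they are not part of any library.

```python
def utf16_length(s: str) -> int:
    """Count 16-bit code units, which is what Java's length() reports."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids emitting a BOM.
    return len(s.encode("utf-16-le")) // 2


def code_point_count(s: str) -> int:
    """Count Unicode code points, which is what codePointCount() reports."""
    # Assumption: Python 3.3+, where str is indexed by code point.
    return len(s)


# U+10400 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
deseret = "\U00010400"
print(utf16_length(deseret), code_point_count(deseret))
```

For a string holding a single surrogate pair the first count is 2 and the
second is 1, which matches the behaviour the Java documentation describes.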
>The one thing that may be missing from Python is things like
>interpretation of surrogates by functions like isalpha() and I'm okay
>with adding that (since those have to loop over the entire string
>anyway).
Those would be welcome already, yes. I'll see if I can help out.
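As a rough illustration of what surrogate interpretation in an isalpha()-style
check could look like, here is a sketch that works on a sequence of 16-bit code
units given as integers. The names decode_utf16_units and isalpha_utf16 are
hypothetical, and this is not how CPython implements isalpha(); it only shows
the idea of joining a surrogate pair before consulting the character database.

```python
import unicodedata


def decode_utf16_units(units):
    """Yield code points from 16-bit code units, joining surrogate pairs."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine the high and low surrogate into one code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            yield unit
            i += 1


def isalpha_utf16(units):
    """Return True if units is non-empty and every code point is a letter."""
    code_points = list(decode_utf16_units(units))
    return bool(code_points) and all(
        unicodedata.category(chr(cp)).startswith("L") for cp in code_points
    )


# U+10400 is encoded in UTF-16 as the pair 0xD801, 0xDC00.
print(isalpha_utf16([0xD801, 0xDC00]))
print(isalpha_utf16([0xD801]))
```

A lone surrogate has category Cs, so it is not treated as a letter, while a
well-formed pair is judged by the code point it encodes.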
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Fallen into ever-mourn, with these wings so torn, after your day my dawn...