[XML-SIG] Re: Issues with Unicode type
Daniel Veillard
veillard@redhat.com
Mon, 23 Sep 2002 16:35:22 -0400
On Mon, Sep 23, 2002 at 07:21:41PM +0200, Eric van der Vlist wrote:
> Yep, and that's what James Clark is doing in his Java implementation:
>
> public int getLength(Object obj) {
>   String str = (String)obj;
>   int len = str.length();
>   int nSurrogatePairs = 0;
>   for (int i = 0; i < len; i++)
>     if (Utf16.isSurrogate1(str.charAt(i)))
>       nSurrogatePairs++;
>   return len - nSurrogatePairs;
> }
>
> And I need to do the same in Python...
Yep, it's that simple.
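For what it's worth, here is a rough sketch of the same counting in Python. It assumes the string is handled as a sequence of UTF-16 code units, as Java does; the names utf16_units and code_point_length are made up for illustration and come from no library. Like the Java version it assumes well-formed input, so every leading surrogate is followed by its trailing half. On Python 3.3 and later, len() on a str already counts code points, which is why the sketch encodes to UTF-16 explicitly.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_point_length(units):
    """Count code points in a well-formed sequence of UTF-16 code units.

    Each leading (high) surrogate, 0xD800..0xDBFF, opens a two-unit pair
    that stands for a single code point, so subtract one per pair.
    """
    n_surrogate_pairs = sum(1 for u in units if 0xD800 <= u <= 0xDBFF)
    return len(units) - n_surrogate_pairs


units = utf16_units("a\U00010800b")
print(len(units), code_point_length(units))
```

The string above has four code units but three characters, which is the difference the length facet has to account for.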
> > Notice also that U+10800 is unassigned even in Unicode 3.2.
>
> I wonder why he has picked this value!
Because he knew it was well formed and that it lay in a range that
could cause trouble for Java (and now Python) implementations,
I bet :-)
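To make the trouble concrete, here is a small sketch of the standard UTF-16 arithmetic that splits a code point above U+FFFF into two code units. The function name to_surrogate_pair is only an illustration, not something from an existing API.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: U+%04X" % cp)
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10800)
print("U+10800 -> %04X %04X" % (high, low))
```

So U+10800 takes two 16-bit units in a UTF-16 based string type, and a naive length() reports 2 for what is a single character.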
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/