[XML-SIG] Re: Issues with Unicode type

Daniel Veillard veillard@redhat.com
Mon, 23 Sep 2002 16:35:22 -0400


On Mon, Sep 23, 2002 at 07:21:41PM +0200, Eric van der Vlist wrote:
> Yep, and that's what James Clark is doing in his Java implementation:
> 
>   public int getLength(Object obj) {
>     String str = (String)obj;
>     int len = str.length();
>     int nSurrogatePairs = 0;
>     for (int i = 0; i < len; i++)
>       if (Utf16.isSurrogate1(str.charAt(i)))
>         nSurrogatePairs++;
>     return len - nSurrogatePairs;
>   }
> 
> And I need to do the same in Python...

  Yep, it's that simple.
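
  For the Python side, a minimal sketch of the same counting rule could
look like the following. It assumes the string holds UTF-16 code units
(as in a narrow Python build) and that Utf16.isSurrogate1 tests for the
first (high) surrogate of a pair; the name xsd_length is hypothetical,
not part of any library.

```python
def xsd_length(s):
    """Count characters, treating each UTF-16 surrogate pair as one.

    Sketch only: assumes well-formed input, i.e. every high surrogate
    is followed by its low surrogate.
    """
    # A high (leading) surrogate, U+D800..U+DBFF, opens a pair, so
    # subtract one per high surrogate, mirroring the Java loop above.
    n_surrogate_pairs = sum(1 for ch in s if 0xD800 <= ord(ch) <= 0xDBFF)
    return len(s) - n_surrogate_pairs
```

  With this, U+10800 written as the pair "\ud802\udc00" counts as one
character. On an implementation that already stores one code point per
string element, the subtraction is simply a no-op.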

> > Notice also that U+10800 is unassigned even in Unicode 3.2.
> 
> I wonder why he has picked this value!

  Because he knew it was well-formed and that it falls in a range
(beyond U+FFFF, so it needs a surrogate pair in UTF-16) that could
cause trouble for Java (and now Python) implementations, I bet :-)

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/