[XML-SIG] Re: Issues with Unicode type
Uche Ogbuji
uche.ogbuji@fourthought.com
Mon, 23 Sep 2002 15:16:08 -0600
> On Mon, Sep 23, 2002 at 07:21:41PM +0200, Eric van der Vlist wrote:
> > Yep, and that's what James Clark is doing in his Java implementation:
> >
> > public int getLength(Object obj) {
> >     String str = (String)obj;
> >     int len = str.length();
> >     int nSurrogatePairs = 0;
> >     for (int i = 0; i < len; i++)
> >         if (Utf16.isSurrogate1(str.charAt(i)))
> >             nSurrogatePairs++;
> >     return len - nSurrogatePairs;
> > }
> >
> > And I need to do the same in Python...
>
> yep, that simple,
Oh, but then Python is so much simpler:
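import re
# One UTF-16 surrogate pair: a high surrogate followed by a low surrogate.
SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len(u):
    # Each pair is two code units but one character; subtract one per pair.
    sp_count = len(SP_PAT.findall(u))
    return len(u) - sp_count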
Problem solved.
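For comparison, here is a sketch that puts the regex version next to a line-by-line Python transliteration of the Java loop quoted above. The name clark_len is made up for this example, and it assumes Utf16.isSurrogate1 tests for a high (lead) surrogate in the range U+D800 to U+DBFF. It runs on a current Python 3, where the surrogate pair has to be written out as two escapes, since a string holding the real character U+1D11E already has length 1.

```python
import re

# One UTF-16 surrogate pair: a high surrogate followed by a low surrogate.
SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len(u):
    """Length in characters, counting each well-formed surrogate pair once."""
    return len(u) - len(SP_PAT.findall(u))

def clark_len(u):
    """Hypothetical transliteration of the Java loop.

    Assumes Utf16.isSurrogate1 means "is a high surrogate": subtract one
    for every high surrogate, whether or not a low surrogate follows it.
    """
    n_surrogate_pairs = 0
    for ch in u:
        if u"\uD800" <= ch <= u"\uDBFF":
            n_surrogate_pairs += 1
    return len(u) - n_surrogate_pairs

# U+1D11E MUSICAL SYMBOL G CLEF spelled out as its UTF-16 pair D834 DD1E.
sample = u"a\uD834\uDD1Eb"
print(len(sample), smart_len(sample), clark_len(sample))  # prints: 4 3 3
```

The two agree on well-formed text but differ on an unpaired high surrogate: smart_len(u"\uD834x") gives 2 because the regex only counts complete pairs, while clark_len gives 1 because the loop counts every high surrogate.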
The great thing about Python is that even when it frustrates you one moment, it
quickly finds a way to make up for it.
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html