[XML-SIG] Re: Issues with Unicode type

Uche Ogbuji uche.ogbuji@fourthought.com
Mon, 23 Sep 2002 15:47:33 -0600


> On Mon, Sep 23, 2002 at 03:16:08PM -0600, Uche Ogbuji wrote:
> > Oh, but then Python is so much simpler:
> > 
> >     
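> > import re
> > # Matches one UTF-16 surrogate pair: a high surrogate, then a low surrogate.
> > SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")
> > def smart_len(u):
> >     # Each pair is two code units but one character: subtract one per pair.
> >     sp_count = len(SP_PAT.findall(u))
> >     return len(u) - sp_count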
> > 
> > 
> > Problem solved.
> 
>   modulo the space and CPU requirements for the operation (okay you can tell
> I'm primarily a C coder :-)

I don't see the significant space requirements.  As for CPU, Python's len() is 
already much slower than wstrlen() anyway, so I don't think your point is very 
valid once someone has already made the choice to use Python.
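
On the space side, the only extra allocation in the quoted smart_len() is the
list that findall() builds, one entry per surrogate pair.  If that ever
mattered, the matches could be counted lazily with finditer() instead.  A
minimal sketch, assuming a recent Python and a string that exposes surrogates
as separate items (a narrow build, or a str built from explicit surrogate
escapes); smart_len_iter is a hypothetical name used only for illustration:

```python
import re

# One UTF-16 surrogate pair: a high surrogate followed by a low surrogate.
SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len_iter(u):
    # Hypothetical variant of smart_len(): count pairs without building a list.
    sp_count = sum(1 for _ in SP_PAT.finditer(u))
    return len(u) - sp_count
```

Whether this saves anything measurable depends on how many pairs the text
contains; for typical XML content the list is likely tiny either way.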


> > The great thing about Python is even when it frustrates you one moment, it 
> > finds a way to quickly make up for it.
> 
>   I don't think chars are classes but types, and hence one cannot
> make a subclass of strings whose instances could have all length/walk/extract
> operations being special cased to reflect XML unicode string. I (and Eric
> I bet) would like to be wrong on this :-)

You can subclass strings in Python 2.2 and later.  Types and classes were 
unified in Python 2.2.
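
As a rough illustration of what such a subclass might look like, here is a
minimal sketch.  It assumes a recent Python, where the base type is str (on
2.2 it would be unicode), and it only special-cases len(); XMLString is a
hypothetical name, not an existing class in any library:

```python
import re

# One UTF-16 surrogate pair: a high surrogate followed by a low surrogate.
SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

class XMLString(str):
    """Hypothetical str subclass whose len() counts a surrogate pair once."""

    def __len__(self):
        # Raw length in items, as the base type reports it.
        raw = str.__len__(self)
        # Subtract one for every pair so each pair counts as one character.
        return raw - len(SP_PAT.findall(self))
```

With this, len(XMLString(u"a\ud800\udc00b")) gives 3 where the plain string
gives 4.  Indexing and slicing still work on individual items here; a fuller
version would need to override those operations as well.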


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html