[XML-SIG] Re: Issues with Unicode type
Uche Ogbuji
uche.ogbuji@fourthought.com
Mon, 23 Sep 2002 15:47:33 -0600
> On Mon, Sep 23, 2002 at 03:16:08PM -0600, Uche Ogbuji wrote:
> > Oh, but then Python is so much simpler:
> >
> >
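> > import re
> >
> > # Matches one surrogate pair: a high surrogate followed by a low surrogate
> > SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")
> > def smart_len(u):
> >     sp_count = len(SP_PAT.findall(u))
> >     # Each pair takes two code units but is one character
> >     return len(u) - sp_count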
> >
> >
> > Problem solved.
>
> modulo the space and CPU requirements for the operation (okay you can tell
> I'm primarily a C coder :-)
I don't see any significant space requirements: the only extra allocation is
the list of matches that findall() returns. As for CPU, Python's len() is
already much slower than C's wcslen() anyway, so I don't think your point is
very valid once someone has already made the choice to use Python.
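
If that list ever did matter, here is a minimal sketch of a variant that counts matches lazily with finditer() instead of building it. The name smart_len_lazy is hypothetical, and the generator expression assumes a Python newer than 2.2:

```python
import re

# Same pattern as above: one high surrogate followed by one low surrogate.
SP_PAT = re.compile(u"[\ud800-\udbff][\udc00-\udfff]")

def smart_len_lazy(u):  # hypothetical name, not from the original post
    # finditer() yields matches one at a time, so no list of pairs is kept.
    pairs = sum(1 for _ in SP_PAT.finditer(u))
    return len(u) - pairs
```

The result is the same as smart_len(); only the temporary list goes away.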
> > The great thing about Python is even when it frustrates you one moment, it
> > finds a way to quickly make up for it.
>
> I don't think chars are classes but types, and hence one cannot
> make a subclass of strings whose instances could have all length/walk/extract
> operations being special cased to reflect XML unicode string. I (and Eric
> I bet) would like to be wrong on this :-)
You can subclass strings in Python 2.2 and later. Types and classes were
unified in Python 2.2.
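
A minimal sketch of what such a subclass might look like. The class name XMLString is hypothetical, only len() is special-cased here, and the base class is str as on Python 3 (on Python 2.2 it would be unicode). Slicing and iteration would need similar overrides and are not shown:

```python
import re

# One high surrogate followed by one low surrogate.
SP_PAT = re.compile(u"[\ud800-\udbff][\udc00-\udfff]")

class XMLString(str):  # hypothetical name; base would be `unicode` on Python 2
    def __len__(self):
        # Raw length in code units, as the built-in type reports it.
        raw = str.__len__(self)
        # Each surrogate pair occupies two units but is one character.
        return raw - len(SP_PAT.findall(self))
```

With this, len(XMLString(u"a\ud800\udc00b")) gives 3, while the underlying string still holds 4 code units.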
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html