[XML-SIG] Potential issue with re too (Was: Issues with Unicode type)
Eric van der Vlist
vdv@dyomedea.com
23 Sep 2002 22:27:17 +0200
Still in the context of WXS datatypes and their facets, there is a
potential issue with regular expressions (needed for the pattern facet):
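>>> import re
>>> c = u'\u10800'  # definition not shown in the original message; inferred from the repr below
>>> print repr(c)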
u'\u10800'
>>> print re.findall(".", c)
[u'\u1080', u'0']
>>> print re.findall(c, c)
[u'\u10800']
>>> print re.findall(u'\u1080', c)
[u'\u1080']
>>> print re.findall(u'0', c)
[u'0']
The re module handles surrogates according to their dual nature: it
counts a surrogate pair as two characters (which is not what one expects
from, say, "." or ".{2}"), yet it still recognizes the pair as
u'\u10800' when that is given as the pattern. That does not seem like a
safe basis on which to build a compliant type library.
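One caveat when reproducing the session above: in Python string
literals, \u consumes exactly four hex digits, so u'\u10800' is U+1080
followed by the digit 0, which on its own would give the output shown.
A supplementary character such as U+10800 is written u'\U00010800'.
The sketch below assumes Python 3.3 or later (the session above is
Python 2), where strings are sequences of code points; the variable
names and the utf16_units helper are illustrative and not part of the
original message.

```python
import re

# "\u" takes exactly four hex digits: this is U+1080 followed by the digit "0".
bmp_plus_digit = "\u10800"

# "\U" takes eight hex digits: this is the single supplementary character U+10800.
supplementary = "\U00010800"


def utf16_units(text):
    """Count the UTF-16 code units needed for text (a surrogate pair counts as two)."""
    return len(text.encode("utf-16-be")) // 2


# Two separate characters, so "." matches twice.
print(len(bmp_plus_digit), re.findall(".", bmp_plus_digit))

# One code point, but two code units in UTF-16, which is the "dual nature"
# that a 16-bit (narrow) build exposes to the re module.
print(len(supplementary), utf16_units(supplementary))

# On a code-point-based string type, "." consumes the whole character.
print(len(re.findall(".", supplementary)))
```

For a pattern facet, the comparison of interest is between the number
of "." matches and the number of code points; the two agree only when
the string type indexes by code point rather than by UTF-16 code unit.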
Eric
-- 
See you in Paris.
http://www.technoforum.fr/integ2002/index.html
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------