[XML-SIG] Re: Issues with Unicode type
Martin v. Loewis
martin@v.loewis.de
26 Sep 2002 14:17:26 +0200
Eric van der Vlist <vdv@dyomedea.com> writes:
> OTOH, working on implementations of standards (or recs) without aiming
> for complete conformance is something which I consider dangerous, and
> I am reaching a point where Python doesn't look like an adequate platform
> to implement W3C XML Schema datatypes (and hardly an adequate platform
> to implement Relax NG) because of the lack of support for non-BMP code
> points.
Please understand that Python is free software. So if it does not fit
your needs, you can:
a) adjust your needs, or
b) adjust Python, or
c) not use Python.
Only with non-free software is b) not an option.
> The two issues which I am currently aware of are the length of the
> strings which can be solved by implementing an application level length
> algorithm and, more serious, the support of the regular expressions
> required for the "pattern" facet for which I don't see how we could rely
> on the Python regexp features which are buggy when compiled as ucs4 and
> will not produce the expected result when compiled as ucs2.
>
> Unless we rely on external C extensions such as the ones developed by
> Daniel for libxml, I just see no way to be "natively conform"!
I think this is a simplification: You can certainly implement the len
algorithm without regular expressions at all:
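    import sys

    if sys.maxunicode == 65535:
        # Narrow (UCS-2) build: a non-BMP character is stored as a
        # surrogate pair, so plain len() counts it twice.
        def smart_len(s):
            l = 0
            for c in s:
                # skip high surrogates - only count the low surrogates
                if not 0xd800 <= ord(c) < 0xdc00:
                    l += 1
            return l
    else:
        # Wide (UCS-4) build: len() already counts code points.
        smart_len = len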
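To see the surrogate-skipping idea at work independently of how the interpreter was built, here is a minimal sketch that applies the same test to explicit UTF-16 code units. It assumes Python 3 (where the string literal below is always a single non-BMP code point); utf16_units and count_code_points are hypothetical helper names, not part of any library.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints.

    Hypothetical helper, used only to simulate what a narrow build stores.
    """
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def count_code_points(units):
    """Count code points by skipping high surrogates (0xD800-0xDBFF)."""
    return sum(1 for u in units if not 0xD800 <= u < 0xDC00)

sample = "a\U00010000b"          # 'a', one non-BMP character, 'b'
units = utf16_units(sample)
print(len(units))                # number of UTF-16 code units
print(count_code_points(units))  # number of code points
```

The sample needs four code units but holds only three characters, which is exactly the discrepancy smart_len corrects on a narrow build.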
The same applies to NCName: you do not *have* to use regular
expressions. Instead, build a dictionary:
    NCName = {}
    for char in all_ncname_chars:
        NCName[char] = 1
With that, you can test whether a character is allowed with
NCName.has_key(char).
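As a rough sketch of how such a lookup table could be turned into a complete NCName check: the character lists below are an ASCII-only stand-in that I am assuming for illustration, not the full Letter, Digit, CombiningChar and Extender tables from the XML recommendation, and is_ncname, ncname_start_chars and ncname_chars are hypothetical names. It uses the "in" operator rather than has_key so that it also runs on current Python versions.

```python
import string

# ASSUMPTION: ASCII-only subset for illustration. A conforming
# implementation would fill these from the XML character tables.
ncname_start_chars = string.ascii_letters + "_"
ncname_chars = ncname_start_chars + string.digits + ".-"

# Same idea as the NCName dictionary: the keys are the allowed characters.
NCNameStart = dict.fromkeys(ncname_start_chars, 1)
NCName = dict.fromkeys(ncname_chars, 1)

def is_ncname(s):
    """Return True if s is an NCName under the simplified tables above."""
    # An NCName is non-empty and starts with a letter or underscore.
    if not s or s[0] not in NCNameStart:
        return False
    # Every following character must be an allowed name character.
    for char in s[1:]:
        if char not in NCName:
            return False
    return True

print(is_ncname("xsd-type.1"))
print(is_ncname("1abc"))
print(is_ncname("xs:string"))
```

Each test is a single dictionary lookup per character, so no regular expression engine is involved at all.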
> Again, we can say that it won't matter for "real life applications" and
> that we don't care about conformance but that's a dangerous path.
My code shows that there is a fourth option, in addition to fixing
Python:
d) work around the bugs and limitations
Python is Turing-complete, so there is no algorithmic problem that
cannot be solved in Python. So, saying that you cannot "natively
conform" is an oversimplification.
Regards,
Martin