[XML-SIG] Re: Issues with Unicode type

Eric van der Vlist vdv@dyomedea.com
23 Sep 2002 23:41:04 +0200

On Mon, 2002-09-23 at 23:26, Daniel Veillard wrote:
> On Mon, Sep 23, 2002 at 10:50:34PM +0200, Eric van der Vlist wrote:
> > Except that it's not the only location where it's broken and that won't
> > work with regular expressions. If I define a pattern such as ".{5}" I
> > want to check that this is 5 unicode characters, not 5 words of 16
> > bits...
>   I don't know about Relax regexp, but for schemas I had to rewrite
> an engine to cope with the full regexps of the beast.

That's the same beast :-( ... there is no such thing as a Relax NG
regexp: Relax NG just borrows the datatypes from W3C XML Schema, along
with most of their facets, including patterns.
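
To make the ".{5}" concern concrete, here is a minimal sketch in current
Python 3, where strings are always sequences of code points. The helper
as_utf16_units is hypothetical (not part of any library discussed here);
it rebuilds the string as an engine limited to 16-bit units would see it,
which only approximates what a UTF-16 build of Python does.

```python
import re
import struct


def as_utf16_units(text):
    """Return text as a 16-bit engine would see it: one item per UTF-16 unit.

    Characters outside the BMP become two separate surrogate items.
    """
    data = text.encode("utf-16-le")
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    return "".join(chr(unit) for unit in units)


# Four BMP characters plus U+1D11E (MUSICAL SYMBOL G CLEF), outside the BMP.
sample = "abcd\U0001D11E"
narrow_view = as_utf16_units(sample)

code_points = len(sample)       # counted as Unicode characters
code_units = len(narrow_view)   # counted as 16-bit words

# The pattern facet ".{5}" should accept the value when counting characters...
matches_as_characters = re.fullmatch(r".{5}", sample) is not None
# ...but an engine counting 16-bit words sees six items and rejects it.
matches_as_units = re.fullmatch(r".{5}", narrow_view) is not None
```

Here the same value passes the pattern when counted in characters and
fails it when counted in 16-bit words, which is the mismatch described
above.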

Do you have Python bindings available for this regexp engine?

> > I am starting to think that compiling Python for 32 bits might be the
> > safest way to solve this issue.
>   You can't make that assumption, it's the safest for the developer
> but becomes a nightmare for users. If you develop a library I assume
> it's ultimately to have people use it, if they first need to recompile
> python and handle multiple versions, it's a serious mess.
> > Can you confirm that this is what RedHat does by default as mentioned
> > by Uche and do you know the motivations (and possible downsides) for
> > this decision?
>   By default Red Hat compiles python with unicode support in UTF-16.
> I'm not in charge of this, I assume it's the default compilation option.
> IMHO it's a wrong assumption to think that UTF16 is a good cut, because
> you end up with variable length encoding anyway, and UCS32 would
> seriously bloat the app I'm afraid.

Yes, looks like the two options are equally bad :-(
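
A rough sketch of that trade-off, again in current Python 3 and using a
hypothetical helper storage_bytes that is not part of any library
mentioned here. It measures encoded sizes only, which approximates but
does not reproduce the memory layout of a given Python build.

```python
def storage_bytes(text):
    """Return the encoded size of text in bytes for both fixed-endian forms."""
    return {
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Plain ASCII markup: 32-bit storage doubles the size compared to UTF-16.
ascii_sizes = storage_bytes("schema")

# One character outside the BMP: UTF-16 needs a surrogate pair, so it is
# variable length after all and no smaller than the 32-bit form.
astral_sizes = storage_bytes("\U0001D11E")
```

For mostly-ASCII XML the 32-bit form costs twice the space, while the
16-bit form stops being one unit per character as soon as a character
outside the BMP shows up.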


Rendez-vous à Paris.
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema