[XML-SIG] DOM and non ascii element or attribute names

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 29 Jun 2001 01:12:53 +0200


> From my reading of the DOM L2 spec, it should be possible to create DOM
> nodes (elements or attributes) with non-ascii characters in their
> name. 4DOM seems to be preventing this, with 2 lines in Document.py :
> 
> #FIXME: should allow combining characters: fix when Python gets Unicode
> g_namePattern = re.compile('[a-zA-Z_:][\w\.\-_:]*\Z')
> 
> My unicode-fu is a bit too low right now to write a patch (it should
> improve soon), but I'll be glad to do some testing. 

If you have the CVS version available, just try using

g_namePattern = xml.utils.characters.re_Name

I'm not sure what the \Z at the end is good for; it might be "more right"
to write

g_namePattern = re.compile(xml.utils.characters.Name + "\Z")

instead. The constants are taken literally from the XML recommendation;
if there are problems with that, I'd like to know about them.
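
As a rough illustration of what the trailing \Z changes: re.match only
anchors at the start of the string, so without \Z a pattern accepts any
string that merely begins with a valid name. The sketch below does not
use xml.utils.characters; SIMPLE_NAME is a hypothetical, deliberately
simplified stand-in for the Name constant (it is not the full production
from the XML recommendation), and the re.ASCII flag is an assumption
used to mimic how \w behaved on byte strings at the time.

```python
import re

# ASCII-only pattern as quoted from 4DOM's Document.py. re.ASCII is an
# assumption of this sketch: it restricts \w to [a-zA-Z0-9_].
ascii_name = re.compile(r'[a-zA-Z_:][\w\.\-_:]*\Z', re.ASCII)

# Hypothetical, simplified stand-in for xml.utils.characters.Name:
# a letter or underscore, then word characters, '.', '-' or ':'.
SIMPLE_NAME = r'[^\W\d][\w.\-:]*'

unanchored = re.compile(SIMPLE_NAME)          # no end anchor
anchored = re.compile(SIMPLE_NAME + r'\Z')    # must match the whole string


def is_name(pattern, text):
    """Return True if pattern.match() accepts text."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for candidate in ("element", "\u00e9l\u00e9ment", "bad name", "1abc"):
        print(repr(candidate),
              is_name(ascii_name, candidate),
              is_name(unanchored, candidate),
              is_name(anchored, candidate))
```

In this sketch the unanchored pattern accepts "bad name" because the
prefix "bad" matches, while the anchored one rejects it; only the
Unicode-aware patterns accept a name that starts with a non-ascii letter.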

If you find that this works, it would be a valuable contribution to find
all such regular expressions in 4DOM and replace them with the
xml.utils.characters equivalents. If you find anything missing there,
please let me know.

Regards,
Martin