Re: [lxml-dev] xml:space and xml:lang problem

Scott Haeger wrote:
Copying elements between trees will most likely not change the result. I cut that down to the following:

----------
.>>> from lxml import etree
.>>> intree = etree.XML("""<?xml version="1.0"?>
... <svg xmlns:xml=" http://www.w3.org/1998/XML">
... <a id="first" xml:space="default"></a>
... </svg>
... """)
.>>> etree.tostring(intree)
'<svg>\n<a id="first" xml:space="default"/>\n</svg>'
----------

So this definitely misses the XML namespace declaration. BUT, according to the spec, that is not a problem.

"""
The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace.
"""
Source: http://www.w3.org/TR/REC-xml-names/

Is that what you meant when you said it demonstrates a failure?
The problem occurs with and without the namespace declaration.
Because, according to the spec, both are the same.
Also, removing the xml:space attribute corrects the problem.
Probably, since it only references explicitly declared namespaces in that case. Still, could you try to come up with an example that shows your unreadable characters on serialization?

Thanks,
Stefan
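Editorial sketch of the point made above, that the xml prefix is bound by definition, so a document with an explicit xmlns:xml declaration and one without it describe the same attributes. This is an assumption-laden stand-in: it uses Python's standard-library xml.etree.ElementTree (whose API lxml.etree mirrors) rather than lxml itself, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Same document, once without and once with an explicit xmlns:xml declaration.
undeclared = ET.fromstring('<svg><a id="first" xml:space="default"/></svg>')
declared = ET.fromstring(
    '<svg xmlns:xml="%s"><a id="first" xml:space="default"/></svg>' % XML_NS
)

# In both trees the attribute is stored under the fixed XML namespace URI.
key = "{%s}space" % XML_NS
attrs_undeclared = dict(undeclared[0].attrib)
attrs_declared = dict(declared[0].attrib)
print(attrs_undeclared == attrs_declared, key in attrs_declared)

# The serializer does not need to write the declaration back out.
serialized = ET.tostring(declared)
print(serialized)
```

A serializer that drops the declaration is therefore not losing information, which matches the tostring() result shown above.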

Hi,

On Wed, 2006-02-22 at 08:39 +0100, Stefan Behnel wrote:
Scott Haeger wrote:
[...]
[...]
I don't know if this is just a typo in the example, but the namespace-URI begins with a space-character:

<svg xmlns:xml=" http://www.w3.org/1998/XML">

$ xmllint --debug xmlns.xml
xmlns.xml:2: namespace error : xml namespace prefix mapped to wrong URI
<svg xmlns:xml=" http://www.w3.org/1998/XML">
                                            ^
DOCUMENT
version=1.0
URL=xmlns.xml
standalone=true
namespace xml href=http://www.w3.org/XML/1998/namespace
ELEMENT svg
  TEXT interned
    content=
  ELEMENT a
    ATTRIBUTE id
      TEXT
        content=first
    ATTRIBUTE space
      TEXT
        content=default
  TEXT interned
    content=
[...]

[...]
Just an info: Libxml2 strips all explicit declarations of the XML namespace, since it stores the XML ns-declaration in a special field on the doc itself, namely in xmlDoc->oldNs. The XML namespace declaration is "built-in" by every XML processor, so you don't have to declare it.

Regards,
Kasimier
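Editorial sketch of the "xml namespace prefix mapped to wrong URI" condition reported by xmllint above. It is only an approximation under stated assumptions: it uses the standard-library xml.etree.ElementTree (backed by expat, not libxml2), and expat treats the mismatch as a fatal parse error, whereas the xmllint run above reported a namespace error and still printed the tree.

```python
import xml.etree.ElementTree as ET

# The declaration from the example: wrong URI (and a leading space) for xml.
bad_doc = (
    '<svg xmlns:xml=" http://www.w3.org/1998/XML">'
    '<a id="first" xml:space="default"/>'
    '</svg>'
)

try:
    ET.fromstring(bad_doc)
    error = None
except ET.ParseError as exc:
    # The reserved xml prefix may only be bound to the fixed XML namespace URI.
    error = exc

print("rejected" if error is not None else "accepted", error)
```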

Kasimier,

My fault on the xml namespace. It should be xmlns:xml=" http://www.w3.org/XML/1998/namespace". That solves the xmllint problem. The problem I am seeing occurs with or without the additional space before the URI.

I have two test scripts to illustrate my problem. The two scripts and the test xml file follow. Another test file would be an Inkscape document with a text element.

Problem does not occur in this script:

from lxml import etree
import sys

intree = etree.parse("test.xml")
intree.write(sys.stdout)

Problem occurs (note the xml namespace in the output):

from lxml import etree
import sys

intree = etree.parse("test.xml")
outroot = etree.Element("root")
doc = intree.getiterator()
for el in doc:
    newel = el
    outroot.append(newel)
outtree = etree.ElementTree(outroot)
outtree.write(sys.stdout)

Test file:

<?xml version="1.0"?>
<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns:xml=" http://www.w3.org/XML/1998/namespace">
<a id="first" xml:space="default"></a>
</svg>

Interesting notes:
The space before the URI does not affect the result.
Switching from xml:space to svg:space fixes the problem.
The problem occurs with or without the xml namespace declaration.
__copy__ and/or append are suspect?
Parsing not handling the xml namespace properly?

I wish I knew more about the library.

Scott

On 2/22/06, Kasimier Buchcik <K.Buchcik@4commerce.de> wrote:

Scott Haeger wrote:
My system:
Python - 2.4.2
lxml - from scoder2 branch and current SVN
libxml2 - 2.6.23

My output for the above script:

<root><svg xmlns:svg="http://www.w3.org/2000/svg ">
</svg><a id="first" xml:space="default"/>
</root>

I do not see a problem here. Maybe you are using an outdated version of libxml2?

Stefan
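Editorial note on the second test script: "newel = el" only binds a second name to the same element, it does not copy it, and in lxml append() moves an element out of its original tree, which is why the output above shows <a> outside <svg>. A minimal sketch of copying instead, assuming copy.deepcopy is acceptable for the use case; it is written against the standard-library xml.etree.ElementTree as a stand-in for lxml.etree, and the names source and outroot are illustrative.

```python
import copy
import xml.etree.ElementTree as ET

source = ET.fromstring(
    '<svg><a id="first" xml:space="default"></a></svg>'
)
outroot = ET.Element("root")

# Iterate over a snapshot of the children and append independent copies,
# so the source tree is left untouched.
for child in list(source):
    outroot.append(copy.deepcopy(child))

print(ET.tostring(source))
print(ET.tostring(outroot))
```

Copying only the direct children (rather than every element the iterator yields) also avoids appending a nested element twice, once on its own and once inside its parent.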

I upgraded to the latest SVN version. I believe it has fixed the problem. Strange, my version of lxml was only about a week old. Thanks for the help.

Scott

On 2/22/06, Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote:

participants (3)
- Kasimier Buchcik
- Scott Haeger
- Stefan Behnel