Python 3 - xml - crlf handling problem
durumdara
durumdara at gmail.com
Fri Dec 2 03:13:43 EST 2011
Dear Stefan!
So: maybe I don't understand this well, but I thought that the parser
drops the "non-data" CRLFs and other whitespace characters (does not
preserve them).
Then it would not matter whether I read the XML from a file or create it
from code, because both would generate the SAME RESULT.
But Python doesn't do that.
If I build the XML from code, the output has no extra characters.
But Python preserves the parsed CRLF characters somewhere, and they are
also written into the result.
Example:
original='''
<?xml version="1.0" encoding="utf-8"?>
<doc a="1">
<element a="1">
AnyText
</element>
</doc>
'''
If I parse this and write it with toxml, the CRLFs remain in the
output, but if I create this document node by node, there is no CRLF:
toxml writes single-line XML.
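
A minimal sketch of that difference, assuming xml.dom.minidom is the parser in use (as the parse and toxml calls suggest). The variable names are only illustrative. It parses a small document that contains line breaks, builds the same structure node by node, and compares the serialized output of both:

```python
from xml.dom.minidom import Document, parseString

# The same structure as the example above, with line breaks between tags.
source = '<doc a="1">\n<element a="1">\nAnyText\n</element>\n</doc>'

# Variant 1: parse the text. minidom keeps every text node it finds,
# including the whitespace-only ones between the tags.
parsed = parseString(source)

# Variant 2: build the same structure in code, with no whitespace nodes.
built = Document()
doc = built.createElement("doc")
doc.setAttribute("a", "1")
built.appendChild(doc)
element = built.createElement("element")
element.setAttribute("a", "1")
element.appendChild(built.createTextNode("AnyText"))
doc.appendChild(element)

parsed_xml = parsed.documentElement.toxml()
built_xml = built.documentElement.toxml()

print(repr(parsed_xml))  # still contains the line breaks
print(repr(built_xml))   # a single line
```

The parser cannot know whether a line break is layout or data, so it keeps all of them; only a document built in code starts out without them.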
This also means that if I call toprettyxml to prettify the XML,
the file size grows.
If there is a multi-step processing queue, where two Python programs
communicate through XML files, the size can grow every time:
Py1 - reads Py2's file, processes it, and writes a result file
Py2 - reads Py1's result file, processes it, and passes it back to Py1
This can grow the file with each call, because the "pretty" CRLFs are
not normalized out of the document.
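import sys
from xml.dom.minidom import parse

original='''
<?xml version="1.0" encoding="utf-8"?>
<doc a="1">
<element a="1">
AnyText
</element>
</doc>
'''

def main():
    # Write the template; strip() puts the XML declaration at the very start.
    f = open('test.0.xml','w')
    f.write(original.strip())
    f.close()
    for i in range(1, 10 + 1):
        # Parse the file written in the previous round.
        xo = parse('test.%d.xml' % (i - 1))
        de = xo.documentElement
        de.setAttribute('c', str(i))
        t = de.getElementsByTagName('element')[0]
        # The first child of <element> is its text node.
        tn = t.childNodes[0]
        print (dir(t))
        print (tn)
        print (tn.nodeValue)
        tn.nodeValue = str(i) + '\t' + '\n'
        #s = xo.toxml()
        s = xo.toprettyxml()
        # Write this round's output; the next round reads it back.
        f = open('test.%d.xml' % i,'w')
        f.write(s)
        f.close()
    sys.exit()

main()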
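
The same growth can be observed without temporary files. The following is only a sketch, assuming xml.dom.minidom and the default toprettyxml arguments; round_trip_sizes is a hypothetical helper name. It feeds the pretty-printed output back into the parser several times and records the length after each round:

```python
from xml.dom.minidom import parseString

source = '<doc a="1">\n<element a="1">\nAnyText\n</element>\n</doc>'

def round_trip_sizes(xml_text, rounds):
    """Parse and pretty-print repeatedly; return the text length per round."""
    sizes = [len(xml_text)]
    for _ in range(rounds):
        # Whitespace-only text nodes survive parsing, and toprettyxml
        # wraps each of them in additional indentation and a newline.
        xml_text = parseString(xml_text).toprettyxml()
        sizes.append(len(xml_text))
    return sizes

sizes = round_trip_sizes(source, 5)
print(sizes)  # each value is larger than the one before
```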
And: because Python does not convert CRLF to a character reference such
as &#13;, I cannot tell apart the "prettified source's CRLF" (loaded
from a template file), "my own pretty-printing's CRLF" (from my own
toprettyxml call), and CRLFs that are really part of the data
(for example a memo field's value).
My case is that the processor application (to which I pass the XML
from Python) is sensitive to extra CRLFs in text nodes, so I must do
something with these extra items to avoid errors in the external program.
I get these templates and input files in prettified format (with
CRLFs), but I must "eat" them to produce XML that is on one line if
possible.
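
One possible way to "eat" them, sketched here under assumptions: strip_layout_whitespace is a hypothetical helper, not part of minidom. It removes whitespace-only text nodes and trims the remaining text nodes before serializing. Trimming is an assumption that may be wrong for real data such as a memo value with significant leading or trailing whitespace, so that part may need to be limited to known elements:

```python
from xml.dom.minidom import Node, parseString

def strip_layout_whitespace(node):
    """Recursively drop whitespace-only text nodes below `node`.

    Assumption: whitespace around text content is layout, not data,
    so the remaining text nodes are trimmed as well.
    """
    # Iterate over a copy, because children are removed during the loop.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if child.data.strip() == "":
                node.removeChild(child)
            else:
                child.data = child.data.strip()
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_layout_whitespace(child)

pretty = '<doc a="1">\r\n  <element a="1">\r\n    AnyText\r\n  </element>\r\n</doc>'
dom = parseString(pretty)
strip_layout_whitespace(dom.documentElement)
compact = dom.documentElement.toxml()
print(compact)
```

Running this before every toxml call also keeps a Py1/Py2 exchange from accumulating whitespace, because each side starts again from a document without layout nodes.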
I hope you understand my problem with it.
Thanks:
dd