[XML-SIG] SAX characters() output on multiple lines for non-ascii
"Martin v. Löwis"
martin at v.loewis.de
Thu Feb 7 07:01:52 CET 2008
> However if I try and put some of the surrounding text back in either by
> concatenating strings or using multiple sys.stdout.write() calls I get
> repetitions of the strings.
>
> if len(newchars) > 0:
>     output = ''.join(newchars)
>     sys.stdout.write("String read is '")
>     sys.stdout.write(output)
>     sys.stdout.write("'")
>
>
> Start ELEMENT ='title'
> String read is 'Der Einfluss kleiner naturnaher Retentionsma'String read is
> '▀'S
> tring read is 'nahmen in der Fl'String read is 'Σ'String read is 'che auf
> den Ho
> chwasserabfluss - Kleinr'String read is 'ⁿ'String read is 'ckhaltebecken -.'
> End ELEMENT ='title'
Please read Fred Drake's answer again. SAX will split the data in the
XML document into multiple pieces. You put your decoration ("String read
is") around each piece. Multiple pieces -> multiple decorations.
To solve this issue, collect all pieces in a global variable:
    output = u""

    def characters(self, chars):
        global output
        output += chars

    def endElement(self, name):
        global output
        print "String read is", output.encode("latin-1")
        output = u""
You could also choose to make output an attribute of self.
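
For illustration, here is a minimal sketch of the attribute-of-self variant.
It is written for Python 3 rather than the Python 2 used above, and the names
TitleHandler, pieces, titles and read_titles are hypothetical, not taken from
the original poster's code. It assumes the text of interest sits in <title>
elements that have no child elements.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect character data per element instead of handling each piece."""

    def __init__(self):
        super().__init__()
        self.pieces = []  # chunks passed to characters() for the current element
        self.titles = []  # complete text of each <title> element seen so far

    def startElement(self, name, attrs):
        # Start with an empty buffer for every new element.
        self.pieces = []

    def characters(self, chars):
        # SAX may call this several times for one run of text,
        # so only store the piece here; do not decorate or print it.
        self.pieces.append(chars)

    def endElement(self, name):
        # Only now is the element's text known to be complete.
        if name == "title":
            self.titles.append("".join(self.pieces))
        self.pieces = []


def read_titles(xml_bytes):
    """Parse an XML document given as bytes and return its <title> texts."""
    handler = TitleHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

Because the pieces are joined only in endElement, the result is the same no
matter how many characters() calls the parser makes for one element.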
Regards,
Martin