[XML-SIG] SAX characters() output on multiple lines for non-ascii
woodcock
woodcocs at hotmail.com
Sun Feb 3 00:04:21 CET 2008
I am just starting with SAX and am trying to parse a file that contains
non-ASCII characters. The XML file is encoded as 'ISO-8859-1'. When the parser
reaches text containing non-ASCII characters, characters() is called several
times for one element, so the output is split across multiple lines.
Example
Trying to output 'Der Einfluss kleiner naturnaher Retentionsmaßnahmen in der
Fläche auf den Hochwasserabfluss - Kleinrückhaltebecken'
The output I get is
Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsma'
String read is '▀'
String read is 'nahmen in der Fl'
String read is 'Σ'
String read is 'che auf den Hochwasserabfluss - Kleinr'
String read is 'ⁿ'
String read is 'ckhaltebecken -.'
End ELEMENT ='title'
whereas I want a single string something like...
Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsma▀nahmen in der
FlΣche auf den Hochwasserabfluss - Kleinrⁿckhaltebecken -.'
End ELEMENT ='title'
My code is:
def characters(self, chars):
    newchars=[]
    newchars.append(chars.encode('ISO-8859-1'))
    if newchars[-1] == '\n':
        newchars = newchars[:-1]
    if len(newchars)> 0:
        output = 'String read is ' + "'" + ''.join(newchars) + "'\n"
        sys.stdout.write(output)
    return
Does anyone have any ideas?
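
One possible approach, offered as a minimal sketch rather than a definitive fix: a SAX parser is allowed to deliver one run of text through several characters() calls, so the handler can collect the pieces and join them only when the element ends. The sketch below assumes Python 3 and flat elements with no nested children. The names TitleHandler, out, chunks and parse_bytes are illustrative and do not come from the code above.

```python
import io
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of each element and reports it as one string."""

    def __init__(self, out):
        super().__init__()
        self.out = out      # any object with a write() method (assumption)
        self.chunks = []    # text pieces seen since the last start tag

    def startElement(self, name, attrs):
        # Start a fresh buffer; this sketch ignores mixed content.
        self.chunks = []
        self.out.write("Start ELEMENT ='%s'\n" % name)

    def characters(self, content):
        # May be called several times for one run of text,
        # so only store the piece here and print nothing yet.
        self.chunks.append(content)

    def endElement(self, name):
        # Join the pieces once the element is complete.
        text = "".join(self.chunks).strip()
        if text:
            self.out.write("String read is '%s'\n" % text)
        self.chunks = []
        self.out.write("End ELEMENT ='%s'\n" % name)


def parse_bytes(data):
    """Parse an XML document given as bytes and return the report text."""
    out = io.StringIO()
    xml.sax.parseString(data, TitleHandler(out))
    return out.getvalue()
```

Passing sys.stdout instead of the StringIO buffer would print the report directly. The '▀', 'Σ' and 'ⁿ' glyphs in the output above are probably a separate matter: they look like ISO-8859-1 bytes shown by a console that uses a different code page.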