[New-bugs-announce] [issue17089] Expat parser parses strings only when XML encoding is UTF-8
Serhiy Storchaka
report at bugs.python.org
Thu Jan 31 11:01:19 CET 2013
New submission from Serhiy Storchaka:
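xmlparser.Parse() handles str data correctly only if the XML declaration specifies UTF-8 (or ASCII). With any other declared encoding the text is either decoded incorrectly or rejected. Examples: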
>>> import xml.parsers.expat
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='utf-8'?><tag>\xb5</tag>")
1
>>> content
['µ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='iso8859'?><tag>\xb5</tag>")
1
>>> content
['Âµ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: encoding specified in XML declaration is incorrect: line 1, column 30
This affects all other modules that work with XML: xml.sax, xml.dom.minidom, xml.dom.pulldom, xml.etree.ElementTree.
Here is a patch which fixes parsing string data with non-UTF-8 XML.
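Independently of the patch, a possible workaround is to encode the text to bytes in the declared encoding and pass the bytes to Parse(), so that the data and the XML declaration agree. The following is only a minimal sketch: the helper name parse_text is hypothetical (not part of pyexpat), and it assumes the caller already knows which encoding the declaration names.

```python
import xml.parsers.expat


def parse_text(xml_text, encoding):
    """Parse str data by first encoding it to match its XML declaration.

    Hypothetical helper for illustration; `encoding` must be the same
    encoding that is named in the XML declaration of `xml_text`.
    """
    parser = xml.parsers.expat.ParserCreate()
    content = []
    parser.CharacterDataHandler = content.append
    # Bytes input is decoded by Expat according to the declaration,
    # so the encoded data and the declared encoding are consistent.
    parser.Parse(xml_text.encode(encoding), True)
    # Character data may arrive in several chunks, so join them.
    return "".join(content)


doc = "<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>"
print(parse_text(doc, "utf-16"))
```

This avoids the str code path entirely, so it does not depend on how Parse() treats string arguments.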
----------
assignee: serhiy.storchaka
components: Extension Modules, Unicode, XML
files: pyexpat_parse_str.patch
keywords: patch
messages: 181014
nosy: ezio.melotti, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Expat parser parses strings only when XML encoding is UTF-8
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file28916/pyexpat_parse_str.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17089>
_______________________________________