[Python-Dev] [PATCH][BUG] Segmentation Fault in xml.dom.minidom.parse

Evan Jones ejones at uwaterloo.ca
Fri Sep 30 02:28:46 CEST 2005


The following Python script causes Python 2.3, 2.4 and the latest CVS  
to crash with a Segmentation Fault:

import xml.dom.minidom
x = u'<?xml version="1.0"?>\n<fran\xe7ais>Comment \xe7a va ? Tr\xe8s bien ?</fran\xe7ais>'
dom = xml.dom.minidom.parseString( x.encode( 'latin_1' ) )
print repr( dom.childNodes[0].localName )


The problem is that this XML document does not specify an encoding, so
minidom assumes that it is encoded in UTF-8, when in fact it is encoded
in Latin-1. My two-line patch, in the SourceForge tracker at the URL
below, causes this to raise a UnicodeDecodeError instead.

http://sourceforge.net/tracker/index.php?func=detail&aid=1309009&group_id=5470&atid=305470
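
The encoding mismatch described above can be seen without touching
minidom's internals. What follows is only a minimal sketch, not the patch
itself, and it is written so that it also runs on a current Python 3.
The helper name is_valid_utf8 and the variant of the document with an
explicit iso-8859-1 declaration are illustrative additions that are not
part of the original report.

```python
import xml.dom.minidom

# Same element content as in the report, kept separate from the XML declaration.
BODY = u'<fran\xe7ais>Comment \xe7a va ? Tr\xe8s bien ?</fran\xe7ais>'


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# No encoding declaration: a parser has to assume UTF-8 for these bytes.
undeclared = (u'<?xml version="1.0"?>\n' + BODY).encode('latin_1')

# Same bytes for the body, but the declaration now names the real encoding.
declared = (u'<?xml version="1.0" encoding="iso-8859-1"?>\n' + BODY).encode('latin_1')

# The Latin-1 byte 0xE7 followed by an ASCII letter is not a valid UTF-8 sequence.
print(is_valid_utf8(undeclared))

# With the encoding declared, the document parses normally.
dom = xml.dom.minidom.parseString(declared)
print(repr(dom.documentElement.tagName))
```

Declaring the encoding in the document, or encoding it as UTF-8 in the
first place, avoids the problem on the caller's side. The interpreter
should still fail cleanly rather than crash when it is handed the
undeclared form.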

Any chance that someone wants to commit this two-line fix? It might
also be eligible for backporting to Python 2.4. It passes "make test"
on both my Linux system and my Mac. I've also attached a patch that
adds this test case to test_minidom.py.

Thanks,

Evan Jones

--
Evan Jones
http://evanjones.ca/


