[Tutor] xml, external entity and non-ascii paths
Panard
panard at inzenet.org
Fri Jul 1 19:39:44 CEST 2005
Hi,
Here is my problem:
My local encoding is iso-8859-15.
I use UTF-8 encoded XML files that reference a DTD, which in turn includes a common DTD:
== sample.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE sample SYSTEM "sample.dtd">
<sample/>
==
== sample.dtd
<!ENTITY % COMMON_DTD SYSTEM "common.dtd">
%COMMON_DTD;
<!ELEMENT sample EMPTY>
==
== common.dtd
<!ENTITY % sample_entity "sample">
==
I put them in a directory with a non-ASCII name,
/home/panard/tests/yéyé/ for example.
Now I want to parse sample.xml using its absolute path name:
== test1.py
import codecs
import locale
import os
import xml.sax
import xml.sax.handler

def locale_from_unicode( s ) :
    return codecs.getencoder( locale.getpreferredencoding() )( s,
            'replace' )[ 0 ]

class EntityResolver :
    def resolveEntity( self, publicId, systemId ) :
        return locale_from_unicode( systemId )

parser = xml.sax.make_parser()
er = EntityResolver()
parser.setEntityResolver( er )
parser.parse( os.path.join( os.getcwd(), "sample.xml" ) )
==
which results in:
Traceback (most recent call last):
  File "test1.py", line 18, in ?
    parser.parse( os.path.join( os.getcwd(), "sample.xml" ) )
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: /home/panard/tests/yéyé/sample.dtd:2:0: error in processing external entity reference
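A side note on the locale_from_unicode helper in test1.py: `codecs.getencoder(name)` returns a function that yields a `(encoded_bytes, length_consumed)` tuple, which is why the helper takes element `[0]`. The sketch below shows this in Python 3 syntax (the scripts in this post are Python 2.4). It fixes the encoding to iso-8859-15 instead of calling `locale.getpreferredencoding()`, so the result does not depend on the machine, and the name `encode_for_locale` is made up for illustration.

```python
import codecs

def encode_for_locale(s, encoding="iso-8859-15"):
    """Encode text to bytes; unencodable characters become '?'."""
    encoder = codecs.getencoder(encoding)
    # The encoder returns (bytes, number_of_characters_consumed).
    encoded, _consumed = encoder(s, "replace")
    return encoded

# A pure-ASCII system ID comes out byte-for-byte unchanged.
print(encode_for_locale("sample.dtd"))
# A non-ASCII name becomes single-byte iso-8859-15 data.
print(encode_for_locale("yéyé"))
```

A relative system ID such as sample.dtd contains only ASCII characters, so in that case the helper would not change any bytes.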
I tried replacing the ExternalEntityRefHandler to find out where the problem
really happens:
== test.py
import codecs
import locale
import os
import traceback
import xml.sax
from xml.sax import saxutils, xmlreader
import xml.sax.handler

def locale_from_unicode( s ) :
    return codecs.getencoder( locale.getpreferredencoding() )( s,
            'replace' )[ 0 ]

class EntityResolver :
    def resolveEntity( self, publicId, systemId ) :
        return locale_from_unicode( systemId )

# from /usr/lib/python2.4/xml/sax/expatreader.py line 373
def external_entity_ref( context, base, sysid, pubid):
    print [ sysid ] ## modified
    self = parser ## modified

    if not self._external_ges:
        return 1

    source = self._ent_handler.resolveEntity(pubid, sysid)
    source = saxutils.prepare_input_source(source,
                                           self._source.getSystemId() or
                                           "")

    self._entity_stack.append((self._parser, self._source))
    self._parser = self._parser.ExternalEntityParserCreate(context)
    self._source = source

    try:
        xmlreader.IncrementalParser.parse(self, source)
    except:
        traceback.print_exc() ## modified
        return 0 # FIXME: save error info here?

    (self._parser, self._source) = self._entity_stack[-1]
    del self._entity_stack[-1]
    return 1

parser = xml.sax.make_parser()
er = EntityResolver()
parser.setEntityResolver( er )
parser.external_entity_ref = external_entity_ref
parser.parse( os.path.join( os.getcwd(), "sample.xml" ) )
==
The backtrace is now:
panard at sylvebarbe ~/tests/yéyé $ python test.py
[u'sample.dtd']
Traceback (most recent call last):
  File "test.py", line 35, in external_entity_ref
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 20-22: invalid data
... and the preceding backtrace ...
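One way to read the byte positions in that UnicodeDecodeError: in /home/panard/tests/yéyé/sample.dtd the first "é" sits at offset 20. Encoded as iso-8859-15 it is the single byte 0xE9, which is not valid UTF-8 when followed by "y". The sketch below, in Python 3 syntax, reproduces only that decoding step. It is an assumption, not something the traceback proves, that the failing data is this locale-encoded path being decoded as UTF-8. The helper name `first_bad_offset` is hypothetical.

```python
path = "/home/panard/tests/yéyé/sample.dtd"

# The path as it would look as bytes under an iso-8859-15 locale.
raw = path.encode("iso-8859-15")

def first_bad_offset(data, encoding="utf-8"):
    """Return the offset where decoding fails, or None if it succeeds."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None

print(first_bad_offset(raw))                   # locale bytes read as UTF-8
print(first_bad_offset(path.encode("utf-8")))  # genuine UTF-8 bytes
```

If that reading is right, the offending value would be the absolute base path rather than the relative system ID, since the printed system ID `u'sample.dtd'` is pure ASCII.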
So my question is: is there a nice way to solve this problem?
For now, I've disabled the external_entity_ref feature :(
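One direction that might be worth trying is to have the entity resolver return an `xml.sax.xmlreader.InputSource` that already carries an open byte stream, so the parser does not have to turn the system ID into a file name itself. The sketch below uses Python 3 syntax and has not been tested against the Python 2.4 setup described here. The class name `ByteStreamResolver` and its `base_dir` parameter are made up for illustration, and it assumes every external entity lives in the same directory, as in the sample files.

```python
import os
from xml.sax import handler, xmlreader

class ByteStreamResolver(handler.EntityResolver):
    """Hypothetical resolver that opens entities itself, under base_dir."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def resolveEntity(self, publicId, systemId):
        full_path = os.path.join(self.base_dir, systemId)
        source = xmlreader.InputSource(full_path)
        source.setPublicId(publicId)
        # Hand over an open binary stream; the caller is responsible
        # for closing it once parsing is done.
        source.setByteStream(open(full_path, "rb"))
        return source
```

It would be installed with `parser.setEntityResolver(ByteStreamResolver(os.getcwd()))`. Whether the system ID stored on the returned source still triggers the decoding error on Python 2.4 is an open question.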
Thanks,
Panard
--
HomePage : http://dev.inzenet.org/~panard
Qomics : http://qomics.inzenet.org
YZis editor - http://www.yzis.org
Smileys: http://smileys.inzenet.org