[Tutor] python xml entity problem
smith@rfa.org
Tue, 3 Sep 2002 16:06:37 -0400
I'm trying to parse an XML file using the Python tools in Debian woody.
Everything seems to be fine until the parser reaches a "&MAN;" reference:
my Python script just passes over it. My guess is that I have an
external entity resolver problem. I've been reading the Python XML book
from O'Reilly, and I believe I'm doing the right things, at least for
non-external entities. Does anybody have an example, or know how I can
make the program recognize an external entity?
I'm still very new to Python and XML, so maybe it's something I don't
understand.
The XML file starts off with something like this:
<?xml version='1.0' encoding="UTF-8" standalone="no"?>
<!DOCTYPE schedule SYSTEM "ftp://something.org/pub/xml_files/program.dtd">
<?xml:stylesheet type="text/xsl" href="ftp://something.org/pub/xml_files/program.xsl"?>
<schedule>
<pgm_block>
<id></id>
<arch>http://something/MAN/2000/02/test.mp3</arch>
<air_date>2000/02/02</air_date>
<air_time>16:00</air_time>
<service_id>&MAN;</service_id>
<block_time>00:70:00</block_time>
<sch_status>archive</sch_status>
<mc>AW</mc>
<producer>AW</producer>
<editor>XZ</editor>
</pgm_block>
</schedule>
The DTD looks something like this:
<!--Xml Project \ Program DTD \ V1.00 dmb April 2001-->
<!ELEMENT schedule (pgm_block,segment*)>
<!ELEMENT pgm_block (id?,arch?,air_date?,air_time?,service_id?,block_time?,sch_status,mc,producer,editor)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT arch (#PCDATA)>
<!ELEMENT air_date (#PCDATA)>
<!ELEMENT air_time (#PCDATA)>
<!ELEMENT service_id (#PCDATA)>
<!ENTITY BUR "Burmese">
<!ENTITY KHM "Cambodian">
<!ENTITY CAN "Cantonese">
<!ENTITY KOR "Korean">
<!ENTITY LAO "Lao">
<!ENTITY MAN "Mandarin">
<!ENTITY TIB "Tibetan">
<!ENTITY UYG "Uyghur">
<!ENTITY VIE "Vietnamese">
Oh yeah, this is what I'm importing at the beginning of the Python script:
from xml.dom.ext.reader.Sax2 import FromXmlStream
from xml.sax import xmlreader
import sys
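To narrow this down, a minimal sketch of a standalone test might help. It assumes the standard-library xml.sax on a current Python 3, not the PyXML Sax2 reader imported above, so the behaviour may not match woody exactly. The names TextAndSkips, parse_schedule and INTERNAL are made up for illustration. The idea is to declare the entity in the internal DTD subset, where the parser always sees it, and to record any reference the parser skips instead of expanding:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextAndSkips(ContentHandler):
    """Collect character data and the names of any skipped entities."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.skipped = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate.
        self.text.append(content)

    def skippedEntity(self, name):
        # Called when the parser meets a reference it does not expand,
        # for example because the DTD declaring it was never read.
        self.skipped.append(name)


def parse_schedule(xml_text):
    """Return (stripped text, skipped entity names) for an XML string."""
    handler = TextAndSkips()
    parser = xml.sax.make_parser()
    # Do not fetch external entities; this sketch relies only on
    # declarations in the internal subset.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return "".join(handler.text).strip(), handler.skipped


# Hypothetical test document: MAN is declared in the internal subset.
INTERNAL = """<?xml version='1.0'?>
<!DOCTYPE schedule [
  <!ENTITY MAN "Mandarin">
]>
<schedule><service_id>&MAN;</service_id></schedule>"""
```

Calling parse_schedule(INTERNAL) should return the text Mandarin with nothing skipped. If the same reference only goes missing when the declaration lives in the external DTD, that would suggest the parser is not reading the DTD at all, rather than a problem with the entity itself.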
Your input is appreciated.
Thanks.
-Smith