[Tutor] python xml entity problem

smith@rfa.org smith@rfa.org
Tue, 3 Sep 2002 16:06:37 -0400


I'm trying to parse an XML file using the Python tools in Debian
woody. Everything seems to be OK until the parser reaches "&MAN;":
my Python script just passes over it. My guess is that I have an
external entity resolver problem. I've been reading the Python XML book
from O'Reilly and I believe I'm doing the right things, at least for
non-external entities. Does anybody have an example, or can anyone tell
me how to make the program recognize an external entity?
I'm still very new to Python and XML, so maybe it's something I don't
understand.

The XML file starts off with something like this:

<?xml version='1.0' encoding="UTF-8" standalone="no"?>
<!DOCTYPE schedule SYSTEM "ftp://something.org/pub/xml_files/program.dtd">
<?xml:stylesheet type="text/xsl"
href="ftp://something.org/pub/xml_files/program.xsl"?>
<schedule>
<pgm_block>
        <id></id>
        <arch>http://something/MAN/2000/02/test.mp3</arch>
        <air_date>2000/02/02</air_date>
        <air_time>16:00</air_time>
        <service_id>&MAN;</service_id>
        <block_time>00:70:00</block_time>
        <sch_status>archive</sch_status>
        <mc>AW</mc>
        <producer>AW</producer>
        <editor>XZ</editor>
</pgm_block>
</schedule>
		       
The DTD looks something like this:
			       
<!--Xml Project \ Program DTD \ V1.00 dmb April 2001-->
<!ELEMENT schedule (pgm_block,segment*)>
<!ELEMENT pgm_block (id?,arch?,air_date?,air_time?,service_id?,block_time?,sch_status,mc,producer,editor)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT arch (#PCDATA)>
<!ELEMENT air_date (#PCDATA)>
<!ELEMENT air_time (#PCDATA)>
<!ELEMENT service_id (#PCDATA)>
<!ENTITY  BUR "Burmese">
<!ENTITY  KHM "Cambodian">
<!ENTITY  CAN "Cantonese">
<!ENTITY  KOR "Korean">
<!ENTITY  LAO "Lao">
<!ENTITY  MAN "Mandarin">
<!ENTITY  TIB "Tibetan">
<!ENTITY  UYG "Uyghur">
<!ENTITY  VIE "Vietnamese">
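
One way to check whether "&MAN;" is being skipped rather than expanded
is to listen for the SAX skippedEntity callback. The following is only a
minimal sketch: it assumes the xml.sax module from a current Python
standard library (expat-based), not the PyXML tools in woody, and it
uses a cut-down copy of the document above. External entity loading is
switched off explicitly so the DTD URL is never opened. SkipLogger and
find_skipped are made-up names for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Cut-down copy of the document above; the DTD URL is never opened here.
DOC = b"""<?xml version='1.0' encoding="UTF-8" standalone="no"?>
<!DOCTYPE schedule SYSTEM "ftp://something.org/pub/xml_files/program.dtd">
<schedule><pgm_block><service_id>&MAN;</service_id></pgm_block></schedule>
"""


class SkipLogger(ContentHandler):
    """Hypothetical handler: records text and any skipped entity names."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.skipped = []

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        # Called when the parser sees a reference it cannot expand
        # because the declaration lives in a DTD it did not read.
        self.skipped.append(name)


def find_skipped(doc_bytes):
    """Parse without loading the external DTD; return (text, skipped)."""
    handler = SkipLogger()
    parser = xml.sax.make_parser()
    # Assumption: leave external entities (including the DTD) unloaded.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(doc_bytes))
    return "".join(handler.text), handler.skipped
```

If the entity name shows up in the skipped list and no "Mandarin" text
arrives, the declarations in the external DTD were never read, which
would match the "passes over it" behaviour. The reader shipped with
woody may behave differently.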

Oh yeah, this is what I'm importing at the beginning of the Python script:

from xml.dom.ext.reader.Sax2 import FromXmlStream
from xml.sax import xmlreader
import sys
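
For comparison, here is a minimal sketch of an entity resolver that
hands the parser a local copy of the DTD instead of fetching it over
FTP. It assumes the xml.sax module from a current Python standard
library rather than the xml.dom.ext reader imported above, and it uses
a trimmed stand-in for program.dtd. LocalDtdResolver, TextCollector,
LOCAL_DTD and parse_with_local_dtd are hypothetical names, not part of
any library.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

DTD_URL = "ftp://something.org/pub/xml_files/program.dtd"

# Stand-in for a local copy of program.dtd (only the part needed here).
LOCAL_DTD = b"""<!ELEMENT service_id (#PCDATA)>
<!ENTITY MAN "Mandarin">
"""

# Cut-down copy of the document from the post.
DOC = b"""<?xml version='1.0' encoding="UTF-8" standalone="no"?>
<!DOCTYPE schedule SYSTEM "ftp://something.org/pub/xml_files/program.dtd">
<schedule><pgm_block><service_id>&MAN;</service_id></pgm_block></schedule>
"""


class LocalDtdResolver(EntityResolver):
    """Hypothetical resolver: serve the known DTD from memory."""

    def resolveEntity(self, publicId, systemId):
        if systemId != DTD_URL:
            # Refuse anything unexpected rather than fetching it.
            raise xml.sax.SAXException("unexpected entity: %s" % systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(LOCAL_DTD))
        return source


class TextCollector(ContentHandler):
    """Hypothetical handler: collects all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtd(doc_bytes):
    """Parse with the external DTD loaded through the resolver."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    # External entities are off by default in recent Python versions.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(LocalDtdResolver())
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(doc_bytes))
    return "".join(handler.chunks)
```

With the DTD loaded this way, the text collected for service_id should
be the replacement text of the entity rather than nothing. Restricting
the resolver to one known system identifier keeps the parser from
opening arbitrary URLs named in a document.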

Your input is appreciated.
Thanks.

-Smith