python & xml question

jano jnana4 at DELETEhotmailCAPS.com
Sat Aug 3 15:17:54 EDT 2002


Hi Martin,

"Martin v. Loewis" <martin at v.loewis.de> wrote in message
news:m365ysz0rl.fsf at mira.informatik.hu-berlin.de...
> "jano" <jnana4 at DELETEhotmailCAPS.com> writes:
>
> > I am just getting started with using python for xml.  I have been reading
> > 'Python & XML', but it doesn't explain the EntityResolver interface well
> > enough for me to know how to use it.  I am using sax to parse an xml file,
> > but the program halts when it gets to the first '&mdash;'
>
> Do you have a DOCTYPE declaration in the document? That might be the
> easiest approach: add a DOCTYPE that declares mdash; the parser should
> then replace it automatically.

Are you asking if there is an associated DTD?  There is, and it does declare
the mdash entity and what it should be replaced with, like so:

<!ENTITY mdash "&#8212;">
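
For reference, a minimal sketch of the "declare it in the DOCTYPE" approach
quoted above. It assumes a current Python 3 xml.sax rather than the
Python 2.1 / PyXML setup in this thread, and the <quote> element, the sample
text and the TextCollector class are made up for illustration:

```python
import xml.sax

# Hypothetical sample document: the internal DTD subset declares mdash,
# so the parser can expand &mdash; on its own.
DOC = (b'<?xml version="1.0"?>\n'
       b'<!DOCTYPE quote [\n'
       b'  <!ENTITY mdash "&#8212;">\n'
       b']>\n'
       b'<quote>wait&mdash;what</quote>')


class TextCollector(xml.sax.ContentHandler):
    """Collects character data; the parser may deliver it in several chunks."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


handler = TextCollector()
xml.sax.parseString(DOC, handler)
text = "".join(handler.chunks)  # the entity arrives as U+2014 in the text
```

No EntityResolver is involved here, because the declaration sits in the
internal subset of the document itself.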


> In fact, when you have control over the XML file, it might be even
> easier to write &#8212;, then no resolution is needed at all.

I tried this suggestion, but the following error occurs:

Traceback (most recent call last):
  File "test.py", line 15, in ?
    saxparser.parse(sys.stdin)
  File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 80, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.1/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 124, in feed
    self._parser.Parse(data, isFinal)
  File "quoteHandler.py", line 17, in characters
    print characters
UnicodeError: ASCII encoding error: ordinal not in range(128)

Is this saying that &#8212; is outside the UTF-8 range?
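
A small sketch of what the traceback points at: the failure is in the print
statement, which encodes the string as ASCII, not in the parse. U+2014 encodes
fine as UTF-8 but has no ASCII representation. The snippet below is Python 3
(not the Python 2.1 used above), and the variable names are only illustrative:

```python
em_dash = "\u2014"  # the character that &#8212; refers to

# UTF-8 can represent it (as three bytes).
utf8_bytes = em_dash.encode("utf-8")

# ASCII cannot, which is the kind of failure reported in the traceback.
ascii_failed = False
try:
    em_dash.encode("ascii")
except UnicodeEncodeError:
    ascii_failed = True

# One way to get ASCII-only output is to fall back to a character reference.
escaped = em_dash.encode("ascii", "xmlcharrefreplace")
```

So in a handler, encoding explicitly (for example to UTF-8) before writing
avoids depending on the default output encoding.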

> > How do I specify what that should resolve to? I thought that
> > DeclHandler.internalEntityDecl(name, value) seemed like it might be what I
> > am looking for, but I can't get that to work.
>
> That won't work. The internalEntityDecl handler is invoked every time
> an internal decl is found inside the document, such as
>
> <!ENTITY mdash   CDATA "&#8212;" -- em dash, U+2014 ISOpub -->
>
> (actually, it might not be invoked when the parser does not support
> that event).
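
To illustrate the point that a declaration callback only reports declarations
and does not decide what an entity resolves to, here is a rough sketch. It
swaps in the low-level xml.parsers.expat EntityDeclHandler (Python 3) instead
of the SAX DeclHandler discussed above; the document and the on_entity_decl
function are hypothetical:

```python
import xml.parsers.expat

# Hypothetical document with an internal entity declaration.
DOC = (b'<!DOCTYPE quote [\n'
       b'  <!ENTITY mdash "&#8212;">\n'
       b']>\n'
       b'<quote>wait&mdash;what</quote>')

declared = []  # (name, value) pairs reported by the parser
text = []      # character data chunks


def on_entity_decl(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # Called once per entity declaration; it is a notification only.
    declared.append((name, value))


parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.CharacterDataHandler = text.append
parser.Parse(DOC, True)
```

The callback sees the declaration, and the expansion of &mdash; in the
content happens independently of it.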
>
> Instead, you need to provide an EntityResolver; the parser will then
> invoke resolveEntity when it encounters an entity supposedly in the
> external subset.
>
> Notice that this didn't work until PyXML 0.8.
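
A rough sketch of the EntityResolver approach, again assuming a current
Python 3 xml.sax rather than PyXML. The "quote.dtd" system identifier, the DTD
text and the class names are invented for illustration. Recent Python versions
leave external general entities switched off by default, so the sketch turns
the feature on explicitly:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical DTD text and document; "quote.dtd" never has to exist on disk.
DTD_BYTES = b'<!ENTITY mdash "&#8212;">\n'
DOC_BYTES = (b'<?xml version="1.0"?>\n'
             b'<!DOCTYPE quote SYSTEM "quote.dtd">\n'
             b'<quote>wait&mdash;what</quote>')


class InMemoryResolver(EntityResolver):
    """Answers every external entity request with the in-memory DTD."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(DTD_BYTES))
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


resolver = InMemoryResolver()
collector = TextCollector()

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # allow external entities
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOC_BYTES))

text = "".join(collector.chunks)
```

Here resolveEntity is asked for the external subset named in the DOCTYPE and
hands back the declarations from memory; a real resolver would more likely map
the system identifier to a local copy of the DTD.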

Ah, I'm using PyXML 0.6.5 under Cygwin, because I couldn't get the later
versions to work under Cygwin.  Could this be a source of my problems?

> Regards,
> Martin

Thanks for all your help, Martin.

cheers,

jano