[XML-SIG] problems reading iso-8859-1 data

Jørgen Frøjk Kjærsgaard jfk@informaticon.dk
Mon, 30 Apr 2001 11:41:50 +0200


Alan Kennedy wrote:
>
> Don,
>
> Just a quick suggestion.
>
> > I have an XML file with iso-8859-1 encoding. The SAX
> > parser (expat) seems to be translating characters above 128
> > into two separate characters.
> > For example, "é" in the XML file is being interpreted as
> > "Ã©" by the parser.
> > (I'm running python 1.5.2 with PyXML 0.6.5)
> >
> > Am I missing something obvious?

Expat always translates the parsed input to UTF-8 encoding. Python 2.0
handles this correctly but I'm not sure about Python 1.5.x as I've never
used it for XML processing.
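
To make the symptom concrete, here is a minimal sketch in present-day Python 3 (not the 1.5.2 setup from the question, so treat it as an illustration only). It shows why one iso-8859-1 character turns into two when UTF-8 bytes are viewed as iso-8859-1, and how to convert back. The variable names are made up for the example.

```python
# "é" is U+00E9. In UTF-8 it takes two bytes, 0xC3 0xA9.
utf8_bytes = "\u00e9".encode("utf-8")

# Interpreting those two bytes as iso-8859-1 gives two characters,
# U+00C3 and U+00A9, which display as "Ã©".
misread = utf8_bytes.decode("iso-8859-1")

# Converting UTF-8 data back to iso-8859-1 restores the single byte 0xE9.
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")
```

So the parser output is probably correct UTF-8; it only looks wrong when it is displayed or stored as if it were iso-8859-1.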

> Have you placed an encoding declaration at the top of your XML file, i.e.
> something along the lines of
>
> <?xml version="1.0" encoding="iso-8859-1"?>

This does not change the fact that Expat outputs UTF-8. However, if the
Expat parser hasn't been told to use iso-8859-1 as the default encoding, it
will assume UTF-8 input unless you declare the encoding in the document,
as above.
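
A rough sketch of the two ways to tell Expat about the input encoding, written against the xml.parsers.expat module in present-day Python 3 rather than PyXML 0.6.5 on Python 1.5.2. The helper parse_text and the sample documents are invented for the example. Note that Python 3 hands the handler decoded text strings, whereas, as far as I understand it, the older setup would give you raw UTF-8 byte strings.

```python
import xml.parsers.expat


def parse_text(data, encoding=None):
    """Return the character data found in `data` (bytes).

    `encoding`, if given, overrides whatever the document declares.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# The same iso-8859-1 byte (0xE9), with and without an encoding declaration.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>'
undeclared = b'<a>\xe9</a>'

# 1. The declaration in the document tells Expat how to read the bytes.
text_declared = parse_text(declared)

# 2. Without a declaration, the encoding can be given to the parser instead.
text_override = parse_text(undeclared, "iso-8859-1")

# 3. With neither, Expat assumes UTF-8 and rejects the lone 0xE9 byte.
try:
    parse_text(undeclared)
    failed = False
except xml.parsers.expat.ExpatError:
    failed = True
```

In the first two cases the parser sees "é" as a single character; in the third the document is not valid UTF-8, so parsing fails.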

/jfk

-- 
Jørgen Frøjk Kjærsgaard, Systemkonsulent (Systems Consultant)
Informaticon ApS * Web: www.informaticon.dk * Tel: +45 8672 0093
Internet programming * Systems development on Linux, FreeBSD and PalmOS