[Expat-discuss] Expat treating ISO-8859-1 char strangely?
Fred L. Drake, Jr.
fdrake at acm.org
Fri Jul 25 12:44:52 EDT 2003
Stuart Powers writes:
> Hi, we're new to this mailing list, and we were wondering if anyone here could help us with a problem we're having.
>
> Our XML file (with encoding set to ISO-8859-1) contains the following string:
>
> "Kickin’ it Dash style"
>
> The apostrophe, we're pretty sure, is a character from the
> ISO-8859-1 character set. (We got this string for testing by
> copying and pasting from
> http://www.zeldman.com/daily/0703b.shtml#anil .)
>
> We're using XML::DOM (which uses XML::DOM::Parser, which supposedly uses Expat) to parse this XML file, and when we send the parsed data to a browser (via HTTP), it comes out like this:
>
> " Kickin it Dash style"
Other than the space prepended at the beginning, that's the UTF-8 I'd
expect.
> That is how Mozilla displays it when it is set to read character
> encoding ISO-8859-1. When set to read UTF-8, it simply displays
> "Kickin&#146; it Dash style".
I don't know why Mozilla would display it like that. That's a Mozilla
issue.
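To see why the same output looks different depending on the browser's encoding setting, here is a minimal Python sketch (an illustration added here, not from the original thread) that decodes the two bytes Expat produces, 0xC2 0x92, under each charset. It also shows that byte 0x92 is a curly apostrophe only in Windows-1252; in ISO-8859-1 it maps to U+0092, a C1 control character with no glyph.

```python
# The UTF-8 bytes Expat emits for the ISO-8859-1 byte 0x92 (code point U+0092).
out = "\x92".encode("utf-8")

# Browser set to ISO-8859-1: each byte becomes its own character,
# a visible 'Â' (U+00C2) followed by an invisible control (U+0092).
as_latin1 = out.decode("latin-1")

# Browser set to UTF-8: the two bytes form one character, U+0092,
# which is a control code and has no printable glyph.
as_utf8 = out.decode("utf-8")

# Only in Windows-1252 does the original byte 0x92 mean the right
# single quotation mark (U+2019) that looks like an apostrophe.
as_cp1252 = b"\x92".decode("cp1252")

for label, text in [("ISO-8859-1", as_latin1), ("UTF-8", as_utf8), ("cp1252 (0x92)", as_cp1252)]:
    print(label, [f"U+{ord(c):04X}" for c in text])
```

Running it prints `['U+00C2', 'U+0092']` for ISO-8859-1, `['U+0092']` for UTF-8, and `['U+2019']` for the Windows-1252 reading of the original byte.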
> We would sort of understand it if Expat simply took our ISO-8859-1
> character and copied it directly (byte by byte), or if it somehow
> converted it to UTF-8 and we got a UTF-8 character, but it appears
> that it's doing neither - it's sending us bytes which don't seem to
> be a valid character in either character set.
This is definitely a display thing. ISO-8859-1 character 0222 (0x92)
converts to the UTF-8 sequence 0302 0222 (0xc2 0x92). So Expat is
doing the right thing.
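The conversion above can be checked directly. The sketch below is a hypothetical test: Python's bundled `xml.parsers.expat` module stands in for the Perl XML::DOM stack from the original post, and the document is assumed to contain the raw byte 0x92 where the pasted "apostrophe" was. It feeds that document to Expat and inspects what comes back.

```python
import xml.parsers.expat

# Hypothetical test document: ISO-8859-1 encoded, with raw byte 0x92
# where the pasted "apostrophe" was.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>Kickin\x92 it Dash style</p>'

chunks = []
parser = xml.parsers.expat.ParserCreate()
# Expat may deliver character data in several pieces, so collect them all.
parser.CharacterDataHandler = chunks.append
parser.Parse(doc, True)
text = "".join(chunks)

# Expat decoded byte 0x92 to the code point U+0092 ...
print(hex(ord(text[6])))
# ... which UTF-8 encodes as the two bytes 0xC2 0x92 (octal 0302 0222).
utf8 = text[6].encode("utf-8")
print(utf8.hex(" "), [oct(b) for b in utf8])
```

The point of going through the parser rather than calling `.encode()` on a literal is that it shows Expat itself performs the ISO-8859-1 to Unicode step; UTF-8 encoding of the result then yields exactly the bytes described above.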
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation