[ expat-Bugs-481609 ] Wrong umlauts after parsing
noreply@sourceforge.net
Mon Apr 22 04:21:01 2002
Bugs item #481609, was opened at 2001-11-14 00:33
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=110127&aid=481609&group_id=10127
Category: XML::Parser (Perl module)
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Thomas Frings (frings)
Assigned to: Clark Cooper (coopercc)
Summary: Wrong umlauts after parsing
Initial Comment:
Parsing an XML file that contains German umlauts like
ä ö ü or their encoded forms like ä ä or ü
results in 'C$' (instead of 'ä'), 'C<' (instead of 'ü')
or 'C6' (instead of 'ö').
What's going wrong?
System: Solaris 2.8
expat 1.95.2
XML-Parser 2.30
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2002-04-22 04:20
Message:
Logged In: NO
When you write umlauts in attributes, it goes completely
wrong:
<image id="2" alt="Schön" />
results in a value alt="Schn" or (in newer versions of
Expat) in a Well-Formed error.
When you do alt="Schün" you get alt="Schn", too.
The only workaround is writing alt="Sch&uuml;n", and
that isn't nice at all.
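
One plausible cause of the well-formedness error, assuming the file was
saved as ISO-8859-1 without an encoding declaration, is that Expat then
treats the input as UTF-8 and rejects the lone umlaut byte. The sketch
below uses Python's standard Expat binding (not XML::Parser) purely for
illustration; the helper name parse_alt is hypothetical.

```python
import xml.parsers.expat


def parse_alt(data: bytes) -> str:
    """Return the first 'alt' attribute found in the document (hypothetical helper)."""
    found = []

    def start(name, attrs):
        # Collect the alt attribute of any element that carries one.
        if "alt" in attrs:
            found.append(attrs["alt"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return found[0]


# Assumption: the file on disk is ISO-8859-1, so 'ö' is the single byte 0xF6.
latin1 = "Schön".encode("iso-8859-1")

# With an encoding declaration, Expat decodes the Latin-1 byte correctly.
declared = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
            b'<image id="2" alt="' + latin1 + b'" />')
print(parse_alt(declared))

# Without one, the input is read as UTF-8 and the byte 0xF6 is rejected.
undeclared = b'<image id="2" alt="' + latin1 + b'" />'
try:
    parse_alt(undeclared)
except xml.parsers.expat.ExpatError as err:
    print("parse error:", err)
```

If this is what happened, adding the encoding declaration (or saving the
file as UTF-8) would avoid the need for entity references in attributes.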
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-15 20:39
Message:
Logged In: YES
user_id=3066
The output shown is not UTF-8, but UTF-8 with the high bit
stripped. I expect this was an artifact of the display font
or the terminal. Expat should produce UTF-8 in all cases;
that's part of the intended interface.
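
The "high bit stripped" reading can be checked by hand. A minimal
Python sketch (the helper name strip_high_bit is hypothetical) encodes
each umlaut as UTF-8 and clears bit 7 of every byte, which reproduces
the strings from the initial comment:

```python
def strip_high_bit(text: str) -> str:
    """Encode text as UTF-8, then clear bit 7 of every byte (hypothetical helper)."""
    return bytes(b & 0x7F for b in text.encode("utf-8")).decode("ascii")


for ch in "äüö":
    # ä is 0xC3 0xA4 in UTF-8; masking with 0x7F gives 0x43 0x24, i.e. 'C$'.
    print(ch, "->", strip_high_bit(ch))
```

This suggests the parser output was correct UTF-8 and that a 7-bit
step somewhere between the program and the screen mangled it.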
----------------------------------------------------------------------
Comment By: Simon Gordon (si_gordon)
Date: 2001-11-14 16:03
Message:
Logged In: YES
user_id=227124
I believe this is UTF-8. Expat always outputs in UTF-8
rather than either (a) what you want or (b) what the XML
encoding is set to.
I have long held the belief that this is a bug even though
the release notes for 1.95 documented this fact. I had to
patch my version to output ISO-8859-1 for exactly the same
reason - I needed umlauted characters in ISO, not UTF-8.
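
As an alternative to patching the parser, the UTF-8 output can be
converted in application code. A minimal sketch in Python, assuming the
parser hands back UTF-8 bytes; the function name utf8_to_latin1 is
hypothetical, and characters outside ISO-8859-1 are written as numeric
character references rather than dropped:

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 (hypothetical helper).

    Characters that ISO-8859-1 cannot represent become numeric
    character references such as &#8364; instead of raising an error.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors="xmlcharrefreplace")


print(utf8_to_latin1("Schön".encode("utf-8")))  # umlaut becomes the single byte 0xF6
print(utf8_to_latin1("€".encode("utf-8")))      # no Latin-1 equivalent, so a reference is emitted
```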
----------------------------------------------------------------------