[ expat-Bugs-491986 ] Charset decoding error (64-bit systems)
noreply@sourceforge.net
Wed Jul 10 11:00:07 2002
Bugs item #491986, was opened at 2001-12-12 07:48
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=491986&group_id=10127
Category: None
>Group: Platform Specific
Status: Open
Resolution: Works For Me
Priority: 5
Submitted By: Bent Jensen (bentjensen)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Charset decoding error (64-bit systems)
Initial Comment:
When parsing XML that contains Danish letters (æøåÆØÅ) as
eight-bit characters, with the encoding declared as <?xml
version="1.0" encoding="iso-8859-1"?>, the parser goes
wrong. If the input is:
<person id="five.worker">
<name><family>Worker</family>
<given>Five</given></name>
<email>Jørgen five@foo.com</email>
<email>Jørgen five@foo.com</email>
<link manager="Big.Boss"/>
</person>
(Note the Danish letters in two forms.)
The output is:
START: email
CD: (null) - 'J' - 1
CD: (null) - 'rgen five@foo.com' - 17
END: email
CD: (null) - '
' - 1
CD: (null) - ' ' - 4
START: email
CD: (null) - 'JÃ؟rgen five@foo.com' - 20
END: email
CD: (null) - '
' - 1
CD: (null) - ' ' - 4
What am I doing wrong?
If I embed the string 'æøåÆØÅ' in the XML file, it
all goes right?!
I have modified the 'outline' example program for the
above test.
Sincerely,
Bent Jensen, Senior consultant.
bent@kiya.dk
----------------------------------------------------------------------
>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-10 13:59
Message:
Logged In: YES
user_id=3066
I just tried this on the Alpha system on the SourceForge
compile farm and the CVS version of Expat, and the
regression test I added doesn't trigger. Can you still
reproduce this with the CVS version of Expat?
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 15:45
Message:
Logged In: YES
user_id=3066
Good point. I've re-opened the report and noted the 64-bit
dependency in the summary.
----------------------------------------------------------------------
Comment By: Bent Jensen (bentjensen)
Date: 2002-04-19 15:14
Message:
Logged In: YES
user_id=392963
Hi again,
I have tried all combinations of telling the parser that I
want to use iso-8859-1 encoding, including passing it to the
XML_ParserCreate function.
But note that I am running on a 64-bit
machine, and the routine that reads the input
characters does bit shifts 'en masse'. That is where
everything can go wrong: bit shifts are not portable!
Sincerely,
Bent Jensen, Senior consultant.
bent@kiya.dk
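For readers following the bit-shift concern, here is a minimal sketch of the arithmetic involved in turning one ISO-8859-1 byte into UTF-8, which is the form Expat uses for character data by default. This is not Expat's code; the function names are hypothetical and Python is used only for illustration. The shifts operate on values in the range 0..255, so the result does not depend on the machine word size. The sketch also shows that 'Jørgen five@foo.com' is 19 bytes in ISO-8859-1 and 20 bytes in UTF-8, which is consistent with the length 20 reported for the second email element in the original output.

```python
def latin1_byte_to_utf8(byte: int) -> bytes:
    """Convert one ISO-8859-1 byte value (0..255) to its UTF-8 byte sequence."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    if byte < 0x80:
        # ASCII range: one byte, unchanged.
        return bytes([byte])
    # 0x80..0xFF: two bytes, 110000xx 10xxxxxx.
    return bytes([0xC0 | (byte >> 6), 0x80 | (byte & 0x3F)])


def latin1_to_utf8(data: bytes) -> bytes:
    """Convert a whole ISO-8859-1 byte string to UTF-8, byte by byte."""
    return b"".join(latin1_byte_to_utf8(b) for b in data)


text = b"J\xf8rgen five@foo.com"   # 0xF8 is the letter o-with-stroke in ISO-8859-1
converted = latin1_to_utf8(text)

print(len(text), len(converted))
# Cross-check against Python's own codecs.
print(converted == text.decode("latin-1").encode("utf-8"))
```

A terminal that interprets the UTF-8 result as ISO-8859-1 will show two characters in place of the one Danish letter, so a garbled-looking line in a debug dump is not by itself evidence of a decoding error.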
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 14:46
Message:
Logged In: YES
user_id=3066
I cannot reproduce this using CVS Expat. I've added a
regression test for this to be sure it doesn't crop up
(tests/runtests.c revision 1.7).
Make sure that you're passing either NULL or "iso-8859-1" to
the XML_ParserCreate*() function as the encoding name.
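A small way to check both encoding arguments is sketched below. It uses Python's standard xml.parsers.expat binding rather than the C API and the 'outline' program from the report, so it is an assumption-laden stand-in, not the regression test mentioned above. The helper name collect_text is hypothetical. Note that the Python binding hands character data to the handler as str, whereas the C API delivers UTF-8 bytes by default.

```python
import xml.parsers.expat


def collect_text(document: bytes, encoding=None) -> str:
    """Parse `document` and return all character data joined together.

    `encoding` is passed to ParserCreate; None leaves the choice to the
    encoding declaration in the document itself.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding)
    # Expat may deliver the text of one element in several pieces,
    # so collect every piece and join them afterwards.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


PROLOG = b'<?xml version="1.0" encoding="iso-8859-1"?>'
# The same letter in two forms: a raw eight-bit byte and a character reference.
raw_form = PROLOG + b"<email>J\xf8rgen five@foo.com</email>"
ref_form = PROLOG + b"<email>J&#248;rgen five@foo.com</email>"

for enc in (None, "iso-8859-1"):
    same = collect_text(raw_form, enc) == collect_text(ref_form, enc)
    print(enc, same)
```

If the raw eight-bit form and the character-reference form produce different text on a given platform, that would match the symptom described in the initial comment.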
----------------------------------------------------------------------
Comment By: Bent Jensen (bentjensen)
Date: 2001-12-12 07:56
Message:
Logged In: YES
user_id=392963
Info: The expat package (version 1.95.2) was built on
alpha/axp OSF1 4.0D with gcc version 2.95.3. The test was
run on the same machine.
----------------------------------------------------------------------