[Expat-bugs] [ expat-Bugs-491986 ] Charset decoding error (64-bit
systems)
noreply@sourceforge.net
Wed Nov 20 05:13:48 2002
Bugs item #491986, was opened at 2001-12-12 07:48
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=491986&group_id=10127
Category: None
Group: Platform Specific
Status: Open
Resolution: Works For Me
Priority: 5
Submitted By: Bent Jensen (bentjensen)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Charset decoding error (64-bit systems)
Initial Comment:
When parsing XML containing Danish letters (æøåÆØÅ) with the
eighth bit set, and declaring the encoding as <?xml
version="1.0" encoding="iso-8859-1"?> (where the
Danish letters are stored as eight-bit chars), the
parser goes wrong. If the input is:
<person id="five.worker">
<name><family>Worker</family>
<given>Five</given></name>
<email>Jørgen five@foo.com</email>
<email>Jørgen five@foo.com</email>
<link manager="Big.Boss"/>
</person>
(Note the Danish letters in two forms.)
The output is:
START: email
CD: (null) - 'J' - 1
CD: (null) - 'rgen five@foo.com' - 17
END: email
CD: (null) - '
' - 1
CD: (null) - ' ' - 4
START: email
CD: (null) - 'JÃ¸rgen five@foo.com' - 20
END: email
CD: (null) - '
' - 1
CD: (null) - ' ' - 4
What am I doing wrong?
If I embed the string 'æøåÆØÅ' in the XML file, it
goes all right?!?!
I have modified the 'outline' example program for the
above test.
Sincerely,
Bent Jensen, Senior consultant.
bent@kiya.dk
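
One possible reading of the second callback above, offered as an assumption rather than a confirmed diagnosis: Expat hands character data to handlers as UTF-8 by default, so a single ISO-8859-1 'ø' (0xF8) would arrive as the two bytes 0xC3 0xB8, which gives 1 + 2 + 17 = 20 bytes. The sketch below only illustrates that arithmetic; latin1_to_utf8 is a hypothetical helper written for this note and is not part of Expat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Expat function.
   Converts ISO-8859-1 bytes to UTF-8 and returns the number of bytes
   written to out (no terminator is added).
   out must have room for at least 2 * len bytes. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII passes through unchanged. */
            out[n++] = c;
        } else {
            /* 0x80..0xFF becomes a two-byte sequence:
               lead byte 0xC2 or 0xC3, then one continuation byte. */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

Feeding the 19 ISO-8859-1 bytes of "Jørgen five@foo.com" through this helper yields 20 bytes, matching the length reported by the second callback. If that reading is right, the handler output would need to be displayed as UTF-8 rather than as ISO-8859-1.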
----------------------------------------------------------------------
>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-11-20 00:13
Message:
Logged In: YES
user_id=3066
Using the SourceForge compile farm, I'm not able to get the
test to trigger a failure on either Alpha or Sparc64 processors
(both running Linux). If there's not a confirmation that this
can still be triggered by the 1.95.6 release, I'll close this as
out-of-date.
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-10-05 11:37
Message:
Logged In: YES
user_id=290026
About the portability of bit shifts:
If Expat used integer types with fixed sizes (e.g. those defined
in C99) instead of platform-dependent ones, or if we defined
our own types to be always of the desired size regardless
of platform, should that not solve the problem?
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-10 13:59
Message:
Logged In: YES
user_id=3066
I just tried this on the Alpha system on the SourceForge
compile farm and the CVS version of Expat, and the
regression test I added doesn't trigger. Can you still
reproduce this with the CVS version of Expat?
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 15:45
Message:
Logged In: YES
user_id=3066
Good point. I've re-opened the report and noted the 64-bit
dependency in the summary.
----------------------------------------------------------------------
Comment By: Bent Jensen (bentjensen)
Date: 2002-04-19 15:14
Message:
Logged In: YES
user_id=392963
Hi again
I have tried all combinations of telling the parser that I
want to use iso-8859-1 encoding, including passing it to the
XML_ParserCreate function.
But please note that I am running on a 64-bit
machine, and in the routine where you are reading the input
chars you are doing bit shifts 'en masse', and here
everything can go wrong: bit shifts are not portable!
Sincerely,
Bent Jensen, Senior consultant.
bent@kiya.dk
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 14:46
Message:
Logged In: YES
user_id=3066
I cannot reproduce this using CVS Expat. I've added a
regression test for this to be sure it doesn't crop up
(tests/runtests.c revision 1.7).
Make sure that you're passing either NULL or "iso-8859-1" to
the XML_ParserCreate*() function as the encoding name.
----------------------------------------------------------------------
Comment By: Bent Jensen (bentjensen)
Date: 2001-12-12 07:56
Message:
Logged In: YES
user_id=392963
Info: The expat package (version 1.95.2) was built on
alpha/axp OSF1 4.0D with gcc version 2.95.3. The test was
run on the same machine.
----------------------------------------------------------------------