[Expat-bugs] [ expat-Bugs-491986 ] Charset decoding error (64-bit systems)

noreply@sourceforge.net noreply@sourceforge.net
Wed Nov 20 05:13:48 2002


Bugs item #491986, was opened at 2001-12-12 07:48
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=491986&group_id=10127

Category: None
Group: Platform Specific
Status: Open
Resolution: Works For Me
Priority: 5
Submitted By: Bent Jensen (bentjensen)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Charset decoding error (64-bit systems)

Initial Comment:

When parsing XML that contains Danish letters (æøåÆØÅ) stored 
as 8-bit characters, with the encoding declared as <?xml 
version="1.0" encoding="iso-8859-1"?>, the parser goes 
wrong.  If the input is:

  <person id="five.worker">
    <name><family>Worker</family> <given>Five</given></name>
    <email>J&oslash;rgen five@foo.com</email>
    <email>Jørgen five@foo.com</email>
    <link manager="Big.Boss"/>
  </person>


(Note the Danish letters in two forms.)

The output is:

    START: email
CD: (null) - 'J' - 1
CD: (null) - 'rgen five@foo.com' - 17
END: email
CD: (null) - '
' - 1
CD: (null) - '    ' - 4
    START: email
CD: (null) - 'JÃ¸rgen five@foo.com' - 20
END: email
CD: (null) - '
' - 1
CD: (null) - '    ' - 4

What am I doing wrong?

If I embed the string 'æøåÆØÅ' in the XML file, it 
all goes right?!

I have modified the 'outline' example program for the 
above test.

Sincerely
Bent Jensen, Senior consultant.
bent@kiya.dk
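
One possible reading of the second result above, offered as an assumption and not as a diagnosis made anywhere in this thread: Expat passes character data to handlers as UTF-8 by default, so an ISO-8859-1 'ø' (byte 0xF8) would arrive as the two bytes 0xC3 0xB8. That would account for the reported length of 20 for a 19-character string. The sketch below only shows the byte arithmetic; latin1_to_utf8 is a hypothetical helper and is not part of Expat.

```c
#include <stddef.h>

/* Hypothetical helper (not an Expat function): convert n bytes of
   ISO-8859-1 text to UTF-8. dst must have room for 2 * n bytes.
   Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i;
    size_t out = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII maps to itself. */
            dst[out++] = c;
        } else {
            /* 0x80..0xFF become a two-byte sequence: 110000xx 10xxxxxx. */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Applied to the 19 bytes of "Jørgen five@foo.com" in ISO-8859-1, this yields 20 bytes, with 0xC3 0xB8 in place of 'ø'. A terminal that interprets those bytes as ISO-8859-1 would show them as two separate characters.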




----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-11-20 00:13

Message:
Logged In: YES 
user_id=3066

Using the SourceForge compile farm, I'm not able to get the 
test to trigger a failure on either Alpha or Sparc64 processors 
(both running Linux).  If there's not a confirmation that this 
can still be triggered by the 1.95.6 release, I'll close this as 
out-of-date.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-10-05 11:37

Message:
Logged In: YES 
user_id=290026

About the portability of bit shifts:

If Expat used integer types with fixed sizes (e.g. those defined 
in C99) instead of platform-dependent ones, or if we defined
our own types to always be of the desired size regardless
of platform, should that not solve the problem?
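
A minimal sketch of the idea in the question above, assuming a compiler that provides the C99 <stdint.h> header. The function names are hypothetical and none of this is taken from the Expat source; it only illustrates how a shift on a platform-sized type can differ from the same shift on a fixed-size type.

```c
#include <stdint.h>

/* Result depends on the width of unsigned long: with a 32-bit long the
   upper bits are shifted out and only the low byte survives; with a
   64-bit long they are not, so more than the low byte can come back. */
unsigned long low_byte_native(unsigned long x)
{
    return (x << 24) >> 24;
}

/* uint32_t is exactly 32 bits wherever it exists, so this keeps only
   the low byte on both 32-bit and 64-bit systems. */
uint32_t low_byte_fixed(uint32_t x)
{
    return (uint32_t)(x << 24) >> 24;
}

/* Combine the two bytes of a two-byte UTF-8 sequence into a code point
   using fixed-size types and explicit masks. */
uint32_t decode_utf8_2(uint8_t lead, uint8_t trail)
{
    return ((uint32_t)(lead & 0x1F) << 6) | (uint32_t)(trail & 0x3F);
}
```

For example, decode_utf8_2(0xC3, 0xB8) gives 0xF8, the code point of 'ø', regardless of the platform's word size.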


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-10 13:59

Message:
Logged In: YES 
user_id=3066

I just tried this on the Alpha system on the SourceForge
compile farm and the CVS version of Expat, and the
regression test I added doesn't trigger.  Can you still
reproduce this with the CVS version of Expat?

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 15:45

Message:
Logged In: YES 
user_id=3066

Good point.  I've re-opened the report and noted the 64-bit
dependency in the summary.

----------------------------------------------------------------------

Comment By: Bent Jensen (bentjensen)
Date: 2002-04-19 15:14

Message:
Logged In: YES 
user_id=392963

Hi again

I have tried all combinations of telling the parser that I 
want to use iso-8859-1 encoding, including passing it to the 
XML_ParserCreate function. 

But note that I am running on a 64-bit machine, and the 
routine that reads the input characters does bit shifts 
'en masse'. Here everything can go wrong: bit shifts are 
not portable!

Sincerely 
Bent Jensen, Senior consultant. 
bent@kiya.dk 



----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 14:46

Message:
Logged In: YES 
user_id=3066

I cannot reproduce this using CVS Expat.  I've added a
regression test for this to be sure it doesn't crop up
(tests/runtests.c revision 1.7).

Make sure that you're passing either NULL or "iso-8859-1" to
the XML_ParserCreate*() function as the encoding name.

----------------------------------------------------------------------

Comment By: Bent Jensen (bentjensen)
Date: 2001-12-12 07:56

Message:
Logged In: YES 
user_id=392963

Info: The expat package (version 1.95.2) was built on 
alpha/axp OSF1 4.0D with gcc version 2.95.3. The test was 
run on the same machine.

----------------------------------------------------------------------
