[ expat-Bugs-491986 ] Charset decoding error (64-bit systems)

noreply@sourceforge.net noreply@sourceforge.net
Wed Jul 10 11:00:07 2002


Bugs item #491986, was opened at 2001-12-12 07:48
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=491986&group_id=10127

Category: None
>Group: Platform Specific
Status: Open
Resolution: Works For Me
Priority: 5
Submitted By: Bent Jensen (bentjensen)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Charset decoding error (64-bit systems)

Initial Comment:

When parsing XML that contains Danish letters (æøåÆØÅ) 
stored as eight-bit characters, with the encoding declared 
as <?xml version="1.0" encoding="iso-8859-1"?>, the parser 
goes wrong.  If the input is:

  <person id="five.worker">
    <name><family>Worker</family> <given>Five</given></name>
    <email>J&oslash;rgen five@foo.com</email>
    <email>Jørgen five@foo.com</email>
    <link manager="Big.Boss"/>
  </person>


(Note the Danish letters in two forms.)

The output is:

    START: email
CD: (null) - 'J' - 1
CD: (null) - 'rgen five@foo.com' - 17
END: email
CD: (null) - '
' - 1
CD: (null) - '    ' - 4
    START: email
CD: (null) - 'JÃ&#1567;rgen five@foo.com' - 20
END: email
CD: (null) - '
' - 1
CD: (null) - '    ' - 4

What am I doing wrong?

If I embed the string 'æøåÆØÅ' in the XML file, it 
works fine?!

I modified the 'outline' example program for the 
above test.

Sincerely
Bent Jensen, Senior consultant.
bent@kiya.dk




----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-10 13:59

Message:
Logged In: YES 
user_id=3066

I just tried this on the Alpha system on the SourceForge
compile farm and the CVS version of Expat, and the
regression test I added doesn't trigger.  Can you still
reproduce this with the CVS version of Expat?

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 15:45

Message:
Logged In: YES 
user_id=3066

Good point.  I've re-opened the report and noted the 64-bit
dependency in the summary.

----------------------------------------------------------------------

Comment By: Bent Jensen (bentjensen)
Date: 2002-04-19 15:14

Message:
Logged In: YES 
user_id=392963

Hi again

I have tried all combinations of telling the parser that I 
want to use iso-8859-1 encoding, including passing it to the 
XML_ParserCreate function. 

But please note that I am running on a 64-bit machine, and 
the routine that reads the input chars does bit shifts 
'en masse'. That is where everything can go wrong - bit 
shifts are not portable!

Sincerely 
Bent Jensen, Senior consultant. 
bent@kiya.dk 
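
For reference while reading the output in the initial comment: by
default Expat passes character data to handlers as UTF-8, so an
ISO-8859-1 'ø' (byte 0xF8) would be expected to arrive as the two
bytes 0xC3 0xB8, and the 19-character string 'Jørgen five@foo.com'
would then have a length of 20. The following is a minimal sketch of
that conversion, not Expat's own code; the function name
latin1_to_utf8 is hypothetical. It also illustrates the portability
point raised above: shifts applied to unsigned values behave the same
on 32-bit and 64-bit machines, whereas shifting a plain char that may
be signed does not have to.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Expat): convert ISO-8859-1 bytes
   to UTF-8.  dst must have room for at least 2 * len bytes.
   Returns the number of bytes written to dst. */
size_t latin1_to_utf8(const char *src, size_t len, char *dst)
{
    size_t i;
    size_t out = 0;

    for (i = 0; i < len; i++) {
        /* Go through unsigned char so that bytes >= 0x80 never
           turn into negative values before they are shifted. */
        unsigned int c = (unsigned char)src[i];

        if (c < 0x80) {
            /* ASCII range: copied through unchanged. */
            dst[out++] = (char)c;
        } else {
            /* 0x80..0xFF: two-byte UTF-8 sequence. */
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Called on the 19 bytes of "J\xF8rgen five@foo.com", this sketch
writes 20 bytes. A terminal that displays those bytes as ISO-8859-1
shows two characters where the 'ø' was.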



----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-04-19 14:46

Message:
Logged In: YES 
user_id=3066

I cannot reproduce this using CVS Expat.  I've added a
regression test for this to be sure it doesn't crop up
(tests/runtests.c revision 1.7).

Make sure that you're passing either NULL or "iso-8859-1" to
the XML_ParserCreate*() function as the encoding name.

----------------------------------------------------------------------

Comment By: Bent Jensen (bentjensen)
Date: 2001-12-12 07:56

Message:
Logged In: YES 
user_id=392963

Info: The expat package (version 1.95.2) was built on 
alpha/axp OSF1 4.0D with gcc version 2.95.3. The test was 
run on the same machine.

----------------------------------------------------------------------
