[Expat-bugs] [ expat-Bugs-931546 ] Unicode support for Windows and Unix are not compatible

SourceForge.net noreply at sourceforge.net
Thu Apr 8 11:05:29 EDT 2004


Bugs item #931546, was opened at 2004-04-08 02:45
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=931546&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Arthur Prosso (turik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode support for Windows and Unix are not compatible

Initial Comment:
This bug or "feature" exists in versions 1.95.6 and 1.95.7 
(I have not tested earlier ones).

There is an inconsistency between the Unicode support on 
Unix and on Windows.

On Windows, wchar_t is defined as unsigned short. On 
most Unix platforms it is defined as unsigned int.
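
As a rough illustration of the size difference described above, the following C sketch reports how wide wchar_t is on the platform it is compiled for. The helper names wchar_bits and wchar_is_utf16_sized are hypothetical and not part of Expat.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bits in wchar_t on this platform. */
static unsigned wchar_bits(void)
{
    unsigned bits = (unsigned)(sizeof(wchar_t) * CHAR_BIT);
    assert(bits >= 8);
    return bits;
}

/* Hypothetical helper: returns 1 when wchar_t is exactly 16 bits wide,
   i.e. when one wchar_t can hold exactly one UTF-16 code unit. */
static int wchar_is_utf16_sized(void)
{
    return wchar_bits() == 16;
}
```

On a platform where wchar_is_utf16_sized() returns 0, a wchar_t buffer cannot be treated as a UTF-16 buffer without conversion.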

If expat is compiled with the XML_UNICODE_WCHAR_T 
switch, then XML_Char is defined as wchar_t, but the ICHAR 
type is hardcoded as unsigned short for UNIX (see 
xmlparse.c). As a result, in XmlConvert() every 
two unsigned short characters are packed into one 
wchar_t character. Is it "BY DESIGN"?
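
The packing effect described above can be pictured with a minimal sketch. This is not Expat's code: it assumes a 32-bit reader, uses the fixed-width types from <stdint.h> as stand-ins for unsigned short and a 32-bit wchar_t, and the function name read_as_32bit is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read two consecutive 16-bit code units as one
   32-bit value, the way code expecting 32-bit characters would see a
   buffer that was actually filled with 16-bit units. */
static uint32_t read_as_32bit(const uint16_t *units)
{
    uint32_t packed;
    assert(units != NULL);
    /* memcpy avoids aliasing problems; it copies the first four bytes. */
    memcpy(&packed, units, sizeof packed);
    return packed;
}
```

For the two units 'a' and 'b', the 32-bit reader sees a single value that combines both (which half ends up where depends on byte order), not the character 'a'.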

But the OnElementStart(XML_Char*), OnElementEnd
(XML_Char*) handlers assume a regular wchar_t 
character. 

As a result, the code written on Windows is not UNIX 
compatible.

The same problem occurs when the "UTF-16" encoding is 
used. Before calling XML_Parse on UNIX, the input buffer 
of type wchar_t* must be manually converted to 
unsigned short* to make things work.
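
A manual conversion of the kind described above might look like the sketch below. It assumes that each wchar_t holds a Unicode code point (or a UTF-16 code unit where wchar_t is 16 bits wide) and that unsigned short is 16 bits wide; the helper name wide_to_utf16 is hypothetical and not part of Expat.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: encode wide characters as UTF-16 code units.
   Returns the number of units written, or (size_t)-1 if the output
   buffer is too small or a value is outside the Unicode range. */
static size_t wide_to_utf16(const wchar_t *in, size_t in_len,
                            unsigned short *out, size_t out_cap)
{
    size_t n = 0, i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < in_len; i++) {
        unsigned long cp = (unsigned long)in[i];
        if (cp < 0x10000UL) {
            /* Fits in one unit (existing surrogates pass through unchanged). */
            if (n + 1 > out_cap) return (size_t)-1;
            out[n++] = (unsigned short)cp;
        } else if (cp <= 0x10FFFFUL) {
            /* Needs a surrogate pair. */
            if (n + 2 > out_cap) return (size_t)-1;
            cp -= 0x10000UL;
            out[n++] = (unsigned short)(0xD800UL | (cp >> 10));
            out[n++] = (unsigned short)(0xDC00UL | (cp & 0x3FFUL));
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

The converted buffer could then be passed to XML_Parse as a byte buffer, with the length given in bytes (number of units times sizeof(unsigned short)).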


Please explain this behaviour. Is it a "bug" or "by 
design"?

Thank you

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2004-04-08 11:05

Message:
Logged In: YES 
user_id=290026

If you want to use wchar_t in your application as an 
unsigned int, then you cannot use it for UTF-16 Unicode
anyway. So you are better off using UTF-8.
One can say: Expat does not support wchar_t unless
it is defined as a 16-bit type, which means you cannot
write platform-independent code using wchar_t if
you want to run it on platforms that use unsigned int for it.

In that case, use unsigned short and leave
XML_UNICODE_WCHAR_T undefined.

This is not an Expat issue, but a platform issue with using 
wchar_t for UTF-16, which works on Windows, but not on
some Unixes.

I am curious, what would you want to change in Expat?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-04-08 10:44

Message:
Logged In: NO 

Waclawek, thank you for your answer.

I think it IS A BUG, because:

1) You cannot write platform-independent code this way, 
since wchar_t is a system type and the default is unsigned int 
on UNIX and unsigned short on Windows.

2) wchar_t can't be used as unsigned short in a 
straightforward way: if libc functions for wide 
character support are used in the code, one has to 
convert from wchar_t to unsigned short before every call to the 
Expat API, and back again when getting the buffers from the 
callback functions.

3) If you assume XML_Char is ALWAYS unsigned 
short, there is no need to have both the XML_UNICODE and 
XML_UNICODE_WCHAR_T switches.
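
The conversion back from callback buffers mentioned in point 2 could be sketched as follows. This assumes the callback delivers UTF-16 code units as unsigned short and that the application wants one code point per wchar_t where wchar_t is wider than 16 bits; the helper name utf16_to_wide is hypothetical and not part of Expat.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode UTF-16 code units into wide characters.
   Where wchar_t is wider than 16 bits, surrogate pairs are combined
   into one code point; otherwise units are copied one to one.
   Returns the number of wchar_t written, or (size_t)-1 if the output
   buffer is too small. */
static size_t utf16_to_wide(const unsigned short *in, size_t in_len,
                            wchar_t *out, size_t out_cap)
{
    size_t n = 0, i = 0;
    assert(in != NULL && out != NULL);
    while (i < in_len) {
        unsigned long cp = in[i++];
        if (sizeof(wchar_t) > 2
            && cp >= 0xD800UL && cp <= 0xDBFFUL
            && i < in_len && in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
            unsigned long low = in[i++];
            cp = 0x10000UL + ((cp - 0xD800UL) << 10) + (low - 0xDC00UL);
        }
        if (n >= out_cap) return (size_t)-1;
        out[n++] = (wchar_t)cp;
    }
    return n;
}
```

An unpaired surrogate is copied through unchanged in this sketch; a stricter version might reject it instead.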

Will this issue be corrected in the next releases?
Thank you.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-04-08 09:02

Message:
Logged In: YES 
user_id=290026

Expat assumes that wchar_t is defined as a two-byte character
type. Otherwise UTF-16 output would not be possible.

That is why the README file includes the "-fshort-wchar"
compiler option in the build instructions for Unix.

If a Unix compiler does not support this option, then
one has to build Expat with XML_UNICODE defined, but
with XML_UNICODE_WCHAR_T undefined. This defines
XML_Char as unsigned short instead of wchar_t.
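
The effect of the two switches described above can be pictured with a small sketch. It uses hypothetical DEMO_ names as stand-ins for XML_UNICODE and XML_UNICODE_WCHAR_T and is not a copy of Expat's headers, which may differ in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for the XML_UNICODE / XML_UNICODE_WCHAR_T
   switches: Unicode output is on, the wchar_t variant is off. */
#define DEMO_UNICODE 1
/* #define DEMO_UNICODE_WCHAR_T 1 */

#if defined(DEMO_UNICODE_WCHAR_T)
typedef wchar_t demo_char;          /* only safe if wchar_t is 16 bits */
#elif defined(DEMO_UNICODE)
typedef unsigned short demo_char;   /* UTF-16 code unit */
#else
typedef char demo_char;             /* UTF-8 bytes */
#endif

/* Hypothetical check: width of the selected character type in bits. */
static unsigned demo_char_bits(void)
{
    unsigned bits = (unsigned)(sizeof(demo_char) * CHAR_BIT);
    assert(bits >= 8);
    return bits;
}
```

With only DEMO_UNICODE defined, the character type does not depend on how the platform defines wchar_t, which is the point of the suggestion above.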

In both cases, there is no problem with ICHAR, and Expat
works the same on Windows and Unix.

----------------------------------------------------------------------
