[Expat-bugs] [ expat-Bugs-931546 ] Unicode support for Windows and Unix is not compatible
SourceForge.net
noreply at sourceforge.net
Thu Apr 8 11:05:29 EDT 2004
Bugs item #931546, was opened at 2004-04-08 02:45
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=931546&group_id=10127
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Arthur Prosso (turik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode support for Windows and Unix is not compatible
Initial Comment:
This bug (or "feature") exists in versions 1.95.6 and 1.95.7 (I have not tested earlier ones).
There is an inconsistency between Unicode support on Unix and on Windows.
On Windows, wchar_t is defined as unsigned short; on most Unix platforms it is defined as unsigned int.
If Expat is compiled with the XML_UNICODE_WCHAR_T switch, XML_Char is defined as wchar_t, but the ICHAR type is hard-coded as unsigned short (see xmlparse.c). As a result, on Unix every two unsigned short characters produced in XmlConvert() are packed into one wchar_t character. Is this "by design"?
However, the OnElementStart(XML_Char*) and OnElementEnd(XML_Char*) handlers assume regular wchar_t characters.
As a result, code written on Windows is not portable to Unix.
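To make the packing effect concrete, here is a minimal sketch that does not use Expat at all. It assumes only what the report describes: a conversion step writes 16-bit units (ICHAR, i.e. unsigned short) into a buffer, and a handler then reads that same memory as wchar_t. The helper name read_back_as_wchar is hypothetical and not part of Expat.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: writes n 16-bit code units into a raw buffer (as an
   unsigned short ICHAR conversion would) and then reads the first wchar_t
   back from that same memory (as a handler taking XML_Char* would). */
wchar_t read_back_as_wchar(const unsigned short *units, size_t n)
{
    unsigned char raw[64] = {0};
    wchar_t first = 0;

    /* Clamp so the copy never overruns the scratch buffer. */
    if (n * sizeof(unsigned short) > sizeof raw)
        n = sizeof raw / sizeof(unsigned short);
    memcpy(raw, units, n * sizeof(unsigned short));

    /* memcpy instead of a pointer cast avoids strict-aliasing problems. */
    memcpy(&first, raw, sizeof first);
    return first;
}
```

On a little-endian machine with a 4-byte wchar_t, passing the units {0x61, 0x62} ("ab") returns 0x00620061 rather than L'a': both characters end up in a single wchar_t. With a 2-byte wchar_t (Windows, or gcc -fshort-wchar) it returns L'a'.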
The same problem occurs when the "UTF-16" encoding is used: before calling XML_Parse on Unix, an input buffer of type wchar_t* must be manually converted to unsigned short* to make things work.
Please explain this behaviour. Is it a bug, or is it by design?
Thank you
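The manual conversion mentioned above could look roughly like the following sketch. It assumes a 32-bit wchar_t holding UTF-32 code points (the usual Unix case) and an Expat build in which XML_Char is unsigned short; wchar_to_utf16 is a hypothetical helper, not an Expat function. Code points above U+FFFF must become surrogate pairs, so an element-by-element cast is not enough.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (assumes wchar_t holds UTF-32 code points):
   converts n wchar_t values into UTF-16 code units stored as unsigned
   short. Returns the number of units written, or (size_t)-1 if dst is
   too small or a value is not a Unicode scalar value. */
size_t wchar_to_utf16(const wchar_t *src, size_t n,
                      unsigned short *dst, size_t cap)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long cp = (unsigned long)src[i];

        /* Reject surrogates and values beyond U+10FFFF. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one code unit. */
            if (out + 1 > cap)
                return (size_t)-1;
            dst[out++] = (unsigned short)cp;
        } else {
            /* Supplementary planes: encode as a surrogate pair. */
            if (out + 2 > cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (unsigned short)(0xD800 + (cp >> 10));
            dst[out++] = (unsigned short)(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Note that XML_Parse takes its length argument in bytes, so a buffer produced this way would be passed with the returned unit count multiplied by sizeof(unsigned short).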
----------------------------------------------------------------------
>Comment By: Karl Waclawek (kwaclaw)
Date: 2004-04-08 11:05
Message:
Logged In: YES
user_id=290026
If you want to use wchar_t in your application as an
unsigned int, then you cannot use it for UTF-16 Unicode
anyway. So you are better off using UTF-8.
One could say that Expat does not support wchar_t unless it is defined as a 16-bit type, which means you cannot write platform-independent code using wchar_t if you want to run it on platforms that use unsigned int for it.
In that case, use unsigned short and leave XML_UNICODE_WCHAR_T undefined.
This is not an Expat issue but a platform issue with using wchar_t for UTF-16, which works on Windows but not on some Unixes.
I am curious, what would you want to change in Expat?
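Karl's recommendation to stay with UTF-8 can be sketched as follows: the application keeps its wchar_t strings and encodes them as UTF-8 before handing them to a default Expat build (XML_UNICODE undefined, so XML_Char is char). wchar_to_utf8 is a hypothetical helper that assumes wchar_t holds Unicode code points. Under a UTF-8 locale the standard wcstombs() could do the same job; this version simply avoids depending on the locale.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (assumes wchar_t holds Unicode code points):
   encodes n wchar_t values as UTF-8 bytes. Returns the number of bytes
   written, or (size_t)-1 on overflow or an invalid code point. */
size_t wchar_to_utf8(const wchar_t *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long cp = (unsigned long)src[i];
        unsigned char buf[4];
        size_t len;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x80) {                 /* 1 byte: ASCII */
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {         /* 2 bytes */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {       /* 3 bytes */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {                         /* 4 bytes */
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }

        if (out + len > cap)
            return (size_t)-1;
        for (size_t k = 0; k < len; k++)
            dst[out++] = (char)buf[k];
    }
    return out;
}
```

For example, L"a\u00E9\u20AC" encodes to the six bytes 61 C3 A9 E2 82 AC, which a char-based Expat build accepts directly.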
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2004-04-08 10:44
Message:
Logged In: NO
Waclawek, thank you for your answer.
I think it IS a bug, because:
1) you cannot write platform-independent code this way, since wchar_t is a system type and the default is unsigned int on Unix and unsigned short on Windows;
2) wchar_t cannot be used as unsigned short in a straightforward way: if the code uses the libc wide-character functions, every call into the Expat API requires converting from wchar_t to unsigned short, and every buffer received in a callback must be converted back;
3) if you always assume XML_Char is unsigned short, there is no need to have both the XML_UNICODE and XML_UNICODE_WCHAR_T switches.
Will this issue be corrected in the next releases?
Thank you.
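Point 2 describes converting callback buffers back to wchar_t. A minimal sketch of that direction, assuming XML_Char is unsigned short and wchar_t is 32 bits wide, decodes surrogate pairs into single code points. utf16_to_wchar is hypothetical; inside a character-data handler, len would be the length argument Expat passes, counted in XML_Char units.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (assumes a 32-bit wchar_t): converts len UTF-16
   code units into wchar_t code points. Returns the number of wchar_t
   written, or (size_t)-1 on overflow or an unpaired surrogate. */
size_t utf16_to_wchar(const unsigned short *src, size_t len,
                      wchar_t *dst, size_t cap)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned long cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i + 1 >= len || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;                        /* consume the low surrogate too */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* stray low surrogate */
        }

        if (out >= cap)
            return (size_t)-1;
        dst[out++] = (wchar_t)cp;
    }
    return out;
}
```

If a buffer can end in the middle of a surrogate pair, a real handler would need to carry the unfinished high surrogate over to the next call; this sketch simply rejects it.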
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2004-04-08 09:02
Message:
Logged In: YES
user_id=290026
Expat assumes that wchar_t is defined as a two-byte character type; otherwise UTF-16 output would not be possible.
That is why the README file includes the "-fshort-wchar" compiler option in the build instructions for Unix.
If a Unix compiler does not support this option, then
one has to build Expat with XML_UNICODE defined, but
with XML_UNICODE_WCHAR_T undefined. This defines
XML_Char as unsigned short instead of wchar_t.
In both cases there is no problem with ICHAR, and Expat works the same on Windows and Unix.
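The two build options Karl describes can be captured in a compile-time check. The sketch below restates the XML_Char selection discussed in this thread under a hypothetical name (DEMO_XML_Char) so that it compiles without Expat's headers, and refuses to build the wchar_t variant when wchar_t is wider than 16 bits. It assumes the wchar_t variant is requested with both XML_UNICODE and XML_UNICODE_WCHAR_T defined; it is not copied from expat.h.

```c
#include <stddef.h>
#include <wchar.h>   /* WCHAR_MAX */

/* Assumed restatement of the build options in this thread (not Expat's
   own header): both macros -> wchar_t, XML_UNICODE alone ->
   unsigned short, neither -> char (UTF-8). */
#ifdef XML_UNICODE
#  ifdef XML_UNICODE_WCHAR_T
typedef wchar_t DEMO_XML_Char;
#  else
typedef unsigned short DEMO_XML_Char;
#  endif
#else
typedef char DEMO_XML_Char;
#endif

/* UTF-16 through wchar_t only works when wchar_t is a 16-bit type,
   e.g. on Windows or with gcc -fshort-wchar. */
#if defined(XML_UNICODE) && defined(XML_UNICODE_WCHAR_T) && WCHAR_MAX > 0xFFFF
#  error "XML_UNICODE_WCHAR_T needs a 16-bit wchar_t: use -fshort-wchar or leave it undefined"
#endif

/* Size in bytes of one character unit in the selected configuration. */
size_t demo_xml_char_size(void)
{
    return sizeof(DEMO_XML_Char);
}
```

Built with no switches, demo_xml_char_size() returns sizeof(char); with -DXML_UNICODE alone it returns sizeof(unsigned short), which is 2 on common platforms.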
----------------------------------------------------------------------